Oxytricha as a modern analog of ancient genome evolution
Aaron David Goldman and Laura F. Landweber
Department of Ecology and Evolutionary Biology, Princeton University, Guyot Hall, Princeton, NJ 08544, USA
Review
Several independent lines of evidence suggest that the modern genetic system was preceded by the ‘RNA world’ in which RNA genes encoded RNA catalysts. Current gaps in our conceptual framework of early genetic systems make it difficult to imagine how a stable RNA genome may have functioned and how the transition to a DNA genome could have taken place. Here we use the single-celled ciliate, Oxytricha, as an analog to some of the genetic and genomic traits that may have been present in organisms before and during the establishment of a DNA genome. Oxytricha and its close relatives have a unique genome architecture involving two differentiated nuclei, one of which encodes the genome on small, linear nanochromosomes. While its unique genomic characteristics are relatively modern, some physiological processes related to the genomes and nuclei of Oxytricha may exemplify primitive states of the developing genetic system.
Early genome evolution
The modern genetic system requires the synthesis and functional orchestration of three distinct biopolymers: DNA, RNA, and proteins. This complex system was likely preceded by a stage in which RNA played a central role both in information storage and as the only genetically-encoded catalyst (Figure 1) [1,2]. The early prominence of RNA is substantiated by its ability to store genetic information, as in mRNA, and to impart catalysis, as demonstrated by the abundance of catalytic RNAs present in nature and produced in laboratories [3]. The primacy of functional RNAs in the process of protein translation (transfer and ribosomal RNAs and other functional RNAs that modify them), coupled to the ubiquity of those RNAs across all extant life, suggests that the translation system emerged from an RNA-catalyzed metabolism [4]. The central role of nucleotide-derived cofactors (such as ATP, NADH, and CoA) in metabolism is consistent with a scenario in which those functions were previously catalyzed by ribozymes [5].
The catalytic range of RNA is limited and a ribozyme-based metabolic system probably remained dependent on the background chemistry from which it emerged. Protein translation may have evolved as a mechanism to bring this crucial chemistry under the control of genetically-encoded enzymes [6]. Deoxyribonucleotides were probably unavailable until the evolution of ribonucleotide reductase proteins [7], implying that the development of the DNA genome was not even possible until substantial evolution of protein enzymes had taken place. By this point, the translation system seems to have reached a moderate level of its modern sophistication and the range of protein fold architectures encoded by early genomes had significantly expanded [8].
Corresponding author: Goldman, A.D.
0168-9525/$ – see front matter © 2012 Elsevier Ltd. All rights reserved.
The transition from an RNA genome to a DNA genome is not well understood. Many protein fold architectures seem to have evolved before this stage, because modern ribonucleotide reductase enzymes fall into three distinct classes that share no noticeable similarity in amino acid sequence but appear to be homologous when their active site amino acids are compared in 3D structure alignments [9]. Although DNA-processing functions are similar across the tree of life, no ancient core of enzymes can be detected by sequence comparison (Figure 2) [10]. Six distinct families of DNA polymerase are known, but only those with specific functions related to excision repair have a universal taxonomic distribution [11]. Two distinct families of DNA primase, one bacterial and one archaeal/eukaryotic, are observed in modern life. A similar phylogenetic pattern is observed in DNA ligases. This lack of a universal DNA metabolism may imply that a complete protein-catalyzed DNA-processing system was not present in the last universal common ancestor (LUCA) or that ancient non-orthologous gene displacements [12], in either the ancestor of Bacteria or the ancestor of Archaea and Eukarya, erased the phylogenetic evidence of most DNA processes in LUCA.
Given the current inability to reconstruct early genome-related metabolism through bioinformatics, some researchers have used features of modern biological systems as analogs to traits of ancient organisms and their genomes. For example, it has been argued that viruses provide an ideal evolutionary platform to acquire a DNA genome in an RNA world and to distribute this trait to cellular life [13]. A similar approach compares the notion of early genomes to a ciliate macronucleus in which genes are encoded on small linear chromosomes [14]. Here, we expand the latter idea and use the remarkable genetic system of the ciliate genus Oxytricha [15] to improve our understanding of the early transition from RNA to DNA genomes.
Oxytricha
Oxytricha is a genus of single-celled ciliated protists. Its members are predatory, mitochondrion-bearing, free-living organisms that inhabit freshwater environments. The lineage diverged ~1 Gya from the common ancestor of Tetrahymena and Paramecium [16]. Oxytricha spp., like most ciliates, have two types of nuclei, a micronucleus and a macronucleus (reviewed in [17]). The macronucleus is transcriptionally active during vegetative growth, whereas the micronucleus is almost always transcriptionally silent. However, only the micronucleus is exchanged during the ciliate sexual cycle, after which a new macronucleus and macronuclear genome are formed from micronuclear DNA. Although these general traits are common throughout the phylum Ciliophora, the architectures of the macronuclear and micronuclear genomes, as well as the process of macronuclear development, differ among ciliate taxa.
doi:10.1016/j.tig.2012.03.010 Trends in Genetics, August 2012, Vol. 28, No. 8
[Figure 1 image not reproduced. Panels (a), (b), (c); labels: Informational RNAs, Functional RNAs, Prebiotic reaction networks, Functional proteins, Metabolic networks, DNA genome.]
Figure 1. The development of the modern genetic system from an RNA-dominated precursor genetic system. (a) The first genetic system probably involved informational RNAs encoding ribozymes which facilitated the replication of those informational RNAs [1]. Given the narrow catalytic range of ribozymes, this system probably relied on substantial networks of prebiotic chemistry to provide activated nucleotides [6]. (b) Protein synthesis by translation most likely arose from this RNA-based system [7] and rapidly developed into a highly processive, high-fidelity system [8]. Appropriately, the translation system is dominated by functional RNAs, including the ribosome itself, which has a ribozyme active site in its highly conserved core [57,58]. (c) The DNA genome probably arose from an RNA–protein precursor system. Deoxyribonucleotides seem to have been unavailable until the evolution of the ribonucleotide reductase protein enzymes [7]. Unlike translation, DNA replication and processing are dominated by protein functions rather than RNA functions, and core DNA-related functions do not appear to be universally conserved [10,11]. In the absence of significant bioinformatic evidence, the transition from an RNA genome to a DNA genome remains enigmatic.
The Oxytricha micronuclear genome contains approximately 1 Gb of sequence, while the macronuclear genome contains approximately 50 Mb of sequence, representing a 95% reduction in genome content during development [16]. In addition, thousands of micronuclear genes are scrambled with respect to their macronuclear counterparts, with segments of micronuclear genes present in a permuted or inverted order relative to their order in the macronucleus (Figure 3) (reviewed in [18]). Following sexual exchange of haploid micronuclei, the macronuclear genome assembles from dispersed segments of micronuclear DNA through a process of genome rearrangement that is guided by macronuclear RNA templates (Figure 3) [19]. It is likely that these RNA templates represent a transient cache of the entire macronuclear genome during this developmental stage.
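The template-guided reordering of scrambled segments can be pictured as a simple tiling problem. The sketch below is only an illustration under stated assumptions, not the cellular mechanism or the authors' method: the sequences are invented, the template is written in the DNA alphabet for simplicity, matching is exact, and the function names `reverse_complement` and `unscramble` are hypothetical. The toy locus stores four segments in the order 2, 4, 1, 3 with segment 3 inverted, mirroring the arrangement drawn in Figure 3.

```python
# Toy model of template-guided unscrambling (illustrative assumptions only).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def unscramble(segments, template):
    """Order (and, where needed, invert) segments so that they tile the template."""
    remaining = list(segments)
    assembled = []
    position = 0
    while position < len(template):
        for seg in remaining:
            # Try the segment as stored, then in inverted orientation.
            for candidate in (seg, reverse_complement(seg)):
                if template.startswith(candidate, position):
                    assembled.append(candidate)
                    position += len(candidate)
                    remaining.remove(seg)
                    break
            else:
                continue  # neither orientation matched; try the next segment
            break  # a segment was placed; rescan from the new position
        else:
            raise ValueError(f"no segment matches the template at position {position}")
    return "".join(assembled)


# Hypothetical macronuclear template made of segments 1-4 in order.
template = "ATGGCA" + "TTCGAC" + "GATTAC" + "CCTGAA"
# Hypothetical micronuclear arrangement: order 2, 4, 1, 3 with segment 3 inverted.
micronuclear = ["TTCGAC", "CCTGAA", "ATGGCA", reverse_complement("GATTAC")]

macronuclear = unscramble(micronuclear, template)
print(macronuclear == template)
```

In this toy version the template alone determines both the order and the orientation of the segments, which is the sense in which altering the template could alter the rearranged product.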
The roles of RNA may surpass those of DNA in regulating the information in the genome of Oxytricha at three levels. At the first level, RNA transcripts of complete nanochromosomes from the previous generation can program the pattern of DNA rearrangements during macronuclear development [19]. The microinjection of synthetic RNA molecules into Oxytricha cells can introduce an alternative order of micronuclear DNA segments in the resulting progeny [18,19]. These new DNA rearrangement patterns can transfer to the sexual offspring of those progeny and even their progeny’s progeny. Given that the micronuclear DNA remains unchanged, the inheritance of altered rearrangement patterns in Oxytricha appears to be a transgenerational RNA-mediated epigenetic phenomenon.
[Figure 2 image not reproduced. Universal phylogenetic tree of archaeal, eukaryotic, and bacterial taxa annotated with columns for ribonucleotide reductase classes (I, II, III), DNA polymerase families (A, B, C, D, X, Y), DNA primases (Polα, Hel), and DNA ligases (ATP, NAD).]
Figure 2. A phylogenetic distribution of key enzymes involved in DNA synthesis. Unlike the protein translation system, very few features of DNA synthesis and processing are universally conserved. Ribonucleotide reductase is an enzyme required to produce deoxyribonucleotides from ribonucleotides. It is found in three distinct classes, I, II, and III, although ancient homology between them can be inferred from structural and mechanistic similarity. Six distinct families of DNA polymerases are known. None of the four standard DNA polymerase families (A, B, C, and D) has a universal taxonomic distribution. DNA polymerase families X and Y are universally distributed, but impart functions that are related to excision repair rather than DNA replication. The DNA polymerase X family catalyzes non-template-dependent DNA synthesis, while the DNA polymerase Y family polymerizes short segments across lesions. Bacteria use an NAD+-dependent DNA ligase that is unrelated to the ATP-dependent DNA ligase used by Eukarya and Archaea. Similarly, Bacteria use a helicase-associated DNA primase, whereas Archaea and Eukarya use a DNA polymerase α-associated DNA primase. The lack of a universally distributed set of enzymes involved in DNA synthesis suggests that modern pathways were still in the process of forming during the time of the last universal common ancestor (LUCA). Alternatively, DNA-related pathways may simply be more evolutionarily malleable than, for example, translation pathways, and this property would obscure their ancient phylogenetic signatures. The universal phylogenetic tree was previously generated in [59] and is based on 31 universal gene sequences from 191 genomes. The tree image was produced using the Interactive Tree of Life web server [60]. Clades representing groups of 25–40% similarity were collapsed to conserve space. Taxonomic distributions of ribonucleotide reductase enzymes were identified from the RNR database [61]. Taxonomic distributions of DNA polymerase families, DNA ligases, and DNA primases are extrapolated from [10,11], and do not represent a resolution capable of illustrating horizontal gene transfer. Ciliates are members of the Alveolata.
[Figure 3 image not reproduced. Panels (a)–(d) around a central conjugation diagram; labels: Micronuclear chromosomes (segments in order 2 4 1 3), Macronuclear nanochromosomes (segments in order 1 2 3 4), Old macronucleus, Transcription, dsDNA, ssRNA, Micronuclear meiosis, New micronucleus, Developing macronucleus.]
Figure 3. A model for development of the Oxytricha macronuclear genome following conjugation. During conjugation (center) the micronucleus undergoes meiosis to produce four haploid nuclei, two of which exchange between partnering cells to form a new diploid micronucleus. During this process, the old macronucleus degrades and a new macronucleus differentiates from one copy of the new micronucleus [22]. The outer panels depict the process of macronuclear genome development by DNA rearrangement. (a) Conjugation triggers transcription of old macronuclear chromosomes into RNA. (b) The old macronucleus becomes dismantled, while the RNA transcripts of the chromosomes are retained and transported to the developing macronucleus. (c) The micronucleus replicates by mitosis and one micronucleus undergoes DNA amplification to produce material for the macronuclear genome. (d) Segments of micronuclear DNA (numbered 1–4) are reorganized using the macronuclear transcripts as a template for RNA-guided DNA rearrangement (including inversion of segment 3). Red bars indicate telomeres at the ends of nanochromosomes. Orange rectangles indicate deleted micronuclear DNA that separates DNA segments retained in the macronucleus.
At the second level, point substitutions can also transfer from the RNA template to the macronuclear DNA [19], particularly near regions where junctions form between macronuclear segments. These point substitutions can also transfer to the sexual progeny and their progeny’s progeny. Given that the micronuclear DNA does not share these point substitutions [19], this observation implicates a role for RNA-templated DNA repair [20] in DNA rearrangement. These somatically acquired point mutations represent another level at which epigenetically-inherited RNA molecules instruct the sequence and interpretation of the DNA genome.
At the third level, the RNA macronuclear genome cache also appears to be responsible for determining the copy number of macronuclear chromosomes. Artificially increasing or decreasing the available levels of RNA chromosome templates by microinjection or RNAi, respectively, leads to a relative increase or decrease in the copy number of the corresponding DNA molecules in the next generation. This effect also lasts at least two sexual generations [21], demonstrating a further example of RNA-mediated transgenerational epigenetic inheritance in Oxytricha.
Apart from their unique sequence features, ciliate micronuclear genomes have a normal eukaryotic structure. Their genome architecture is in the form of large chromosomes with telomeres and a centromere, and micronuclei reproduce via mitosis during cell division. During the sexual cycle the diploid genome undergoes meiosis to produce haploid gametes, one of which is retained and the other of which passes to the mating partner (Figure 3) [22,23].
The macronucleus is very different. The macronuclear genome contains on the order of 20 million small DNA chromosomes, or ‘nanochromosomes’, most of which encode a single protein-coding gene or functional RNA. In fact, the lack of a centromere has led some to argue that the term ‘chromosome’ is inappropriate for macronuclear DNA [16]. The extraordinary number of DNA molecules in the macronucleus results from approximately 20 000 unique nanochromosomes averaging roughly 1000 copies per macronucleus. Their average length is approximately 2.7 kb [24]. These unusual properties of the Oxytricha macronuclear genome and macronucleus, and the powerful role of RNA in sculpting these genomes, offer a compelling system within which to consider possible transitions from simple RNA genomes to complex DNA genomes.
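The figures quoted for the two genomes can be cross-checked with a few lines of arithmetic. The sketch below uses only the approximate values given in the text (1 Gb and 50 Mb of sequence, about 20 000 unique nanochromosomes, about 1000 copies each, about 2.7 kb average length); the variable names are hypothetical and the results are order-of-magnitude estimates rather than measurements.

```python
# Approximate values taken from the text; all results are rough estimates.
micronuclear_mb = 1_000          # ~1 Gb of micronuclear sequence
macronuclear_mb = 50             # ~50 Mb of macronuclear sequence
unique_nanochromosomes = 20_000  # distinct nanochromosomes
mean_copies = 1_000              # average copies of each per macronucleus
mean_length_kb = 2.7             # average nanochromosome length

# Fraction of sequence eliminated during macronuclear development.
reduction = 1 - macronuclear_mb / micronuclear_mb

# Total number of DNA molecules in one macronucleus.
total_molecules = unique_nanochromosomes * mean_copies

# Unique sequence implied by number x average length, in Mb.
implied_unique_mb = unique_nanochromosomes * mean_length_kb / 1_000

print(f"genome reduction: {reduction:.0%}")
print(f"DNA molecules per macronucleus: {total_molecules:,}")
print(f"implied unique macronuclear sequence: {implied_unique_mb:.0f} Mb")
```

The product of nanochromosome number and average length gives roughly 54 Mb, which is in line with the approximately 50 Mb stated for the macronuclear genome.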
Oxytricha and early genome replication
Small, single-gene chromosomes, such as those in the Oxytricha macronucleus, represent one of the simplest possible states of a genome and thus were probably predecessors to more complex genome architectures. A genome of small linear chromosomes would have presented less of a challenge to primitive polymerases [14], which probably copied nucleic acids with low fidelity and were unable to process long sequences. The nature of these primitive DNA polymerases is unknown. None of the four families of standard DNA polymerases has a universal distribution [11], although the sliding clamp function of the DnaN polymerase in E. coli and the 5′–3′ exonuclease function of the Pol1-A polymerase in E. coli appear to have been present in LUCA [25,26]. Three subunits of DNA-dependent RNA polymerases appear to be universal as well [25]. Structural and functional comparisons of DNA-dependent RNA polymerases suggest that they may share a multi-subunit ancestor with proofreading capabilities that was present in LUCA [27].
It is generally assumed that the RNA-only stage in the development of the genetic system would have required an RNA-dependent RNA polymerase ribozyme to have replicated the genome. Although no such enzyme has been found in extant biology, several have been produced synthetically through laboratory evolution techniques [24,28–30]. So far, all of these ribozymes are over a hundred nucleotides long and exhibit very tight constraints on sequence space, making it difficult to imagine how similar ribozymes could have evolved de novo in an RNA world scenario. In addition, even the most capable of these laboratory-generated polymerase ribozymes is not able to sustain the processivity required to replicate RNA molecules of its own size or larger.
During the process of Oxytricha genome rearrangement, segments of DNA from the micronuclear genome assemble according to RNA templates of the macronuclear genome (Figure 3). This process represents a unique scenario in extant biology in which a complete copy of a genome is produced, not by polymerizing a complementary strand one nucleotide at a time, but by recycling DNA polymers from a precursor genome. It is likely that these pieces of micronuclear DNA ligate together after assembling on the complementary RNA template, although there is also evidence that gaps or errors between the DNA segments are repaired by the activity of an RNA-dependent DNA polymerase [19].
A similar mode of replication would have conferred several benefits to early life and perhaps created a viable selection regime in which polymerases with high fidelity and processivity might have evolved. In contrast to ribozyme polymerases, several ribozyme ligases are present in modern organisms [3] and more have been synthesized by directed evolution [31,32]. Polymerases are in fact a specialized kind of ligase in which one of the ligated partners is a single nucleotide [31]. It follows, then, that the central challenge to a polymerase is not the catalytic step of ligation, but the ability to perform that step repeatedly over the full length of a gene-sized molecule, a limitation that is borne out by the difficulty of producing a highly processive ribozyme polymerase [24,29,30].
If early nanochromosomes replicated in an Oxytricha-like fashion, the number of catalytic ligation steps would be much smaller than that in a complete polymerase-dependent replication. The source of these DNA segments in a primitive system is not clear. Perhaps if the GC% was very high or very low, the sequence complexity of the nanochromosomes would also be low, and short abiotically synthesized segments with random sequences [33,34] would provide enough matches to the template to permit assembly of most of the genome from these small, modular pieces [35]. The need to fill or repair small gaps between segments would create a selective environment for the evolution of a weakly processive polymerase into the ancestor of a modern, highly processive polymerase. This model of early genome replication is consistent with the observation that the only universally conserved DNA polymerase families are involved in excision repair (Figure 2). Once a high-fidelity, high-processivity polymerase became available, genome replication could move towards its current polymerase-dependent form and longer chromosome lengths would be possible.
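The difference in the number of catalytic steps can be made concrete with a rough count. The sketch below is a back-of-the-envelope illustration, not a model from the article: it assumes gap-free tiling of the template by segments of equal length, and the segment length of 30 nucleotides is an arbitrary value chosen for the example. The function names `ligation_steps` and `polymerization_steps` are hypothetical.

```python
import math


def ligation_steps(chromosome_length, segment_length):
    """Joins needed to assemble one strand from equal-length segments (no gaps assumed)."""
    n_segments = math.ceil(chromosome_length / segment_length)
    return max(n_segments - 1, 0)


def polymerization_steps(chromosome_length):
    """Nucleotide additions needed to copy one strand one nucleotide at a time."""
    return chromosome_length


length_nt = 2_700   # average nanochromosome length quoted in the text (~2.7 kb)
segment_nt = 30     # assumed segment length, for illustration only

print(ligation_steps(length_nt, segment_nt))
print(polymerization_steps(length_nt))
```

Under these assumptions a 2.7 kb strand requires 89 joins when assembled from 30-nucleotide pieces, compared with 2700 successive nucleotide additions by a polymerase. Longer segments reduce the count further, while any gaps between segments would add a small number of fill-in steps.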
Oxytricha and early cell division
In most eukaryotes, cell division is orchestrated by the complex process of mitosis, wherein duplicate chromosomes segregate evenly between the dividing cells. The process is controlled by dynamic motor complexes that pull chromosomes along organized microtubules [36,37]. Functionally analogous but non-homologous processes are thought to take place in Bacteria [38–40] and Archaea [41,42]. It is difficult to imagine that such a complex system was present in early life forms. Early cell division probably involved uncontrolled membrane division with chromosomes segregating at random.
Similarly, the Oxytricha macronucleus does not divide by way of mitosis. The approximately 20 million nanochromosome molecules probably present an overwhelming challenge to organized mitotic segregation. Although amitosis in Oxytricha is microtubule-dependent [43,44], these microtubules appear to control membrane division rather than chromosome segregation. Macronuclear nanochromosomes lack centromeres to which mitotic motors, or kinetochores, would normally attach [16,45]. As a result, the segregation of DNA between macronuclei is unpredictable and often uneven [16,46]. Amitotic division of macronuclei seems to have arisen early in the ciliates [47] although previous phylogenies have predicted three independent origins of amitosis in ciliates, with one origin in the common lineage of the genera Oxytricha and Euplotes [48].
It is possible that the high chromosome copy-numbers observed in Oxytricha, and to an equal or lesser extent in other ciliates, are related to the imprecise segregation of chromosomes during amitotic division [49]. If a single chromosome is duplicated and the two copies are allowed to segregate randomly to one of the two daughter cells, then the probability of losing that gene in one of the daughter cells is 0.5. A greater number of chromosomes will statistically ensure an approximately even segregation of the chromosomes between daughter cells. This feature of amitosis in Oxytricha may be similar to the division of primitive cells, which would have also benefited from carrying chromosomes in high copy-numbers to safeguard against uneven segregation.
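The effect of copy number on the risk of gene loss can be worked out directly. The sketch below assumes the simplest possible model, which is an idealization rather than a description of amitosis itself: a chromosome present in n copies is duplicated to 2n copies, and each copy is assigned to one of the two daughter cells independently with probability 1/2. A daughter receives no copy with probability (1/2)^(2n), so the chance that either daughter loses the gene is 2 × (1/2)^(2n). The function name `loss_probability` is hypothetical.

```python
from fractions import Fraction


def loss_probability(copies_before_replication):
    """Probability that at least one daughter receives no copy of a chromosome.

    Assumes each of the 2n replicated copies goes to either daughter
    independently with probability 1/2 (random, unmanaged segregation).
    """
    total_copies = 2 * copies_before_replication
    # The two loss events (daughter A empty, daughter B empty) cannot both occur.
    return 2 * Fraction(1, 2) ** total_copies


for n in (1, 2, 5, 10):
    print(n, loss_probability(n), float(loss_probability(n)))
```

With a single starting copy the probability is 1/2, matching the value in the text; with five starting copies it falls to 1/512, and at copy numbers near 1000 it is vanishingly small under this model.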
Oxytricha and early genome stability
A single common ancestor of all life is the most statistically satisfying explanation for common traits observed in modern organisms [50]. This explanation, however, does not distinguish between a single organism and a community of organisms with highly pervasive lateral gene transfer [14]. Even if we assume the former scenario, the complexity of a single LUCA organism may have been generated in part by lateral gene transfer within a heterogeneous population of organisms [26]. If early genomes did indeed resemble Oxytricha macronuclear genomes, then the nature of the RNA-mediated gene transfer observed in Oxytricha [19,23] may also help describe the sort of communal inheritance that preceded the predominantly vertical inheritance of modern organisms.
The nanochromosome structure of the macronuclear genome and its regeneration through RNA-template-directed DNA unscrambling provide a form of lateral gene transfer that differs from mechanisms described in any other organisms. Unlike conventional conjugation in bacteria or sexual reproduction in eukaryotes, an RNA-driven epigenetic mode of inheritance does not require the introduction of new genes, but instead new ‘alleles’ can spread via conversion of existing ones (through RNA-guided mechanisms). Allele frequencies can be increased or decreased by the introduction of foreign nucleic acids, and these acquired traits are passed on to subsequent generations.
These phenomena are similar to horizontal gene transfer, in that somatic DNA or RNA variants provide an external source of genetic variation. But the nanochromosome structure of the macronuclear genome and its capacity to receive new alleles during the process of DNA rearrangement make the Oxytricha macronucleus uniquely permissive to somatically acquired genetic change. Nevertheless, an epigenetic system such as that of Oxytricha is also robust to such perturbations because the high copy-number of original alleles will initially act as a buffer against sequence change, restricting the spread of deleterious somatic alterations. Perhaps early genomes with structures similar to the Oxytricha macronucleus would also be permissive to genetic acquisitions, but stable against their deleterious effects.
Oxytricha and early organismal identity
The genetic openness that existed during the transition to modern life was probably also prone to invasions by selfish replicators that may have easily infiltrated and taken advantage of emerging organismal replicating systems [51]. This effect is generally modeled through self-propagating metabolism-like networks, or hypercycles. These replicating entities may be parasitic if they either receive replication support from the host system without conferring a reciprocal benefit, or shortcut the host system in some deleterious way. Vesicles can barricade replicating systems against selfish entities if they provide a mechanism of blocking the entry of external replicators [52]. Selfish replicators can also be eventually incorporated into the metabolism of the host system, balancing their deleterious effects with beneficial ones [53].
Although the dynamics of nuclear dimorphism in Oxytricha do not resemble a hypercycle, the scrambling of the micronucleus and its rearrangement to form the macronuclear genome illustrate the properties of stable systems that host selfish replicators. The unique genomic traits of Oxytricha seem to be both caused by and assisted by an invasion of DNA transposons (typically regarded as selfish genetic agents). The micronucleus hosts thousands of transposons, which probably contributed to the scrambling of its genome, either through actual transposition or via ectopic recombination between transposons of the same family. Unlike domesticated transposases in other eukaryotes, micronuclear transposons display evidence of purifying selection acting on their encoded proteins [23,54] and may still be active outside the control of the host cell. The presence of active transposons in the micronucleus may have provided the selective pressure for acquisition of a template-directed genome unscrambling system as part of macronuclear development [55] as a mechanism for promoting the long-term stability of the genome and robustness to perturbations.
Recent discoveries reveal that micronuclear transposons play a surprisingly direct role in both macronuclear development and genome rearrangement [23]. Micronucleus-limited transposase genes are expressed during macronuclear development, but silent during vegetative growth. The experimental silencing of these transposases by RNAi results in aberrant unscrambling patterns in the macronuclear genome, suggesting that transposons play an active role in genome rearrangement. It is possible that the nanochromosome templates are composed of RNA to protect the developing macronucleus from the integration of active transposons. In this regard, Oxytricha seems to have avoided the deleterious effects of internal transposon activity through template-directed genome rearrangement that, itself, employs the transposon proteins. Thus, the properties of nuclear dimorphism and template-directed macronuclear development in Oxytricha demonstrate the principles of spatial separation and metabolic incorporation that are thought to make early replicating systems resistant to selfish replicators.
Concluding remarks
Here, we have discussed the nuclear dimorphism and genome structures of Oxytricha to demonstrate several plausible dynamics of early genetic systems during the transition to modern genomes. Oxytricha is not by any means a ‘living fossil’, given that its phylum, Ciliophora, is both eukaryotic and not particularly deep branching. However, by analogy we have used Oxytricha to introduce several new hypotheses about early genomes. We invoke the process of template-directed genome rearrangement in Oxytricha to model an evolutionary landscape in which protein polymerases could evolve gradually from ligases. We have also observed that the dynamics of Oxytricha amitotic macronuclear division suggest that unmanaged cell division in early life could be viable if hereditary molecules were present in high copy-numbers. Finally, we employed observations of lateral gene transfer and active transposon mediation in Oxytricha to improve our understanding of the consequences of genome instability for early life. Although the particular genomic traits that we discuss are unique to Oxytricha and closely related genera, we encourage the further exploration of extant organisms, particularly those with atypical genetic systems [56], to help elucidate features of early cellular life.
Acknowledgments
We thank members of the Landweber laboratory for critical discussions of this work. This work was supported by a National Aeronautics and Space Administration Postdoctoral Program fellowship to A.D.G. and by National Institutes of Health grant GM59708 and National Science Foundation grant 0923810 to L.F.L.
References1 Gilbert, W. (1986) The RNA world. Nature 319, 6182 Gesteland, R. and Atkins, J.F., eds (1993) The RNA World, Cold
Spring Harbor Laboratory Press3 Landweber, L. et al. (1998) Ribozyme engineering and early evolution.
Bioscience 48, 94–1034 Fox, G. (2010) Origin and evolution of the ribosome. Cold Spring Harb.
Perspect. Biol. 2, a0034835 White, H. (1976) Coenzymes as fossils of an earlier metabolic state. J.
Mol. Evol. 7, 101–1046 Goldman, A. et al. (2012) Evolution of the protein repertoire. In
Encyclopedia of Molecular Cell Biology and Molecular Medicine(Meyers, R.A., ed.), Wiley-VCH
7 Freeland, S. et al. (1999) Do proteins predate DNA? Science 286, 690–6928 Goldman, A. et al. (2010) The evolution and functional repertoire of
translation proteins following the origin of life. Biol. Direct 5, 159 Torrents, E. et al. (2002) Ribonucleotide reductases: divergent
evolution of an ancient enzyme. J. Mol. Evol. 55, 138–15210 Forterre, P. (2002) The origin of DNA genomes and DNA replication
proteins. Curr. Opin. Microbiol. 5, 525–53211 Filee, J. et al. (2002) Evolution of DNA polymerase families: evidences
for multiple gene exchange between cellular and viral proteins. J. Mol.Evol. 54, 763–773
387
Review Trends in Genetics August 2012, Vol. 28, No. 8
12 Koonin, E. (2003) Comparative genomics, minimal gene-sets and the last universal common ancestor. Nat. Rev. Microbiol. 1, 127–136
13 Forterre, P. (2006) Three RNA cells for ribosomal lineages and three DNA viruses to replicate their genomes: a hypothesis for the origin of cellular domain. Proc. Natl. Acad. Sci. U.S.A. 103, 3669–3674
14 Woese, C. (1998) The universal ancestor. Proc. Natl. Acad. Sci. U.S.A. 95, 6854–6859
15 Zoller, S. et al. (2012) Characterization and taxonomic validity of the ciliate Oxytricha trifallax (Class Spirotrichea) based on multiple gene sequences: limitations in identifying genera solely by morphology. Protist DOI: 10.1016/j.protis.2011.12.006
16 Prescott, D. (1994) The DNA of ciliated protozoa. Microbiol. Rev. 58, 233–267
17 Prescott, D. (2000) Genome gymnastics: unique modes of DNA evolution and processing in ciliates. Nat. Rev. Genet. 1, 191–198
18 Nowacki, M. et al. (2011) RNA-mediated epigenetic programming of genome rearrangements. Annu. Rev. Genomics Hum. Genet. 12, 367–389
19 Nowacki, M. et al. (2008) RNA-mediated epigenetic programming of a genome-rearrangement pathway. Nature 451, 153–158
20 Storici, F. et al. (2007) RNA-templated DNA repair. Nature 447, 338–341
21 Nowacki, M. et al. (2010) RNA-mediated epigenetic regulation of DNA copy number. Proc. Natl. Acad. Sci. U.S.A. 107, 22140–22144
22 Nowacki, M. and Landweber, L.F. (2009) Epigenetic inheritance in ciliates. Curr. Opin. Microbiol. 12, 638–643
23 Nowacki, M. et al. (2009) A functional role for transposases in a large eukaryotic genome. Science 324, 935–938
24 Green, R. and Szostak, J.W. (1992) Selection of a ribozyme that functions as a superior template in a self-copying reaction. Science 258, 1910–1915
25 Harris, J. et al. (2003) The genetic core of the universal ancestor. Genome Res. 13, 407–412
26 Becerra, A. et al. (2007) The very early stages of biological evolution and the nature of the last common ancestor of the three major cell domains. Annu. Rev. Ecol. Evol. Syst. 38, 361–379
27 Poole, A. and Logan, D.T. (2005) Modern mRNA proofreading and repair: clues that the last universal common ancestor possessed an RNA genome? Mol. Biol. Evol. 22, 1444–1455
28 Doudna, J. et al. (1991) A multisubunit ribozyme that is a catalyst of and template for complementary strand RNA synthesis. Science 251, 1605–1608
29 Johnston, W. et al. (2001) RNA-catalyzed RNA polymerization: accurate and general RNA-templated primer extension. Science 292, 1319–1325
30 Wochner, A. et al. (2011) Ribozyme-catalyzed transcription of an active ribozyme. Science 332, 209–212
31 Bartel, D. and Szostak, J.W. (1993) Isolation of new ribozymes from a large pool of random sequences. Science 261, 1411–1418
32 Landweber, L. and Pokrovskaya, I.D. (1999) Emergence of a dual catalytic RNA with metal specific cleavage and ligase activities: the spandrels of RNA evolution. Proc. Natl. Acad. Sci. U.S.A. 96, 173–178
33 Huang, W. and Ferris, J.P. (2006) One-step, regioselective synthesis of up to 50-mers of RNA oligomers by montmorillonite catalysis. J. Am. Chem. Soc. 128, 8914–8919
34 Aldersley, M. et al. (2009) RNA synthesis by mineral catalysis. Orig. Life Evol. Biosph. 39, 200
35 Kotler, L. et al. (1993) DNA sequencing: modular primers assembled from a library of hexamers or pentamers. Proc. Natl. Acad. Sci. U.S.A. 90, 4241–4245
36 Sharp, D. et al. (2000) Microtubule motors in mitosis. Nature 407, 41–47
37 Maiato, H. et al. (2004) The dynamic kinetochore-microtubule interface. J. Cell Sci. 117, 5461–5477
38 Fogel, M. and Waldor, M.K. (2006) A dynamic, mitotic-like mechanism for bacterial chromosome segregation. Genes Dev. 20, 3269–3282
39 Ptacin, J. et al. (2010) A spindle-like apparatus guides bacterial chromosome segregation. Nat. Cell Biol. 12, 791–798
40 Draper, G. and Gober, J.W. (2002) Bacterial chromosome segregation. Annu. Rev. Microbiol. 56, 567–597
41 Lundgren, M. and Bernander, R. (2007) Genome-wide transcription map of an archaeal cell cycle. Proc. Natl. Acad. Sci. U.S.A. 104, 2939–2944
42 Cortez, D. et al. (2010) Evidence for a Xer/dif system for chromosome resolution in Archaea. PLoS Genet. 6, e1001166
43 Tucker, B. et al. (1980) Microtubules and control of macronuclear ‘amitosis’ in Paramecium. J. Cell Sci. 44, 135–151
44 Kushida, Y. et al. (2010) Amitosis requires gamma-tubulin-mediated microtubule assembly in Tetrahymena thermophila. Cytoskeleton 68, 89–96
45 Jung, S. et al. (2011) Exploiting Oxytricha trifallax nanochromosomes to screen for non-coding RNA genes. Nucleic Acids Res. 39, 7529–7547
46 Witt, P. (1977) Unequal distribution of DNA in the macronuclear division of the ciliate Euplotes eurystomus. Chromosoma 60, 59–67
47 Katz, L. (2001) Evolution of nuclear dualism in ciliates: a reanalysis in light of recent molecular data. Int. J. System. Evol. Microbiol. 51, 1587–1592
48 Orias, E. (1991) Evolution of amitosis of the ciliate macro-nucleus: gain of the capacity to divide. J. Protozool. 38, 217–221
49 Duerra, H. et al. (2004) Modeling senescence in hypotrichous ciliates. Protist 155, 45–52
50 Theobald, D. (2010) A formal test of the theory of universal common ancestry. Nature 465, 219–222
51 Smith, S. (2003) Nucleoprotein assemblies. Encycl. Nanosci. Nanotech. X, 1–10
52 Eigen, M. et al. (1981) The origin of genetic information. Sci. Am. 244, 88–92
53 Konnyu, B. et al. (2008) Prebiotic replicase evolution in a surface-bound metabolic system: parasites as a source of adaptive evolution. BMC Evol. Biol. 8, 267
54 Doak, T. et al. (1994) A proposed superfamily of transposase genes: transposon-like elements in ciliated protozoa and a common ‘D35E’ motif. Proc. Natl. Acad. Sci. U.S.A. 91, 942–946
55 Klobutcher, L. and Herrick, G. (1997) Developmental genome reorganization in ciliated protozoa: the transposon link. Prog. Nucleic Acid Res. Mol. Biol. 56, 1–62
56 Reyes-Prieto, F. et al. (2012) Coenzymes, viruses and the RNA world. Biochimie DOI: 10.1016/j.biochi.2012.01.004
57 Cech, T. (2000) The ribosome is a ribozyme. Science 289, 878–879
58 Hsiao, C. et al. (2009) Peeling the onion: ribosomes are ancient molecular fossils. Mol. Biol. Evol. 26, 2415–2425
59 Ciccarelli, F.D. et al. (2006) Toward automatic reconstruction of a highly resolved tree of life. Science 311, 1283–1287
60 Letunic, I. and Bork, P. (2011) Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy. Nucleic Acids Res. 39, W475–W478
61 Lundin, D. et al. (2009) RNRdb, a curated database of the universal enzyme family ribonucleotide reductase, reveals a high level of misannotation in sequences deposited to Genbank. BMC Genomics 10, 589
Replication timing and its emergence from stochastic processes
John Bechhoefer1 and Nicholas Rhind2
1 Department of Physics, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
2 Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA
Review
The temporal organization of DNA replication has puzzled cell biologists since before the mechanism of replication was understood. The realization that replication timing correlates with important features, such as transcription, chromatin structure and genome evolution, and is misregulated in cancer and aging has only deepened the fascination. Many ideas about replication timing have been proposed, but most have been short on mechanistic detail. However, recent work has begun to elucidate basic principles of replication timing. In particular, mathematical modeling of replication kinetics in several systems has shown that the reproducible replication timing patterns seen in population studies can be explained by stochastic origin firing at the single-cell level. This work suggests that replication timing need not be controlled by a hierarchical mechanism that imposes replication timing from a central regulator, but instead results from simple rules that affect individual origins.
Corresponding author: Bechhoefer, J. ([email protected]).
Keywords: DNA replication timing; stochastic models; replication initiation; ORC; MCM
0168-9525/$ – see front matter © 2012 Elsevier Ltd. All rights reserved.

Replication origins: correlated or independent?
The duplication of the genome of a cell by DNA replication is an essential step in the cell cycle. In bacteria, the overall situation is straightforward, in that DNA replication initiates at a single, well-defined location in the genome (e.g. the oriC site in Escherichia coli) and terminates at a second, well-defined region (ter in E. coli) [1]. Eukaryotic organisms, with 10–1000 times more DNA and with 10–100 times slower replication forks, depend on the firing of multiple origins of replication along the DNA. These origins are defined by a two-step process [2]. Licensing, the first step, occurs in G1 phase, when the origin recognition complex (ORC) binds to chromatin and, with the aid of Cdc6 and Cdt1, loads onto the DNA head-to-head pairs of the barrel-shaped heterohexameric MCM complex, the catalytic core of the replicative helicase [3,4]. Each pair of MCM complexes is a potential origin of DNA replication. Initiation (or origin firing), the second step, occurs in S phase, when a pair of MCMs is activated via a complex process involving numerous proteins, including recruitment of Sld2, Sld3, the GINS complex and Cdc45, as well as the phosphorylation of various components by the CDK and DDK replication kinases [5]. The regulation of the spatial binding of the ORC and the temporal activation of MCMs largely determines the kinetics of replication during S phase, which is referred to as the replication program.
How replication programs are regulated is an active, and sometimes controversial, area of research. Although the specific mechanisms that control timing are still obscure, recent work has revealed basic principles that appear to apply to eukaryotic replication in general. In particular, mathematical modeling of genome-wide replication timing data shows that replication timing can be explained by stochastic mechanisms. The significance of this conclusion is that it explains the regulation of replication timing in terms of simple rules that affect the individual probabilities of origin firing. In such models, replication timing is controlled by changing the firing rate of individual origins, instead of by directly regulating the time at which origins fire. Although this distinction may seem semantic, it is important because it recasts black-box mechanisms of global replication timing in terms of biochemically plausible effects on individual origins.
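The distinction between regulating a firing rate and regulating a firing time can be illustrated with a minimal sketch. It assumes, purely for illustration, that each origin fires at a constant rate, so that its firing time in any one cell is exponentially distributed. The rates (0.2 and 0.05 per minute), the cell count and the function names are hypothetical and are not taken from the studies cited above.

```python
import random


def sample_firing_times(rate_per_min, n_cells, seed=0):
    """Draw one firing time (in minutes) per cell for a constant-rate origin."""
    rng = random.Random(seed)
    return [rng.expovariate(rate_per_min) for _ in range(n_cells)]


def mean(values):
    return sum(values) / len(values)


# Two hypothetical origins that differ only in their firing rate.
early = sample_firing_times(rate_per_min=0.2, n_cells=20000, seed=1)
late = sample_firing_times(rate_per_min=0.05, n_cells=20000, seed=2)

# Fraction of simulated cells in which the low-rate origin fires first.
late_first_fraction = sum(l < e for e, l in zip(early, late)) / len(early)

print("mean firing time, high-rate origin:", mean(early))
print("mean firing time, low-rate origin:", mean(late))
print("fraction of cells where the low-rate origin fires first:", late_first_fraction)
```

In this sketch the high-rate origin fires earlier on average, yet in a sizeable minority of simulated cells the low-rate origin fires first. That is the sense in which a reproducible population-level timing pattern can coexist with stochastic firing in single cells.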
http://dx.doi.org/10.1016/j.tig.2012.03.011 Trends in Genetics, August 2012, Vol. 28, No. 8

[Figure 1 image not reproduced. Panel titles: Yeast, Metazoan. Axes: genome position; time; replication fraction (0 to 1); initiation rate (#/kb/time). Plotted functions: f(x), I(x), f(t), I(t), f(x,t), I(x,t).]

Figure 1. Replication fractions and initiation rates. (a,b) The relation between replication fractions f and initiation rates I, as illustrated for budding yeast. (a) Spatially resolved data, averaged over an asynchronous cell population. (b) Time course data, averaged over the genome. (c) Illustration of typical replication timing data for budding yeast (left) and a metazoan organism (right). Top-left image shows the replication fraction f(x,t), as it might be inferred from a microarray timing experiment with several time points of data from synchronized cell populations. Black represents low replication levels and white represents high replication levels. Averaging the replication fraction over the genome gives the curve f(t), depicted to the left of the f(x,t) image, which goes from 0 to 1. Averaging the replication fraction over time, as in an experiment on asynchronous cell populations, gives the curve above the f(x,t) image. The bottom-right group shows the inferred I(x,t) image, as well as the averaged curves I(t) and I(x). Note that, in budding yeast, replication origins are well localized, as indicated by the spikes in the function I(x). [When viewed or printed at low resolution, not all spikes in I(x,t) may be visible.] The right-hand groups illustrate similar concepts for a typical metazoan organism. The main difference is that origins are not well localized, so that the function I(x) has broad features, representing zones where initiations are more or less likely to occur.

Over the past decade, two views about replication timing mechanisms have been developed. In the first, origin firing is a stochastic event that is (largely) independent of the replication state of neighboring origins. In particular, it has been postulated that there is an initiation function I(x,t) that describes the rate of initiation, per time and per length of unreplicated DNA, of a site x along the genome at time t after the beginning of S phase [6,7] (Figure 1; Box 1). This type of origin firing can manifest in at least three different ways, depending on the experimental model considered. In species such as budding yeast, in which replication initiates at well-defined loci, the function I(x,t) forms a discrete spike at the replication origin [8] (Figure 1a). At the other end of the spectrum, amphibian embryos lack origin specificity, and DNA replication can initiate anywhere along the genome [6]. In an intermediate case, mammalian somatic cells can display clusters of origins or broad initiation zones that are not homogeneously distributed throughout the genome [9–11] (Figure 1b). Each of these three cases is discussed in detail below. We refer to the hypothesis of a locally determined initiation rate as the independent origin hypothesis because it is distinguished by the feature that origins fire independently from the firing of neighboring origins. The attraction of the independent origin hypothesis is its simplicity: one does not need to postulate biological mechanisms that would cause correlated initiations. The potential weakness of the independent origin hypothesis is that, if too simple, it may fail to describe experiments accurately or that implausible coincidences of parameters may be required to fit the data.
In a second scenario, the initiation of an origin, although still stochastic, is linked to the state of the genome in its vicinity. For example, observations of origin clustering [12–14] have led several authors to hypothesize that the presence of a replication fork can increase the firing rate of nearby origins, as in the ‘next-in-line’ model [15] and the ‘domino-cascade’ model [16,17]. We refer to this second scenario in general as the correlated origin hypothesis.
Previously, there was considerable debate as to whether replication was stochastic and whether origins were independent. At present, it is generally accepted that all models of replication are stochastic at the level of molecular interactions. It is important to note that stochastic models do not require that origins all fire with the same probability, nor is stochastic firing incompatible with late-firing origins [18]. However, there is evidence in some cases for correlation in origin initiation activity. As a result, the current picture is an intermediate one that mixes both stochastic elements and mechanisms for correlations in origin initiation [15,19]. Still, differences remain concerning what is essential and what is incidental in the above picture and what kind of underlying mechanisms are likely to be important in controlling the replication program. In this review, we argue that, for the simpler cases such as unicellular yeast and for the embryonic cells of some multicellular animals, recent experiments and modeling efforts have shown that much of the available replication data may be understood in terms of the simpler independent origin hypothesis and that correlations probably play a minor role in the replication program. Replication in the somatic cells of metazoan organisms is more complex, and we outline recent efforts in this area.
Replication in yeast
The past few years have marked a turning point in the understanding of replication in yeast. First came a series of high-resolution combing and microarray experiments (Box 2). For example, high-resolution timing data of synchronized populations of wild-type and clb5Δ Saccharomyces cerevisiae show clear average timing patterns [20]. These measurements, as mentioned in Box 2, amounted to measurements of f(x,t), with spatial information resolved to a few kilobases and temporal information resolved to 5 min. At around the same time, DNA combing studies in budding and fission yeast showed that initiations at the single-molecule scale are stochastic, with different sets of origins chosen in each cell cycle [21,22]. Indeed, in budding yeast, it is now clear that there are as many as 700 potential origin sites, of which only approximately 200 are used in any given cycle.
In parallel work, the rate of origin firing in budding and fission yeast was shown to be regulated by competition for limiting activators, such as the Cdc45 initiation factor and the DDK initiation kinase [23–26]. Competition for limiting activators provides an explanation for why origin firing is less efficient than might be possible. The stochastic interaction between origins and diffusible activators also provides a mechanism for stochastic firing of origins.
The stochasticity of individual origins turns out to be an important effect. In contrast to earlier models, in which the firing of specific origins was envisaged to be limited to narrow windows of S phase, it is now clear that the width of the firing-time distribution for an individual origin can be a substantial fraction of S phase. Indeed, models that fail to incorporate the width of the timing distribution fail to reproduce many of the experimental details adequately [27]. By contrast, stochastic models that take into account the width of the firing-time distribution can successfully fit the microarray data [8,28,29]. Several notable insights and results come from these analyses: first, it is possible to generate models with independent initiation scenarios [initiation rate I(x,t) and constant fork velocity v] that lead to good fits of the data. This result shows that the independent origin hypothesis suffices to explain microarray data on replication timing in yeast. Second, the intrinsic parameters characterizing each origin have values that are independent of their neighbors, again suggesting that the initiation of each origin is an independent stochastic event [8]. Studies in fission yeast have also led to the conclusion that local initiation models suffice to explain the available experimental data [30,31]. However, several biologically different scenarios can lead to similar overall timing patterns [32], and more complicated mechanisms, such as trans-acting regulators of origin activity and chromosome structure, can affect origin timing [33,34]. Clearly, further iterations of modeling and experiment will be needed to come to a final picture.

Box 1. f and I: mathematical functions that describe replication kinetics
DNA replication kinetics can be described using two related but distinct mathematical functions: the replication fraction f and the initiation rate I. The first, f, is a complete description of replication kinetics and can be directly determined from experimental data (Box 2). The second, I, only describes the kinetics of origin initiation and cannot be directly measured; it must be inferred from f. However, if fork rates are assumed to be nearly constant, as is frequently done in models of replication kinetics, then I is sufficient to completely determine f. Both f and I can be defined for every spatial point (x) in the genome and every time point (t) in S phase, to give f(x,t) and I(x,t) (Figure 1c, main text).
It is often useful to consider the time-averaged functions, f(x) and I(x) (Figure 1a, main text). f(x) can be thought of as the average replication time of each point in the genome and is generally measured on asynchronous populations of cells. It is closely related to the median replication time trep at a site that is inferred from time course data on synchronized cell populations. The peaks in f(x) represent the origins, and taller peaks indicate origins that fire, on average, earlier in S phase. I(x) represents the average initiation rate of each point in the genome. In yeast, where origins are well defined, I(x) = 0 for most of the genome and forms spikes over the origins, with taller spikes reflecting a higher average probability of origin firing (Figure 1a,c). In metazoans, origins appear to be more diffuse, and thus so is I(x) (Figure 1c). It is important to realize that the height of the peaks in I(x) (e.g. the average firing probability of an origin) cannot be directly inferred from the height of the peaks in f(x), because f(x) convolves both passive replication and active firing of each origin; I(x) can only be extracted by mathematical modeling of f(x).
It can also be useful to consider the spatially averaged functions, f(t) and I(t) (Figure 1b, main text). The replication fraction f(t) is generally sigmoidal, as cells go from unreplicated in G1 to replicated in G2. The exact shape of the sigmoid depends on the details of the replication program, such as the distribution of origins and the shape of I(t). As discussed in the main text, I(t) has been proposed to generally increase for most of S phase and then decline in late S phase.
Replication in embryos
Embryonic cells in metazoans represent an interesting intermediate case of complexity. On the one hand, they have the full amount of DNA of somatic cells. On the other hand, they undergo a rapid, simplified cell cycle that is largely transcriptionally silent, which removes one major source of complication in the replication of somatic cells. In vitro studies of Xenopus cell-free extracts have been especially detailed and fruitful [13,35–37] and have led to associated modeling efforts [6,7,19,38,39]. The replication program in Xenopus embryos is relatively simple and much faster than in somatic cells. In particular, there are no fixed origin sites, presumably because the lack of transcription and more uniform chromatin structure allow the ORC to load MCM anywhere in the genome [40,41]. Although variation in initiation rates and, hence, replication timing does occur at the megabase scale [37], modeling efforts to date have focused on understanding the temporal variation of the initiation rate, I(t), averaged over the genome. The main conclusion is that the initiation rate increases over most of S phase, before decreasing to zero near the end of it. This variation of initiation rates over S phase is significant because it leads to a relatively narrow distribution of lengths for S phase, which, because of the stochasticity of origin placement and initiation time, varies with each cell cycle [42]. In embryos, it is particularly important that there be little variation in genome duplication time, as the cell cycle lacks checkpoints that can delay the start of mitosis if replication is not complete. In Xenopus embryos, for example, the typical S phase duration is 20 min and that of mitosis is 5 min, all within a 25-min cell cycle [43]. Thus, variations of S phase of more than 5 min can be lethal for a cell. Such variations are proposed to be suppressed by the increasing nature of initiation rate I(t). It has even been postulated that the increasing form of I(t) is a universal characteristic of eukaryotic replication [44]. Preliminary assessment of replication data from S. cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster and Homo sapiens supports this scenario, although better data and more extensive analysis are required. The initial increase of initiation rate I(t) has been attributed to competition for a limiting factor required for replication fork function [35,38] or origin firing (e.g. the DDK replication kinase [23]), whereas the decrease of I(t) at the end of S phase has been variously attributed to a fork-dependent control mechanism [38] or to increasing diffusion search times for the limiting factor to find its target [39].
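The effect of the shape of I(t) on the time course of replication can be made explicit in a simplified setting. The following is a sketch under stated assumptions, not the fitted model of the cited studies: initiation is spatially uniform, forks move at a constant speed v, and the genome is long enough that end effects can be ignored. Under these assumptions the standard nucleation-and-growth (Kolmogorov-Johnson-Mehl-Avrami) argument gives the genome-averaged replication fraction. The constants I_0 and \alpha are illustrative symbols that do not appear in the original text.

```latex
\begin{align}
f(t) &= 1 - \exp\!\left[-2v\int_0^t I(t')\,(t-t')\,\mathrm{d}t'\right], \\
I(t) = I_0 \;&\Rightarrow\; f(t) = 1 - \exp\!\left(-I_0\, v\, t^2\right), \\
I(t) = \alpha t \;&\Rightarrow\; f(t) = 1 - \exp\!\left(-\tfrac{1}{3}\,\alpha\, v\, t^3\right).
\end{align}
```

With a rate that increases in time, the unreplicated fraction falls off with a cubic rather than a quadratic exponent, so f(t) approaches 1 more abruptly. This is consistent with the narrower distribution of S phase lengths described above.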
Replication in metazoan somatic cells
The replication of DNA in metazoan germline and somatic cells is more complicated than in embryonic cells. Replication in somatic cells can take up to 100 times longer than in embryonic cells [45], and this increase in replication time is not spread equally across the genome. Instead, different regions of the genome replicate at characteristic times during the elongated S phase, and the replication timing of a locus correlates with several other important chromosomal characteristics. The best-established correlation is between late replication and constitutive heterochromatin, the repetitive, transcriptionally inactive regions of the genome that remain condensed throughout the cell cycle [46]. Conversely, gene-rich, transcriptionally active regions of the genome tend to replicate earlier in S phase [47]. The correlation between the transcriptional activity of individual genes and their replication timing is not strong [48]. However, when averaged over large groups of neighboring genes, transcriptional activity correlates well with replication timing [49,50]. An even more remarkable correlation is seen between chromosome interaction maps and replication timing [51,52]. The contiguous regions of the genome that replicate with similar timing are referred to as replication domains. The correlations between the average transcriptional activity, chromatin interactions and the replication timing of replication domains have led to qualitative models in which the chromosome accessibility of a domain affects its replication timing [53].

Box 2. Experimental techniques for analyzing DNA replication timing
The recent gains in our understanding of replication timing are built on experimental advances that have greatly increased the quality and quantity of data available. Defined patterns of DNA replication were first observed in fiber autoradiography studies of tritiated thymidine incorporation in bacterial and mammalian cells [12,79]. By in vivo pulse-labeling cells with tritiated thymidine and then stretching the labeled DNA on a photosensitive film, it was possible to map replication patterns (which regions have replicated and which have not) at a given time. A significant technical improvement was the substitution of fluorescently labeled thymine analogs, such as BrdU, that could be observed using an optical microscope [80,81]. Molecular combing, which stretches DNA more controllably, improved the latter technique by allowing one to more reliably associate positions on an image of a stretched fiber with genomic positions and by simplifying the identification of individual fibers taken anonymously from the genome [82,83] or with the genome location identified [54,84]. In parallel with fiber-based techniques, live-cell imaging has also yielded much valuable information. Although the size of origins and even their separations are well below the resolution of conventional light microscopy, clever techniques can yield spatial and temporal information. For example, specific sites can be labeled with fusion proteins whose intensity doubles after replication, an event that can readily be observed [85]. In the future, ‘live’ single-molecule studies based on flow and optical or magnetic tweezers [86], nano-engineered capillaries [87,88] and other molecular-scale structures may lead to even greater insights, especially into local mechanisms at the fork and initiation sites.
A second set of techniques provides information about the fraction of cells in a population that has replicated at a particular location x and time t. This fraction of replicated cells can be described by the function f(x,t), if replication kinetics throughout S phase are measured, or simply as f(x), if measurements are performed on asynchronous cell populations (Figure 1, main text). Such measurements originally used microarrays [89,90], with one approach based on local changes in copy number during replication. In a population of unreplicated cells, a baseline intensity is measured at each locus [f(x) = 0]. After all cells have replicated, the measured intensity at each locus should be double [f(x) = 1]. During replication, intermediate levels of replication are detected as intermediate intensity levels [0 < f(x) < 1]. For example, if half of the cells in the population have replicated at a location x, then f(x) = 0.5. More recently, direct sequencing to determine local DNA copy number has given similar information with fewer artifacts [91,92]. Initial studies used multiple time points in cultures of synchronized cells to directly measure f(x,t) [89,93], and this approach is still the state of the art in yeast [20,64]. However, comparable results can be derived by sorting asynchronous cells of any type into G1 and S populations [90].
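The copy-number logic described in Box 2 (baseline signal at f = 0, doubled signal at f = 1) amounts to a simple rescaling, sketched below. The sketch assumes an idealized measurement in which the signal at a locus is strictly proportional to its average copy number. The function and parameter names are hypothetical, and real microarray or sequencing pipelines require normalization steps that are not shown.

```python
def replication_fraction_from_intensity(intensity, unreplicated_baseline):
    """Estimate f at one locus from a copy-number-proportional signal.

    The baseline is the signal measured in unreplicated cells (f = 0);
    a fully replicated population gives twice the baseline (f = 1).
    """
    if unreplicated_baseline <= 0:
        raise ValueError("baseline intensity must be positive")
    fraction = intensity / unreplicated_baseline - 1.0
    # Clip to [0, 1] so that measurement noise cannot produce impossible values.
    return min(1.0, max(0.0, fraction))


# Hypothetical signals at three loci, all with a baseline of 100 units.
for signal in (100.0, 150.0, 200.0):
    print(signal, replication_fraction_from_intensity(signal, 100.0))
```

A signal halfway between the baseline and its double corresponds to half of the cells having replicated that locus, matching the f(x) = 0.5 example in Box 2.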
Although replication domains replicate with reproducible timing, origin firing within domains is heterogeneous because of stochastic origin firing [10,54,55]. As in yeast, origin firing in metazoans appears to be regulated by limiting activators. Mammalian Cdc45 is substoichiometric, relative to ORC and MCM, and increasing Cdc45 levels increases the rate of origin firing [56]. Moreover, modulating the levels of the CDK replication kinase affects the efficiency of origin firing [57–59]. An additional reason for the heterogeneity of origin firing in metazoans is that metazoan origins are not well-defined loci; at least in some cases, MCM seems to be loaded heterogeneously throughout a region [60–62], which can be thought of as a cluster of many inefficient origins or as a diffuse initiation zone.
Mechanisms for timing
Although replication timing appears to be uniform and well coordinated at the population level, this average behavior hides heterogeneous replication kinetics in individual cells. This apparent conflict between heterogeneity at the single-cell level and organization at the population level is resolved by observing that the average of the heterogeneous single-cell data recapitulates the results from ensemble studies [22]. This observation has led to models in which the average replication time of a locus is a function of the firing probability of individual origins, regardless of whether those probabilities are independent or coordinated (Box 3). Such models predict a correlation between the probability and timing of origin firing, a correlation seen in budding yeast [8,28]. Furthermore, recent budding yeast studies have shown that, in most cases in which the length of S phase is significantly increased, the relative timing program is maintained [63,64]; that is, the overall ordering of replication timing of different regions is preserved, even as the scale of timing is altered. Such a result would be expected if S phase length changes because the initiation rates have been altered globally (Naama Barkai, personal communication). As discussed above, initiation rates are thought to be regulated by competition among origins for limiting activation factors. One recently proposed model makes the case both theoretically and experimentally that the limiting factor is a protein associated with active replication forks [65]. The Cdc45 protein, which is required to activate the MCM helicase complex, is one such candidate [56]. Alternatively, factors such as DDK, which phosphorylates and activates MCM, have been seen to be rate limiting in fission yeast [23].
The competition for limiting activators explains why origins fire stochastically but not why some origins fire with higher probability than others. One obvious explanation for differing probabilities of origin firing is the effect of chromatin structure on the accessibility of origins to initiation factors [53]. In the context of competition between origins for limiting activators, it is natural to imagine that chromatin structure affects that competition, allowing euchromatic origins greater access to activators and so higher firing probabilities. This possibility fits well with the strong correlation observed between heterochromatin and late replication [46]. Another possibility that we have recently proposed is based on the observation that multiple MCMs are loaded at each origin [8,60]. In this model, each MCM loaded has a low probability of firing; however, because multiple MCMs are loaded at each origin, origins that have more MCMs loaded will have a higher aggregate firing probability. Thus, the probability of origin firing is set in part by the number of MCMs loaded at a given origin site. The probability of origin firing can then be subsequently altered by chromatin context. For example, a recent study has shown that Rif1, which affects telomere chromatin structure, also binds to chromosome arms and alters origin initiation rates at these sites, perhaps by altering the loading of the Cdc45 that is required for MCM helicase activation [33].
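The multiple-MCM idea can be illustrated with a one-line probability calculation. This is a minimal sketch, assuming that each loaded MCM fires independently and with the same probability within some fixed time window; neither assumption is stated in the text, and the function name and numerical values are hypothetical.

```python
def origin_firing_probability(n_mcm, p_per_mcm):
    """Probability that at least one of n_mcm loaded MCMs fires.

    Assumes independent MCMs with an identical per-MCM firing probability
    over the time window of interest (an illustrative simplification).
    """
    if n_mcm < 0 or not 0.0 <= p_per_mcm <= 1.0:
        raise ValueError("need n_mcm >= 0 and 0 <= p_per_mcm <= 1")
    return 1.0 - (1.0 - p_per_mcm) ** n_mcm


# The same low per-MCM probability gives a higher aggregate probability
# at origins that carry more MCMs.
for n_mcm in (1, 5, 20):
    print(n_mcm, origin_firing_probability(n_mcm, 0.1))
```

Under these assumptions the aggregate probability rises monotonically with the number of loaded MCMs, which is the qualitative behavior the model relies on. Chromatin context could then be represented as a change in the per-MCM probability.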
Box 3. Theoretical techniques for analyzing DNA replication
timing
Although determining the firing time of an origin would seem
straightforward, particularly for the relatively simple yeast genome,
the heterogeneous nature of origin firing and the passive replication
of origins by forks from neighboring origins mean that the
distribution of origin firing times cannot be directly inferred from
its average replication time [94]. Therefore, rigorous analysis of
replication timing patterns has relied on more sophisticated
analytical tools. One of the most straightforward and widespread
methods is computer simulation [6,27,28,30,38]. An advantage of
simulation is that, with modest computer resources (especially if
simulations keep track of only positions of forks and origins rather
than use a lattice for each point on the genome [95]), one can
recreate in silico not only the ideal experimental scenario envisaged,
but also any relevant experimental details. For example, it is
straightforward to include the effects of asynchrony in the cell
population, finite microscope resolution, labeling artifacts, and the
like [96]. Once the artifacts and the replication scenario are chosen
correctly, the simulation can reproduce, within statistical error, the
data from any given scenario.
The main disadvantage of simulations is that to analyze experimental
data, one must first determine both the appropriate type of
replication scenario to simulate and ways to incorporate experimental
details and then determine the appropriate parameters to
use. In situations in which origin firing is not uniformly distributed,
each origin will be characterized by several parameters, and so the
simulation may depend on hundreds or even thousands of
parameters, depending on the type of organism. Curve-fitting
techniques, which amount to a search in the space of parameters,
require simulating a large number of scenarios. Analytical models,
which can be used to directly calculate replication profiles instead of
needing to simulate replication step by step, are one way to get
around such obstacles. Analytical models may be evaluated faster
than simulations. The difficulties are that one must be able to
determine an appropriate model and be able to solve it. Thus,
beginning with [6], a variety of analytical models have been
proposed [8,39,42,94,97]. Because models based on independent
origins are simpler than ones that allow correlated initiations, most
of the above work has assumed such a scenario. Nonetheless, some
analysis of correlated initiations has been done, as discussed in the
main text.
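As a rough illustration of the simulation approach described in this box, the following sketch tracks only origins and their candidate firing times rather than a lattice of genome positions. It is a minimal toy model under stated assumptions, not the code used in any of the cited studies: each origin is assumed to fire independently at a constant rate, forks are assumed to move at a constant speed, and all names, units and parameter values are hypothetical.

```python
import random


def draw_firing_times(origins, rng):
    """Draw one candidate firing time per origin for a single simulated cell.

    origins: list of (position_kb, firing_rate_per_min) pairs (hypothetical units).
    Each origin gets an exponentially distributed waiting time (constant rate).
    """
    return [(pos, rng.expovariate(rate)) for pos, rate in origins]


def replication_time(x, firing, fork_speed):
    """Time at which position x is replicated, given candidate firing times.

    A fork from an origin at `pos` that fires at time t reaches x at
    t + |x - pos| / fork_speed. Taking the minimum over all origins also
    handles passive replication: an origin reached by a fork before its own
    candidate time can never supply the minimum.
    """
    return min(t + abs(x - pos) / fork_speed for pos, t in firing)


def mean_replication_profile(origins, positions, fork_speed, n_cells=2000, seed=0):
    """Average replication time at each position over n_cells simulated cells."""
    rng = random.Random(seed)
    totals = [0.0] * len(positions)
    for _ in range(n_cells):
        firing = draw_firing_times(origins, rng)
        for i, x in enumerate(positions):
            totals[i] += replication_time(x, firing, fork_speed)
    return [total / n_cells for total in totals]


if __name__ == "__main__":
    # One efficient and one inefficient origin, 200 kb apart (made-up values).
    origins = [(100.0, 0.2), (300.0, 0.02)]
    positions = [100.0, 200.0, 300.0]
    profile = mean_replication_profile(origins, positions, fork_speed=2.0)
    for x, t in zip(positions, profile):
        print(f"{x:6.0f} kb: mean replication time {t:5.1f} min")
```

Even in this toy setting the population-averaged profile is reproducible although every simulated cell fires its origins at different times, and the inefficient origin is often replicated passively by a fork from its neighbor.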
Review Trends in Genetics August 2012, Vol. 28, No. 8
A scenario comprising stochastically firing origins with different firing probabilities naturally leads to a reproducible replication-timing program [66]. Origins with high firing probabilities will be more likely to fire in early S phase and so will have early average replication times. In general, low-probability origins would be unlikely to fire efficiently even in late S phase. However, if the firing rate, I(t), increases during S phase, as described above, even low-probability origins, if not passively replicated, will have a high probability of firing late in S phase, leading to efficient replication of late-replicating regions [18]. Here, we distinguish between I(t), which describes the timing program, and the underlying biological mechanisms, which would explain why I(t) has its observed form. This description of origin timing applies not only to the individual origins of simpler genomes, such as budding yeast, but also to the complicated replication domains of metazoan genomes. In the latter case, euchromatic replication domains of high-probability origins reproducibly replicate earlier than do domains of lower-probability origins, but heterochromatic domains, which harbor the lowest-probability origins, nonetheless replicate efficiently in late S phase. Thus, the order in which various domains of metazoan genomes replicate may be a secondary consequence of the effect of their chromatin structure on the firing probabilities of their origins. This possibility is consistent with the strong correlation between chromatin interactions and replication timing [52].
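The effect of an increasing firing rate on low-probability origins can be illustrated numerically. The sketch below assumes, purely for illustration, that an origin's firing rate grows linearly in time, rate(t) = base_rate × (1 + growth × t), and it ignores passive replication; the text states only that I(t) increases during S phase, so the linear form, the function name and all numbers are hypothetical.

```python
import math


def prob_fired_by(t_end: float, base_rate: float, growth: float = 0.0) -> float:
    """Probability that an origin has fired by time t_end.

    Assumed (illustrative) rate: rate(t) = base_rate * (1 + growth * t).
    The probability of having fired is 1 - exp(-integral of rate from 0 to t_end);
    passive replication by neighboring forks is ignored.
    """
    cumulative = base_rate * (t_end + 0.5 * growth * t_end ** 2)
    return 1.0 - math.exp(-cumulative)


if __name__ == "__main__":
    s_phase_min = 40.0   # hypothetical S phase length
    weak_origin = 0.01   # hypothetical base firing rate per minute
    print("constant rate:  ", prob_fired_by(s_phase_min, weak_origin))
    print("increasing rate:", prob_fired_by(s_phase_min, weak_origin, growth=0.25))
```

With a constant rate the weak origin usually fails to fire within S phase, whereas with a rising rate the same origin fires with high probability by the end, which is the qualitative behavior described above.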
Correlated origin initiations
Although much of observed replication timing can be explained in terms of a picture of independent initiations, there is also evidence for correlations in initiation. For example, DNA fiber studies observe clusters of nearby origins that initiated at approximately the same time [12,13]. One plausible mechanism for origin clustering is that the polymerases and other proteins responsible for replication are localized within the nucleus in small foci known as replication factories [67,68]. As a consequence, if the DNA is tethered to a location in the cell nucleus while replicating, it may loop around and find another set of replication machinery in the same factory. Such looping could increase the likelihood of origin firing of origins located approximately 10 kb from an active fork and decrease origin firing for closer origins [19].
Another line of argument suggesting the possibility of correlated initiation lies in an observation of small biases in the DNA base sequence near certain regions. It has been shown that if a region of the genome is repeatedly replicated by a polymerase on the leading strand, mutations will eventually lead to strand compositional asymmetries (an excess of G over C and T over A) [69]. Indeed, a large proportion of known origins for H. sapiens have been found by looking for signatures of compositional skew [70]. Early replicating regions are then marked by an abrupt jump in the local skew. Because adjacent early replicating regions are separated by approximately 1 Mb and because the average distance between origins is approximately 100 kb, there must be multiple initiations between each early region. To explain the observation that the compositional skew varies linearly between compositional discontinuities associated with origins, it was postulated that a wave of correlated initiations occurs, which leads to a ‘domino’ [16,17,71] or ‘next-in-line’ model [15]. It is not clear whether a looping mechanism [19] can explain such effects, whether some more complicated form of coupling between initiation and fork progression is required, or whether the difference in chromatin structure between early- and late-replicating regions can account for these observations. The last possibility would avoid the need to invoke coordinated origin firing. In support of this idea, a recent single-molecule replication kinetics analysis of the mouse Igh locus is consistent with a stochastic model that lacks any origin coordination [11] (Paolo Norio, personal communication).
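The strand compositional asymmetry mentioned above (an excess of G over C and of T over A) can be quantified with a simple skew statistic. The sketch below uses one commonly used definition, S = (T − A)/(T + A) + (G − C)/(G + C), computed in non-overlapping windows; this specific formula, the window scheme and the function names are assumptions made for illustration and may differ in detail from the analyses in [69,70].

```python
def skew(seq: str) -> float:
    """Total compositional skew of a sequence: (T-A)/(T+A) + (G-C)/(G+C).

    Characters other than A, C, G and T (for example N) are ignored.
    A term is set to 0 when its denominator would be zero.
    """
    s = seq.upper()
    a, c, g, t = (s.count(base) for base in "ACGT")
    ta = (t - a) / (t + a) if (t + a) else 0.0
    gc = (g - c) / (g + c) if (g + c) else 0.0
    return ta + gc


def windowed_skew(seq: str, window: int) -> list:
    """Skew in consecutive non-overlapping windows of the given length."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [skew(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]


if __name__ == "__main__":
    # Toy sequence: a G/T-rich stretch followed by a C/A-rich stretch.
    print(windowed_skew("GGTTGTGT" + "CCAACACA", 8))
```

An abrupt change in sign of the windowed skew is the kind of jump used to flag candidate origins. On the numbers quoted above, early-replicating regions about 1 Mb apart with origins about every 100 kb imply on the order of 1000/100 = 10 origin spacings between adjacent early regions.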
In addition to temporal ordering of origin initiation, some models include spatial correlations in the positioning of origins. Recently, it was proposed that the clustering of initiated origins observed in Xenopus embryos and, to a lesser extent, in yeast may speed up the overall completion of S phase [72]. A shorter S phase is particularly helpful in Xenopus embryos, as it prevents the mitotic catastrophe discussed above. Clustering several inefficient origins together can lead to a group that is collectively efficient in that one or the other of the origins is likely to fire early. Although the periodic distribution of such groups of origins would be an efficient way to replicate the genome, mechanisms that could achieve this global order are not clear at present.
Concluding remarks
The hypothesis that replication is largely controlled by the local rate of initiation has received wide support from recent experiments and analyses. Models based on local initiation rates I(x,t) have successfully described the replication process in budding and fission yeast, in Xenopus embryos and in the Igh locus of mouse pro-B cells [6,8,11,28,30,38] (Paolo Norio, personal communication). A limiting factor in this work is that each of the above analyses involved a long-term collaboration between experimental biologists and modeling laboratories (the latter from a variety of fields, including physics, engineering and computer science). To broaden the use of quantitative analyses of replication and to analyze the growing number of data sets, it is important that the software and analysis procedures be usable by non-specialists. The recent derivation of ‘inversion’ formulas (A. Baker, PhD thesis, ENS de Lyon, France, 2011) that give I(x,t) directly from data on the local average replication fraction f(x,t), obtainable from microarray or deep sequencing studies on synchronized cell populations, is a first step in that direction.
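To give a sense of what such an inversion looks like, consider the simplest special case. The relation below is a sketch under assumptions that are not taken from the cited thesis: the initiation rate is taken to be spatially uniform, I(x,t) = I(t), defined per unit length of unreplicated DNA per unit time; forks move at a constant speed v; and initiations are independent. It is therefore not the position-dependent formula referred to above, only an illustration of the idea.

```latex
% Assumptions (illustrative, not from the source): I depends on t only,
% forks move at constant speed v, initiations are independent.
% A point is still unreplicated at time t only if no initiation occurred
% in the triangular space-time region from which a fork could have reached it.
1 - f(t) = \exp\!\left[-2v \int_0^t I(t')\,(t - t')\,\mathrm{d}t'\right]
\quad\Longrightarrow\quad
I(t) = \frac{1}{2v}\,\frac{\mathrm{d}^2}{\mathrm{d}t^2}
       \Bigl[-\ln\bigl(1 - f(t)\bigr)\Bigr]
```

Differentiating the exponent twice with respect to t gives 2vI(t), which is why the initiation rate can be read off from the measured replication fraction in this idealized case.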
A second research direction is a more precise understanding of the relation between the replication program, as described above, and the effects of DNA damage, with its concomitant activation of DNA repair mechanisms. For example, one consequence of damage that stalls replication forks is the activation of additional origins, which now have more time to initiate [73,74], an effect that is straightforward to simulate [75] and model analytically [76]. The modeling of fork stalls predicts that there is a critical density of stalled forks (approximately one per replicon), above which there is a global delay in S phase and below which the effects are minor and localized. Interestingly, this threshold density matches the observed stall densities in fragile zones and in cells with activated oncogenes [76]. However, DNA damage can also induce checkpoints that inhibit subsequent origin firing [77], complicating the overall effect of DNA damage on replication timing. A related topic is the interrelation between mutation rates and events in S phase. Although formal models to handle such situations are beginning to be developed [69], more work is needed to understand observations, such as the link between mutation rate and S phase timing [78].
Although the independent origin hypothesis is attractive in its simplicity and so far remarkably successful in its application, there is evidence for correlated initiations in somatic metazoan cells. Some of the correlation is explainable as straightforward consequences of the physical constraints of clustering polymerases. In such a view, the primary method of controlling timing in S phase remains the local modulation of overall initiation rates, and the correlations in the initiation of neighboring origins are produced by the geometrical effects of loops induced by replication factories. Whether such mechanisms suffice or whether a more complicated control mechanism is at play is at present unclear. Time will tell.
Acknowledgments
JB has been supported by grants from NSERC (Canada) and the Human Frontiers Science Program. NR has been supported by NIH grant GM098815 and an American Cancer Society Research Scholar Grant.
References
1 Baker, T.A. and Wickner, S.H. (1992) Genetics and enzymology of DNA replication in Escherichia coli. Annu. Rev. Genet. 26, 447–477
2 Masai, H. et al. (2010) Eukaryotic chromosome DNA replication: where, when, and how? Annu. Rev. Biochem. 79, 89–130
3 Remus, D. et al. (2009) Concerted loading of Mcm2-7 double hexamers around DNA during DNA replication origin licensing. Cell 139, 719–730
4 Evrin, C. et al. (2009) A double-hexameric MCM2-7 complex is loaded onto origin DNA during licensing of eukaryotic DNA replication. Proc. Natl. Acad. Sci. U.S.A. 106, 20240–20245
5 Labib, K. (2010) How do Cdc7 and cyclin-dependent kinases trigger the initiation of chromosome replication in eukaryotic cells? Genes Dev. 24, 1208–1219
6 Herrick, J. et al. (2002) Kinetic model of DNA replication in eukaryotic organisms. J. Mol. Biol. 320, 741–750
7 Jun, S. and Bechhoefer, J. (2005) Nucleation and growth in one dimension. II. Application to DNA replication kinetics. Phys. Rev. E 71, 011909
8 Yang, S.C. et al. (2010) Modeling genome-wide replication kinetics reveals a mechanism for regulation of replication timing. Mol. Syst. Biol. 6, 404
9 Hamlin, J.L. et al. (2008) A revisionist replicon model for higher eukaryotic genomes. J. Cell. Biochem. 105, 321–329
10 Norio, P. et al. (2005) Progressive activation of DNA replication initiation in large domains of the immunoglobulin heavy chain locus during B cell development. Mol. Cell 20, 575–587
11 Gauthier, M.G. et al. (2012) Modeling inhomogeneous DNA replication kinetics. PLoS ONE 7, e32053
12 Huberman, J.A. and Riggs, A.D. (1968) On the mechanism of DNA replication in mammalian chromosomes. J. Mol. Biol. 32, 327–341
13 Blow, J.J. et al. (2001) Replication origins in Xenopus egg extract are 5–15 kilobases apart and are activated in clusters that fire at different times. J. Cell Biol. 152, 15–25
14 Pasero, P. et al. (2002) Single-molecule analysis reveals clustering and epigenetic regulation of replication origins at the yeast rDNA locus. Genes Dev. 16, 2479–2484
15 Shaw, A. et al. (2010) S-phase progression in mammalian cells: modelling the influence of nuclear organization. Chromosome Res. 18, 163–178
16 Audit, B. et al. (2009) Open chromatin encoded in DNA sequence is the signature of ‘master’ replication origins in human cells. Nucleic Acids Res. 37, 6064–6075
17 Guilbaud, G. et al. (2011) Evidence for sequential and increasing activation of replication origins along replication timing gradients in the human genome. PLoS Comput. Biol. 7, e1002322
18 Rhind, N. et al. (2010) Reconciling stochastic origin firing with defined replication timing. Chromosome Res. 18, 35–43
19 Jun, S. et al. (2004) Persistence length of chromatin determines origin spacing in Xenopus early-embryo DNA replication: quantitative comparisons between theory and experiment. Cell Cycle 3, 223–229
20 McCune, H.J. et al. (2008) The temporal program of chromosome replication: genomewide replication in clb5Δ Saccharomyces cerevisiae. Genetics 180, 1833–1847
21 Patel, P.K. et al. (2006) DNA replication origins fire stochastically in fission yeast. Mol. Biol. Cell 17, 308–316
22 Czajkowsky, D.M. et al. (2008) DNA combing reveals intrinsic temporal disorder in the replication of yeast chromosome VI. J. Mol. Biol. 375, 12–19
23 Patel, P.K. et al. (2008) The Hsk1(Cdc7) replication kinase regulates origin efficiency. Mol. Biol. Cell 19, 5550–5558
24 Mantiero, D. et al. (2011) Limiting replication initiation factors execute the temporal programme of origin firing in budding yeast. EMBO J. 30, 4805–4814
25 Wu, P.Y. and Nurse, P. (2009) Establishing the program of origin firing during S phase in fission yeast. Cell 136, 852–864
26 Tanaka, S. et al. (2011) Origin association of sld3, sld7, and cdc45 proteins is a key step for determination of origin-firing timing. Curr. Biol. 21, 2055–2063
27 Spiesser, T.W. et al. (2009) A model for the spatiotemporal organization of DNA replication in Saccharomyces cerevisiae. Mol. Genet. Genomics 282, 25–35
28 de Moura, A.P. et al. (2010) Mathematical modelling of whole chromosome replication. Nucleic Acids Res. 38, 5623–5633
29 Luo, H. et al. (2010) Genome-wide estimation of firing efficiencies of origins of DNA replication from time-course copy number variation data. BMC Bioinform. 11, 247
30 Lygeros, J. et al. (2008) Stochastic hybrid modeling of DNA replication across a complete genome. Proc. Natl. Acad. Sci. U.S.A. 105, 12295–12300
31 Koutroumpas, K. and Lygeros, J. (2011) Modeling and analysis of DNA replication. Automatica 47, 1156–1164
32 Raghuraman, M.K. and Brewer, B.J. (2010) Molecular analysis of the replication program in unicellular model organisms. Chromosome Res. 18, 19–34
33 Hayano, M. et al. (2011) Mrc1 marks early-firing origins and coordinates timing and efficiency of initiation in fission yeast. Mol. Cell. Biol. 31, 2380–2391
34 Knott, S.R. et al. (2012) Forkhead transcription factors establish origin timing and long-range clustering in S. cerevisiae. Cell 148, 99–111
35 Herrick, J. et al. (2000) Replication fork density increases during DNA synthesis in X. laevis egg extracts. J. Mol. Biol. 300, 1133–1142
36 Lucas, I. et al. (2000) Mechanisms ensuring rapid and complete DNA replication despite random initiation in Xenopus early embryos. J. Mol. Biol. 296, 769–786
37 Labit, H. et al. (2008) DNA replication timing is deterministic at the level of chromosomal domains but stochastic at the level of replicons in Xenopus egg extracts. Nucleic Acids Res. 36, 5623–5634
38 Goldar, A. et al. (2008) A dynamic stochastic model for DNA replication initiation in early embryos. PLoS ONE 3, e2919
39 Gauthier, M.G. and Bechhoefer, J. (2009) Control of DNA replication by anomalous reaction–diffusion kinetics. Phys. Rev. Lett. 102, 158104
40 Harland, R.M. and Laskey, R.A. (1980) Regulated replication of DNA microinjected into eggs of Xenopus laevis. Cell 21, 761–771
41 Hyrien, O. and Mechali, M. (1993) Chromosomal replication initiates and terminates at random sequences but at regular intervals in the ribosomal DNA of Xenopus early embryos. EMBO J. 12, 4511–4520
42 Yang, S.C. and Bechhoefer, J. (2008) How Xenopus laevis embryos replicate reliably: investigating the random-completion problem. Phys. Rev. E 78, 041917
43 Graham, C.F. (1966) The regulation of DNA synthesis and mitosis in multinucleate frog eggs. J. Cell Sci. 1, 363–374
44 Goldar, A. et al. (2009) Universal temporal profile of replication origin activation in eukaryotes. PLoS ONE 4, e5899
45 Blumenthal, A.B. et al. (1974) The units of DNA replication in Drosophila melanogaster chromosomes. Cold Spring Harb. Symp. Quant. Biol. 38, 205–223
46 Lima-de-Faria, A. and Jaworska, H. (1968) Late DNA synthesis in heterochromatin. Nature 217, 138–142
47 Gilbert, N. et al. (2004) Chromatin architecture of the human genome: gene-rich domains are enriched in open chromatin fibers. Cell 118, 555–566
48 Schwaiger, M. and Schubeler, D. (2006) A question of timing: emerging links between transcription and replication. Curr. Opin. Genet. Dev. 16, 177–183
49 MacAlpine, D.M. et al. (2004) Coordination of replication and transcription along a Drosophila chromosome. Genes Dev. 18, 3094–3105
50 Hiratani, I. et al. (2009) Replication timing and transcriptional control: beyond cause and effect: part II. Curr. Opin. Genet. Dev. 19, 142–149
51 Lieberman-Aiden, E. et al. (2009) Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293
52 Ryba, T. et al. (2010) Evolutionarily conserved replication timing profiles predict long-range chromatin interactions and distinguish closely related cell types. Genome Res. 20, 761–770
53 Hayashi, M.T. and Masukata, H. (2011) Regulation of DNA replication by chromatin structures: accessibility and recruitment. Chromosoma 120, 39–46
54 Lebofsky, R. et al. (2006) DNA replication origin interference increases the spacing between initiation events in human cells. Mol. Biol. Cell 17, 5337–5345
55 Cayrou, C. et al. (2011) Genome-scale analysis of metazoan replication origins reveals their organization in specific but flexible sites defined by conserved features. Genome Res. 21, 1438–1449
56 Wong, P.G. et al. (2011) Cdc45 limits replicon usage from a low density of preRCs in mammalian cells. PLoS ONE 6, e17533
57 Krasinska, L. et al. (2008) Cdk1 and Cdk2 activity levels determine the efficiency of replication origin firing in Xenopus. EMBO J. 27, 758–769
58 Katsuno, Y. et al. (2009) Cyclin A-Cdk1 regulates the origin firing program in mammalian cells. Proc. Natl. Acad. Sci. U.S.A. 106, 3184–3189
59 Thomson, A.M. et al. (2010) Replication factory activation can be decoupled from the replication timing program by modulating Cdk levels. J. Cell Biol. 188, 209–221
60 Edwards, M.C. et al. (2002) MCM2-7 complexes bind chromatin in a distributed pattern surrounding the origin recognition complex in Xenopus egg extracts. J. Biol. Chem. 277, 33049–33057
61 Dijkwel, P.A. et al. (2002) Initiation sites are distributed at frequent intervals in the Chinese hamster dihydrofolate reductase origin of replication but are used with very different efficiencies. Mol. Cell. Biol. 22, 3053–3065
62 Harvey, K.J. and Newport, J. (2003) CpG methylation of DNA restricts prereplication complex assembly in Xenopus egg extracts. Mol. Cell. Biol. 23, 6769–6779
63 Koren, A. et al. (2010) MRC1-dependent scaling of the budding yeast DNA replication timing program. Genome Res. 20, 781–790
64 Alvino, G.M. et al. (2007) Replication in hydroxyurea: it’s a matter of time. Mol. Cell. Biol. 27, 6396–6406
65 Ma, E. et al. (2012) Do replication forks control late origin firing in Saccharomyces cerevisiae? Nucleic Acids Res. 40, 2010–2019
66 Rhind, N. (2006) DNA replication timing: random thoughts about origin firing. Nat. Cell Biol. 8, 1313–1316
67 Hozak, P. and Cook, P.R. (1994) Replication factories. Trends Cell Biol. 4, 48–52
68 Baddeley, D. et al. (2010) Measurement of replication structures at the nanometer scale using super-resolution light microscopy. Nucleic Acids Res. 38, e8
69 Chen, C.L. et al. (2011) Replication-associated mutational asymmetry in the human genome. Mol. Biol. Evol. 28, 2327–2337
70 Touchon, M. et al. (2005) Replication-associated strand asymmetries in mammalian genomes: toward detection of replication origins. Proc. Natl. Acad. Sci. U.S.A. 102, 9836–9841
71 Chagin, V.O. et al. (2010) Organization of DNA replication. Cold Spring Harb. Perspect. Biol. 2, a000737
72 Karschau, J. et al. (2012) Optimal placement of origins for DNA replication. Phys. Rev. Lett. 108, 058101
73 Ge, X.Q. et al. (2007) Dormant origins licensed by excess Mcm2-7 are required for human cells to survive replicative stress. Genes Dev. 21, 3331–3341
74 Blow, J.J. et al. (2011) How dormant origins promote complete genome replication. Trends Biochem. Sci. 36, 405–414
75 Blow, J.J. and Ge, X.Q. (2009) A model for DNA replication showing how dormant origins safeguard against replication fork failure. EMBO Rep. 10, 406–412
76 Gauthier, M.G. et al. (2010) Defects and DNA replication. Phys. Rev. Lett. 104, 218104
77 Sancar, A. et al. (2004) Molecular mechanisms of mammalian DNA repair and the DNA damage checkpoints. Annu. Rev. Biochem. 73, 39–85
78 Herrick, J. (2011) Genetic variation and DNA replication timing, or why is there late replicating DNA? Evolution 65, 3031–3047
79 Cairns, J. (1963) The bacterial chromosome and its manner of replication as seen by autoradiography. J. Mol. Biol. 6, 208–213
80 Gratzner, H.G. (1982) Monoclonal antibody to 5-bromo- and 5-iododeoxyuridine: a new reagent for detection of DNA replication. Science 218, 474–475
81 Jackson, D.A. and Pombo, A. (1998) Replicon clusters are stable units of chromosome structure: evidence that nuclear organization contributes to the efficient activation and propagation of S phase in human cells. J. Cell Biol. 140, 1285–1295
82 Bensimon, A. et al. (1994) Alignment and sensitive detection of DNA by a moving interface. Science 265, 2096–2098
83 Michalet, X. et al. (1997) Dynamic molecular combing: stretching the whole human genome for high-resolution studies. Science 277, 1518–1523
84 Norio, P. and Schildkraut, C.L. (2001) Visualization of DNA replication on individual Epstein–Barr virus episomes. Science 294, 2361–2364
85 Kitamura, E. et al. (2006) Live-cell imaging reveals replication of individual replicons in eukaryotic replication factories. Cell 125, 1297–1308
86 van Oijen, A.M. and Loparo, J.J. (2010) Single-molecule studies of the replisome. Annu. Rev. Biophys. 39, 429–448
87 Riehn, R. et al. (2005) Restriction mapping in nanofluidic devices. Proc. Natl. Acad. Sci. U.S.A. 102, 10012–10016
88 Sidorova, J.M. et al. (2009) Microfluidic-assisted analysis of replicating DNA molecules. Nat. Protoc. 4, 849–861
89 Raghuraman, M.K. et al. (2001) Replication dynamics of the yeast genome. Science 294, 115–121
90 Woodfine, K. et al. (2004) Replication timing of the human genome. Hum. Mol. Genet. 13, 191–202
91 Desprat, R. et al. (2009) Predictable dynamic program of timing of DNA replication in human cells. Genome Res. 19, 2288–2299
92 Chen, C.L. et al. (2010) Impact of replication timing on non-CpG and CpG substitution rates in mammalian genomes. Genome Res. 20, 447–457
93 Yabuki, N. et al. (2002) Mapping of early firing origins on a replication profile of budding yeast. Genes Cells 7, 781–789
94 Retkute, R. et al. (2011) Dynamics of DNA replication in yeast. Phys. Rev. Lett. 107, 068103
95 Jun, S. et al. (2005) Nucleation and growth in one dimension. I. The generalized Kolmogorov–Johnson–Mehl–Avrami model. Phys. Rev. E 71, 011908
96 Yang, S.C. et al. (2009) Computational methods to study kinetics of DNA replication. Methods Mol. Biol. 521, 555–573
97 Brummer, A. et al. (2010) Mathematical modelling of DNA replication reveals a trade-off between coherence of origin activation and robustness against rereplication. PLoS Comput. Biol. 6, e1000783
Human limb abnormalities caused by disruption of hedgehog signaling
Eve Anderson, Silvia Peluso, Laura A. Lettice and Robert E. Hill
MRC Human Genetics Unit at the MRC Institute of Genetics and Molecular Medicine, University of Edinburgh,
Edinburgh, EH4 2XU, UK
Review
Glossary
Acheiropodia: an autosomal recessive disorder that results in severe truncations
of the arms and legs, such that the distal extremities are absent.
Acrocapitofemoral dysplasia: a rare recessive condition characterized mainly
by short limbs, dwarfism and cone-shaped epiphyses at the joints, mainly in
the hands and hips.
Apical ectodermal ridge (AER): a specialized ectodermal structure that forms
along the distal edge of the limb bud and acts as a major signaling center
through the FGFs.
Brachydactyly: a condition that affects the length of the digits, making the
fingers and toes appear shorter.
Craniosynostosis Philadelphia type: craniosynostosis is a condition in which
one or more of the bony primordia of the infant skull prematurely ossifies, thus
changing the growth pattern of the skull. Philadelphia type has associated
syndactyly of the hands and feet.
Preaxial and postaxial polydactyly: polydactyly means additional digits, and
pre- and postaxial refer to the side of the hand or foot on which the extra digit
appears. Preaxial is the thumb and big toe side, whereas postaxial is the
opposite side.
Syndactyly: a condition in which two or more digits are fused together.
Syndromic: a syndromic condition is characterized by having several
recognizable clinical features that occur together and are associated for
diagnosis. A nonsyndromic condition has a single clinical feature.
Triphalangeal thumb: whereas each finger has three phalanges (the small
bones of the digits), the thumb only has two. In this condition, the thumb has
an extra phalanx and often has the appearance of a finger.
Zone of polarizing activity (ZPA): an area of mesenchymal cells located along
the posterior margin of the limb bud that produces SHH. SHH patterns the early
limb bud along the A–P axis, specifying digit identity and the number of digits
that will form.
Human hands and feet contain bones of a particular size and shape arranged in a precise pattern. The secreted factor sonic hedgehog (SHH) acts through the conserved hedgehog (Hh) signaling pathway to regulate the digital pattern in the limbs of tetrapods (i.e. land-based vertebrates). Genetic analysis is now uncovering a remarkable set of pathogenetic mutations that alter the Hh pathway, thus compromising both digit number and identity. Several of these are regulatory mutations that have the surprising attribute of misdirecting expression of Hh ligands to ectopic sites in the developing limb buds. In addition, other mutations affect a fundamental structural property of the embryonic cell that is essential to Hh signaling. In this review, we focus on the role that the Hh pathway plays in limb development, and how the many human genetic defects in this pathway are providing clues to the mechanisms that regulate limb development.
Human limb abnormalities that affect digit number
Structural abnormalities of the hands and feet are frequent birth defects, several of which have known genetic causes. These defects may affect just the limbs or may be part of a complex syndrome affecting several organs. Mammalian limb-bud development is based on a highly conserved pentadactyl pattern for the digits in the hands and feet [1], and deviation from five digits can be informative for clinicians and developmental biologists. Too many digits, or polydactyly (Glossary), is the most frequently observed congenital hand malformation, with a prevalence of approximately two per 1000 live births [2]. Depending on the anatomical location of the extra digits, polydactyly is classified as preaxial (on the side of the thumb and big toe) or postaxial (the opposite side). The genetic contribution to polydactyly was recently surveyed [3] and a remarkable number of individual clinical classifications (80) that include polydactyly have been assigned to 99 different genes.
Mechanism that polarizes the limb
During development, digit number and identity are regulated by a mechanism that initially polarizes the limb bud and then specifies digit identity and regulates growth. The complementary expression of the transcription factors GLI3, a zinc finger-containing DNA-binding protein, in the anterior half and HAND2, a member of the basic helix–loop–helix family of DNA-binding proteins, in the posterior half of the limb [4] are the first molecular indications that the early limb is polarized (Figure 1). This then predisposes the posterior margin of the limb bud to express the Shh gene, which is the crucial step in regulating spatial variation along the anterioposterior (A–P) axis of the early limb bud. The Shh gene is expressed at the posterior margin of the limb in a region that was defined in transplantation experiments during the 1960s as the zone of polarizing activity (ZPA) [5]. These experiments showed that chick embryonic limb tissue transplanted from the posterior to the anterior limb-bud margin secreted a factor, now known to be SHH, that induced the generation of extra digits.
SHH acts via the Hh signaling pathway, which is remarkably conserved from flies to mammals [6]. Much of what is known about the pathway initially came from analysis in Drosophila, which has only a single Hh gene; in mice, three homologs exist [desert hedgehog (Dhh), Indian hedgehog (Ihh) and Shh] (Box 1 and Figure 2).
ZPA regulatory sequence (ZRS): an approximately 800-bp cis-regulatory
sequence that is necessary and sufficient for the limb-specific expression of
the Shh gene.
Corresponding author: Hill, R.E. ([email protected]).
Keywords: limb development; sonic hedgehog; limb abnormalities; polydactyly; cilia.
0168-9525/$ – see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tig.2012.03.012 Trends in Genetics, August 2012, Vol. 28, No. 8
[Figure 1, panels (a)–(c). Panel labels: A, P, AER (FGFs), GLI3, GLI3R, GLI3A, HAND2, 5′ HOXD, SHH, ETV4/ETV5, ETS1/GABPα. Credit: TRENDS in Genetics]
Figure 1. Expression of genes that polarize the limb and regulate sonic hedgehog (Shh) expression in the zone of polarizing activity (ZPA). The earliest limb bud (a) is polarized by the expression of GLI3 in the anterior (A) and by HAND2 (which downregulates GLI3) in the posterior (P). The expression of the 5′Hoxd and Shh genes follows Hand2 expression and Shh is upregulated by HAND2 in the ZPA. Once SHH is produced (b), it maintains the expression of Hand2 and the 5′Hoxd genes in a regulatory loop. The gradient of GLI3A is shown below. (c) Distal production of ETV4/ETV5 and ETS1/GABPα in overlapping patterns. ETV4/ETV5 ensures that ectopic expression does not occur in the wild-type limb, whereas ETS1/GABPα determines the position of the Shh expression boundary. Abbreviations: AER, apical ectodermal ridge; FGFs, fibroblast growth factors 4, 8, 9 and 17.
SHH signaling regulates the proteolytic processing of members of the GLI (after glioma) family of proteins, one of which, GLI3, is of particular interest early in limb development (Figure 1). GLI3 is expressed across the limb bud; however, in the posterior of the limb bud, where SHH concentrations are high, GLI3 is present in the full-length activator form, GLI3A; by contrast, in the anterior, where SHH is low or undetectable, GLI3 is proteolytically processed into the repressor form, GLI3R [7]. The relative concentration of GLI3A:GLI3R across the developing limb bud specifies the differences between the fingers. The most distinctive digit, the thumb, develops from a region of the
Box 1. Conservation of the Hh pathway
The Hh gene and much of the Hh signaling pathway is conserved
from flies to mice [6] (Figure 2, main text). In Drosophila, one Hh
gene has been identified, whereas in mice three homologs exist:
Dhh, Ihh and Shh. In signaling cells, Hh is synthesized, cleaved and
lipid modified before being secreted [73]. In responding cells, Hh
binds to the Patched (Ptc) coreceptor, alleviating Ptc inhibition of the
seven-pass transmembrane protein Smoothened (Smo) and activat-
ing the downstream pathway [74].
In Drosophila, the transcriptional effecter of Hh signaling is called
Cubitus interruptus (Ci) and exists in two forms; a full-length activator
protein, and a truncated repressor protein generated by proteolytic
processing [75]. The processing of Ci is blocked by Hh, which also
serves to increase activity of the activator form [76]. In mammals,
members of the GLI protein family are homologs of Ci. There are three
members of the Gli family in vertebrates. Gli1 acts as a transcriptional
activator, whereas Gli2 and Gli3 exist in two forms: a full-length
activator form and a truncated transcriptional repressor [6].
Phosphorylation of Ci or Gli allows binding of the ubiquitin ligase
Slimb (Drosophila) or B-TrCP (mammals) and subsequent polyubi-
quitination and proteasome-mediated processing to their activator
forms [77]. Meanwhile, the activity of both CiA and GliA can be
inhibited by Suppressor of fused [SUFU (mouse) or Su(Fu) (flies)] [78].
The proteins Fused (Fu) and Costal 2 (Cos2) play an important role
in the Hh signaling in Drosophila [79,80]. Knockdown of Fu in mouse
cells does not disrupt Gli signaling. The Cos2 homologs in
mammals, Kif7 and Kif27, as well as Cos2 from Drosophila itself,
have been shown to regulate GLI in mammalian cells, suggesting a
conserved regulatory interaction [81].
limb bud that has the highest concentration of GLI3R andno detectable SHH activity. In addition, SHH and GLI3function together to constrain the number of digits pro-duced, thus ensuring pentadactyly [8,9]. In mice, the ab-sence of both SHH and GLI3 (Shh–/–;Gli3–/–), gives rise tomultiple, unspecified digits forming a polydactylous pawwith as many as six to 11 unspecified digits. This indicatesthat limb buds have an intrinsic capacity to produce digitprimordia and that this process is unregulated in theabsence of both SHH and GLI3.
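The opposing GLI3A and GLI3R gradients described in the main text can be pictured with a toy calculation. The sketch below is illustrative only: the exponential SHH profile, the saturating processing rule and the parameters `decay_length` and `k_half` are assumptions chosen for the example, not measured values or a model taken from the cited studies.

```python
import math

def shh_level(x, decay_length=0.3):
    """Hypothetical SHH level at position x along the A-P axis.

    x = 0.0 is the posterior margin (ZPA), x = 1.0 the anterior margin.
    The exponential shape and decay_length are assumptions.
    """
    return math.exp(-x / decay_length)

def gli3_forms(x, k_half=0.2):
    """Return (activator_fraction, repressor_fraction) of GLI3 at x.

    Assumed rule: the processed (repressor) fraction falls as SHH rises;
    k_half is a hypothetical half-saturation constant.
    """
    shh = shh_level(x)
    repressor = k_half / (k_half + shh)
    return 1.0 - repressor, repressor

# Print the two fractions from posterior (x=0) to anterior (x=1).
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    activator, repressor = gli3_forms(x)
    print(f"x={x:.2f}  GLI3A={activator:.2f}  GLI3R={repressor:.2f}")
```

Under these assumptions the activator form dominates posteriorly and the repressor form dominates anteriorly, which is the qualitative pattern the text attributes to the thumb-forming region.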
Several models have been produced to explain SHH activity [10]; a recent model [11,12] suggests that SHH integrates two different activities to regulate early limb-bud development. SHH initially acts as a morphogen to specify digit identity at the earliest stages of limb development. Subsequently, it exhibits mitogenic activity that ensures the production of a sufficient number of cells to promote the normal complement of digits. Together, these two activities of SHH are responsible for specifying the identity of each digit and, as the limb bud expands, the position within the limb bud in which each forms. This is observed as a progressive formation of the digits, such that there is a stereotypical order in which each digit appears. For example, in the mouse, digit 4 appears first in the limb bud, followed in order and rapid succession by digits 2, 5 and then 3 (digit 1 appears to be the last to form). If cellular expansion in the limb bud is reduced by attenuating SHH activity, the digits are lost in the reverse order, with digit 3 being the first to disappear.
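The order of digit formation and loss given above can be written down as a short sketch. One assumption is made explicit here: because the text names digit 3 as the first digit lost, digit 1 is treated as lying outside the loss sequence. The function names are hypothetical.

```python
# Order of appearance in the mouse limb bud, as stated in the text.
FORMATION_ORDER = [4, 2, 5, 3]
# Digit 1 is stated to form last; it is assumed here to be unaffected by loss.
LAST_FORMED_DIGIT = 1

def loss_order():
    """Digits in the order they disappear when SHH activity is attenuated."""
    return list(reversed(FORMATION_ORDER))

def digits_remaining(n_lost):
    """Digits left after n_lost digits disappear in reverse order of formation."""
    if not 0 <= n_lost <= len(FORMATION_ORDER):
        raise ValueError("n_lost must be between 0 and 4")
    kept = FORMATION_ORDER[: len(FORMATION_ORDER) - n_lost]
    return sorted(kept + [LAST_FORMED_DIGIT])

print(loss_order())
print(digits_remaining(1))
```

With one digit lost, the sketch returns digits 1, 2, 4 and 5, matching the statement that digit 3 is the first to disappear.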
[Figure 2 graphic: (a) Drosophila, without HH signaling (top) and with HH signaling (bottom); (b) vertebrate, without and with SHH signaling. Labeled components: HH, PTC, SMO, COS2, SUFU, FU, CI, CI-R, CI-A, kinases, SLIMB, proteasome; SHH, PTCH1, SMO, KIF7, KIF3A, SUFU, GRK2, β-ARRESTIN, β-TRCP, GLI3, GLI3-R, GLI3-A, kinases, proteasome; outcome labels "transcription of target genes" and "no transcription of target genes".]
Figure 2. Conservation of the Hh signaling pathway. (a) Schematic representation of key components of the Drosophila HH signaling pathway in the absence (top) or presence (bottom) of HH. In the absence of ligand, Patched (PTC) inhibits Smoothened (SMO), which is held in intracellular vesicles (yellow ovals). A complex of proteins, including cubitus interruptus (CI), Costal2 (COS2) and several kinases, is established. Phosphorylation of CI establishes recognition signals for SLIMB, leading to partial degradation of CI by the proteasome and formation of the repressor form (CI-R). CI-R then translocates to the nucleus, where it represses transcription of HH targets. Binding of secreted HH to PTC blocks PTC activity and releases SMO from inhibition. SMO moves to the plasma membrane, where phosphorylation allows interaction with COS2. Subsequent phosphorylation of COS2 by FU leads to release of unphosphorylated, full-length CI, which can translocate to the nucleus, where it promotes transcriptional activation. (b) Schematic representation of key conserved components of the vertebrate HH signaling pathway. The cilium (which is absent in Drosophila) is represented by the central axoneme and the centrosome and basal bodies (gray). In the absence of SHH ligand, PTCH1 inhibits SMO, which is held in intracellular vesicles. GLI3 is kept in the primary cilium in a complex with KIF7 and SUFU. Phosphorylation of GLI3 by kinases allows its recognition by β-TRCP and leads to partial degradation by the proteasome, resulting in the formation of the repressor molecule. Activation by SHH relieves the inhibition of SMO by PTCH1. SMO becomes phosphorylated by GRK2, binds to β-ARRESTIN and KIF3A, and is trafficked to the cilium. This relieves the inhibitory effect of SUFU and allows the full-length GLI3 to translocate to the nucleus and activate target genes. Homologous genes in Drosophila and vertebrates are colored similarly.

Limb polarity and digit specification
Attempts to understand the genetic basis of preaxial polydactyly led to the identification of the cis-regulatory element responsible for controlling expression of Shh in the posterior part of the limb [13]. This 750–800-bp enhancer sequence is both necessary and sufficient for regulating the spatial and temporal activity of Shh, which in turn defines the ZPA; therefore, it was called the ZPA regulatory sequence (ZRS) (Figure 3). The ZRS is highly conserved in all vertebrates with opposing appendages, including fish. In addition, the ZRS is located inside an intron of the limb region 1 homolog (Lmbr1) gene, which has no known role in limb development; the ZRS operates over a long distance to activate the Shh promoter (800 kb–1 Mb away) in mice and humans.
It is still an open question how the ZRS directs Shh expression to the ZPA. It is known that the expression of Shh depends on the initial establishment of A–P polarity, and targeted mutations of the Hand2 gene show a role for this gene in the early determinative process that functions upstream of Shh expression [14]. However, a low basal level of Shh expression is established in the absence of HAND2 [14]. Therefore, initiation of Shh expression may rely on additional signals, and one of these may emanate from a specialized ectoderm that resides at the proximal border of the apical ectodermal ridge (AER; Figure 1), operating through the T box-containing transcription factor TBX2 [15]. As the limb bud emerges, the Hoxd gene cluster is activated and becomes confined to the posterior mesenchyme. Genetic analysis has shown that the 5′HOXD factors are essential for activation of Shh expression in the ZPA [16] and, in agreement with this, the 5′HOXD proteins (specifically HOXD10 and 13) may bind to the ZRS [17]. In addition, the regulatory function of HAND2 is mediated by direct binding near or at the ZRS, and HAND2 may bind as an activating protein complex with HOXD13 (and other 5′HOXD proteins) [14]. Given that the ZRS is located a long distance from its target gene, transcription factor binding at the ZRS must be converted into expression of Shh. In accord, chromosome architecture changes specifically in the expressing cells within the limb bud, and two events are observed. First, a chromosomal looping mechanism brings the ZRS close to the Shh promoter, where they interact. Second, the Shh locus moves out of its chromosomal territory; further genetic analysis suggests that this is the event that relates directly to Shh activation [18].
[Figure 3 graphic: (a) wild-type Shh locus (Lmbr1, Rnf32, Shh; ZRS about 800 kb from Shh) with an enlarged ZRS showing point-mutation positions 105C, 252G, 258G, 295T, 297G, 305A, 329T, 334T, 396C, 402C, 404G, 406A, 407T, 463T, 475A, 477A, 555G, 621C, 739A, 743T and 769T, the Werner mesomelic syndrome position, ETS and ETV sites, and a ZRS duplication; (b) human (Hs) chromosome 7 (q22.1, q36.3) with the chromosomal breakpoint, and the wild-type and inverted Shh loci with their limb enhancers.]
Figure 3. Mutations and chromosomal lesions in the Shh locus responsible for limb abnormalities. (a) Shh gene and the upstream regulatory region (including enhancers shown as pink boxes). The ZRS cis-activator is shown as a gray box inside the Lmbr1 gene, enlarged above to show the number and position of the point mutations that cause preaxial polydactyly. Some of the other Shh enhancers are shown in pink. The positions [30] of the human (black), mouse (red), cat (green) and chicken (gray) mutations are shown above the enlarged ZRS. The positions of the ETS (ETS1/GABPα) (green ovals) and ETV (ETV4/ETV5) (blue ovals) binding sites identified by biochemical and chromatin immunoprecipitation methods [17] are shown below the ZRS. The positions of the Werner mesomelic syndrome mutations [29] are highlighted in blue. Below the wild-type Shh locus is a representative intrachromosomal duplication that results in triphalangeal thumb-polysyndactyly (TPTPS). These duplications can be of various sizes, such that the duplicated ZRSs can reside at various distances from each other. (b) Approximate position of the breakpoints of the intrachromosomal inversion on human chromosome 7. Below the chromosome is a representation of the position of the Shh gene before and after the inversion, showing that Shh is now regulated by another limb enhancer, a process called 'enhancer adoption' [41].

Once Shh expression is initiated, members of the ETS transcription factor family act to establish the boundary of expression (Figure 1). Five ETS binding sites have been identified in the ZRS (Figure 3), at which two ETS factors, ETS1 and GABPα, were shown to bind [19]. Occupancy by these factors at multiple sites within the ZRS is required to set the appropriate boundary of expression in the posterior mesenchyme. Two other ETS factors, the closely related ETV4 and ETV5, act redundantly to oppose ETS1/GABPα activation. Limb buds that are deficient in both ETV4 and ETV5 ectopically express Shh in a domain in mesenchyme at the anterior margin of the limb [20,21], indicating that their normal role is to restrict Shh expression to the posterior region of the limb. Although ETV4/5 bind to two sites within the ZRS, the binding at one of these sites is sufficient to regulate Shh negatively in the anterior domain [19]. In addition to this regulatory role, ETV4 and ETV5 were shown to modulate the activity of two other transcription factors, TWIST1 and HAND2, which regulate Shh expression. A fine balance is proposed between TWIST1, an inhibitor of Shh expression in the anterior of the limb, and the positive regulator HAND2. An ETV4/5–TWIST1 complex is important in promoting the TWIST1 inhibitory activity in the ectopic domain, perhaps by inhibiting dimerization of TWIST1–HAND2 [22], which acts as an activator.
Finally, fibroblast growth factor (FGF) signaling is central to Shh expression, both as a positive and a negative regulator (Figure 1). The FGFs 4, 8, 9 and 17 are expressed in the AER [23] (Figure 1) and mediate limb bud outgrowth and maintenance of Shh expression in the ZPA. In addition, FGFs regulate the production of ETV4 and ETV5 and so are responsible for repressing ectopic Shh expression at the anterior margin [20,21,23].
Limb deformities due to aberrant Shh expression
Regulatory mutations in the ZRS cause misexpression of Shh and are associated with limb malformations [24]. The limb defects that result from a mutant ZRS fall into different clinical classifications that show a robust genotype–phenotype correlation but comprise an overlapping spectrum of digit abnormalities. These are preaxial polydactyly type II (PPD2, MIM# 174500), which includes isolated triphalangeal thumb, triphalangeal thumb-polysyndactyly syndrome (TPTPS), syndactyly type IV (SD4, MIM# 186200) and Werner mesomelic syndrome (WMS) [13,25–32] (Figure 4). It has been suggested that this group of limb defects should be collectively referred to as 'ZRS-associated syndromes' [29].

[Figure 4 graphic: (a) normal hand, isolated triphalangeal thumb, preaxial polydactyly type 2, postaxial polydactyly type A and triphalangeal thumb polysyndactyly, with digits numbered 1–5 and 'T' and asterisk labels as described in the caption; (b) and (c) patient images.]
Figure 4. Representative phenotypes for each of the limb abnormalities caused by misexpression of the Shh gene. (a) Types of digit abnormality of the hands caused by misexpression of the Shh gene. Bones are represented along the top, and each digit is numbered, the triphalangeal thumb is labeled 'T' and digits that cannot be accurately identified are labeled with an asterisk. Below are pictures of hands of patients with the various disorders [26,28,87,88]. Werner mesomelic syndrome [29] in (b) is associated with short-limb dwarfism. The X-ray shows the tibial hypoplasia in the right leg (the white arrow indicates the end of the femur). A patient with acheiropodia [44] is shown in (c) exhibiting the severe limb truncations that characterize this abnormality.

PPD2 is characterized by a triphalangeal thumb (Figure 4), sometimes leading to the appearance of a five-fingered hand, and, in some cases, may be accompanied by additional digits. Fifteen single-point mutations in the human ZRS have been identified that are associated with this limb abnormality (Figure 3). Extra toes have also been frequently observed in other species, including mice [13,33,34], cats [35] and chickens [36–38], and these abnormalities are associated with seven more point mutations in the ZRS. A mutation in polydactylous dogs was found in a conserved domain upstream of the ZRS, called the pre-ZRS; however, it is not clear how this domain regulates Shh expression [39]. Much of the understanding of the molecular mechanism underlying PPD2 comes from studies in mice. A single nucleotide change in the sequence of the ZRS is sufficient to generate ectopic production of Shh such that it is anomalously expressed at the opposite, anterior margin of the limb bud [30,35,40]. Ectopic Shh expression presumably produces an additional ZPA and, consequently, affects the GLI3R:GLI3A ratio, leading to respecification of the developing anterior digits. The phenotypic outcome is seen in some cases as the transformation of the thumb to a fifth finger, often accompanied by the production of additional digits.
Mechanisms that give rise to anomalous expression of Shh are being investigated. The two ETS factors that regulate the SHH expression boundary play a central role in generating polydactyly in two different families [19]. In these families, ZRS point mutations were shown to give rise to new, additional ETS1/GABPα binding sites, leading to the upregulation of the ZRS in the posterior margin of the limb bud, setting a wider boundary of expression and causing ectopic expression at the anterior margin. Because both Ets1 and Gabpa genes are expressed at the anterior margin (in mice) and the ZRS is primed for expression in this ectopic region [18], the additional binding sites are sufficient to override the inhibition of Shh expression and cause ectopic expression. Another point mutation that changes transcription factor binding to the ZRS was reported for a polydactylous mouse designated DZ [34]. In this case, the point mutation introduced a higher affinity binding site in the ZRS recognized by the nuclear factor HnRNP U, which was postulated to mediate the interaction between the cis-regulator and the 5′ end of the Shh gene.
WMS (Figure 4) is an autosomal dominant disorder with preaxial polydactyly of the hands and feet that also shows the additional, distinctive characteristic of associated dwarfism [13,29]. This condition appears to be at the severe end of the phenotypic spectrum of ZRS mutations. The short stature is the result of tibial hypoplasia (i.e. very small or absent tibia). The molecular basis for this disorder is also a point mutation, but at a specific position, nucleotide 404 (either a G>A or G>C change) (Figure 3), of the ZRS. Again, this mutation is likely to have an effect on transcription factor binding that is causative of the phenotype. Analysis of ZRS activity carrying the G>A mutation by mouse transgenesis suggests that expression in the ectopic domain occurs at a high level and extends broadly along the anterior limb-bud margin [35]. This level of ectopic SHH production may disrupt specification of the tibia and affect chondrogenesis.
Recently, the genetic basis of a severe form of polysyndactyly (extra digits with fusions of digits, particularly of the hands) was reported. Haas type (syndactyly type IV) polysyndactyly and TPTPS [27–29] (Figure 4) show a consistent association with intrachromosomal duplications involving the genomic region that contains the ZRS, leading to a tandem duplication of the ZRS (or triplication in one patient) (Figure 3). The molecular mechanism that gives rise to this limb phenotype is not known; however, it is reasonable to speculate that ectopic expression of SHH in the anterior margin of the limb bud is responsible for the polydactyly. The role that SHH expression plays in the syndactyly phenotype in patients with either Haas-type polysyndactyly or TPTPS is less clear; however, an isolated case of a patient with a distinct form of syndactyly was recently reported that may shed some light on this process. This patient had fusions of all fingers and toes along the entire length of each digit, which was shown to involve misregulation of the SHH gene in the limb but did not involve the ZRS [41]. Chromosomal analysis revealed that this patient had an intrachromosomal inversion (Figure 3) with one breakpoint upstream of the SHH gene such that it ended up under the influence of a different enhancer at the other end of the breakpoint, freeing it from the influence of the ZRS and other regulators. This event was termed 'enhancer adoption'. In mouse transgenics, this new enhancer was shown to drive expression broadly in the limb, extending to later developmental stages and persisting in the interdigital mesenchyme. Further transgenic studies showed that, by placing this enhancer upstream of the mouse Shh gene, expression was directed to the interdigital cells at a later than normal stage in development. Syndactyly is probably the result of the rescue of the interdigital tissue from cell death due to this abnormal expression of Shh. This also suggests that the ZRS duplications that cause TPTPS similarly affect the temporal expression of SHH in the interdigital regions of the limb.
Another chromosomal inversion with a breakpoint between Shh and the ZRS was previously reported in mice for the Dsh (short digits) mutation [42]. Shh is ectopically activated in the cartilage of early digit primordia of the Dsh heterozygous embryo, representing another example of spatial and temporal misregulation due to a chromosomal rearrangement. However, in this case, it was postulated that the misexpression of Shh was the result of the removal of a repressor that enabled additional expression to occur in the early developing digits.
One other potential regulatory mutation of the SHH gene was uncovered in the genetic analysis of a condition called acheiropodia (MIM# 200500), a rare, recessive condition in which the hands and feet are lacking. This phenotype is similar to that seen in mice lacking ZRS activity [43]. However, the genetic lesion reported for these families [44] is a 4–6 kb deletion upstream (approximately 30 kb) of the ZRS. These genetic data suggest that a second, limb-specific regulatory component exists within the deleted DNA, the role of which may be to modify ZRS activity.
Taken together, these examples illustrate several different mechanisms that can alter the regulation of Shh, with a significant impact on the developing embryo. These range from simple nucleotide changes in the ZRS that cause ectopic expression to duplications that, more surprisingly, appear to affect both spatial and temporal expression. These mutational mechanisms only appear to affect Shh limb expression, as there is no evidence that the expression is misdirected outside the limb bud. Finally, acheiropodia deletions appear to result in a lack of Shh limb expression due to removal of an element that modifies ZRS activity.
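The genotype–phenotype associations reviewed in this section can be collected into a small lookup, shown here as a hedged sketch. The category names and the function `shh_lesion_phenotype` are hypothetical labels introduced for illustration; the mapping only restates associations given in the text and is not a diagnostic rule, since the clinical classes form an overlapping spectrum.

```python
from typing import Optional

# ZRS nucleotide at which G>A or G>C changes are associated with WMS (from the text).
WMS_POSITION = 404

# Lesion classes other than point mutations, with the phenotype named in the text.
OTHER_LESIONS = {
    "zrs_duplication": "TPTPS or SD4 (polysyndactyly)",
    "inversion_enhancer_adoption": "syndactyly of all fingers and toes",
    "deletion_upstream_of_zrs": "acheiropodia",
}

def shh_lesion_phenotype(kind: str, zrs_position: Optional[int] = None) -> str:
    """Return the limb phenotype the text associates with a class of lesion."""
    if kind == "zrs_point_mutation":
        if zrs_position is None:
            raise ValueError("zrs_position is required for point mutations")
        return "WMS" if zrs_position == WMS_POSITION else "PPD2"
    if kind not in OTHER_LESIONS:
        raise ValueError(f"unknown lesion class: {kind}")
    return OTHER_LESIONS[kind]

print(shh_lesion_phenotype("zrs_point_mutation", 404))
print(shh_lesion_phenotype("zrs_duplication"))
```

Keeping the point-mutation rule separate from the table reflects the text's distinction between position 404 and the other ZRS point mutations.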
Ihh misregulation also causes limb abnormalities
Another Hh signaling factor, Ihh, is expressed in the cartilage of the developing long bones in the limb. Here, Ihh is expressed within the growth plate, where it is responsible for regulation of chondrocyte proliferation and differentiation [45]. Ihh is not expressed at early limb-bud stages when Shh is expressed in the posterior mesenchyme, suggesting that Ihh has a role distinct from Shh. Despite this, IHH and SHH operate along similar signaling pathways, including regulation of the conserved target GLI [46].
In humans, loss-of-function mutations in the IHH gene result in the autosomal recessive condition acrocapitofemoral dysplasia (MIM# 607778) [47], while gain-of-function mutations of IHH result in brachydactyly type A1 (MIM# 112500) [48,49]. Evidence suggests that the Ihh gene has a regulatory landscape similar to that of Shh and that Ihh is also under long-range regulatory control. This has been highlighted through analysis of three families with syndactyly type 1 (including some family members with polydactyly) and craniosynostosis Philadelphia type (MIM# 601222). This condition was found to map to a single locus at 2q35. Further analysis revealed that all three families contained distinct microduplications, but all shared the same 9-kb region located within the intron of a gene 40 kb upstream of IHH. This shared region contains a putative distant regulator of IHH and represents a similar situation to the duplication of the ZRS in the cases of TPTPS [50].
Disruption of the long-range regulation of Ihh is also considered to be the cause of the polydactyly phenotype seen in the Doublefoot (Dbf) mouse mutant. Dbf is an autosomal dominant mutation that results in extreme polydactyly of all four limbs, with six to nine digits on each paw that are triphalangeal and arise preaxially [51,52]. Ihh is expressed ectopically within the mutant limb bud across the A–P axis, disrupting normal SHH activity and overriding Shh expression usually driven by the ZRS. A 600-kb deletion starting approximately 50 kb upstream of Ihh underlies the Dbf phenotype. This region is expected to contain a cis-acting regulatory element, which could be a repressor of Ihh expression that is removed by the deletion or, alternatively, a cryptic enhancer that may normally be located beyond the deleted region and moves into an activating position [53].
Gli3 mutants affect Hh signaling
The zinc finger-containing transcription factor GLI3 is the ultimate target for Shh signaling in the early limb bud [54]. Heterozygous mutations in the GLI3 gene cause Greig cephalopolysyndactyly syndrome (GCPS: MIM# 175700) and Pallister–Hall syndrome (PHS: MIM# 146510), both of which include polydactyly in the spectrum of disorders [55,56]. In addition, in rare cases, GLI3 mutations cause nonsyndromic polydactyly (MIM# 174700). The PHS and GCPS phenotypes are clinically distinct and, as with the Shh regulatory mutants, there is a robust genotype–phenotype correlation [57]. PHS presents with central or insertional polydactyly, whereas GCPS exhibits pre- or postaxial polydactyly (most commonly preaxial of the feet and postaxial of the hands) with variable syndactyly [58]. Truncating mutations in the middle third of the Gli3 gene cause PHS, whereas large deletions or truncation mutations in the amino or carboxy terminal third of the gene cause GCPS. PHS mutations are predicted to be dominant mutations in which the truncated protein ends near the proteolytic cleavage site to constitutively produce a repressor protein with a similar activity to GLI3R. This would skew the balance of the activator and the repressor forms of GLI3, resulting in an anteriorizing effect on the limb bud. GCPS mutations are predicted to be null mutations, and the phenotype results from a haploinsufficiency, suggesting that absolute amounts of GLI3R and GLI3A are required for development. Mouse mutations that represent Gli3 loss of function (Gli3xt and Gli3pdn) and a mutation (Gli3Δ699) that causes a PHS-like truncation of the protein near the proteolytic cleavage site support the notion that GCPS and PHS are clinically distinct [59,60]. Mouse studies suggest that GLI3 has a Shh-independent activity in early limb-bud stages [60], acting to restrict HAND2 expression (Figure 1); however, it is not clear what role this independent activity plays in heterozygous human GLI3 mutations.
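The positional rule for GLI3 mutations described above can be expressed as a minimal sketch. It assumes that lesion position is given as a fraction of the gene length and that the thirds are split exactly at 1/3 and 2/3; both conventions, and the function name `predict_gli3_syndrome`, are illustrative assumptions. The text describes a robust correlation, not an absolute rule.

```python
def predict_gli3_syndrome(fraction, kind="truncation"):
    """Predict the syndrome associated with a GLI3 lesion.

    fraction: lesion position along the gene, 0.0 (amino end) to 1.0 (carboxy end).
    kind: "truncation" or "large_deletion".
    """
    if kind == "large_deletion":
        # Large deletions are grouped with GCPS in the text, regardless of position.
        return "GCPS"
    if kind != "truncation":
        raise ValueError("kind must be 'truncation' or 'large_deletion'")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    # Middle third: truncated protein resembles GLI3R (PHS).
    if 1 / 3 <= fraction < 2 / 3:
        return "PHS"
    # Amino or carboxy terminal third: predicted null allele (GCPS).
    return "GCPS"

print(predict_gli3_syndrome(0.5))
print(predict_gli3_syndrome(0.1))
```

The two branches mirror the proposed mechanisms: a constitutive repressor for PHS and haploinsufficiency for GCPS.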
Cilia, the Hh pathway and limb patterning
Transduction of the Hh signal to the GLI protein is a multistep process (Box 2 and Figure 2) and, over the past decade, it has become clear that there is a connection between the complex steps of the Hh pathway and a unique structural component of the cell, the cilia (reviewed in [61–63]). Although the cilia have several signaling roles, it has been suggested [64] that, in early development, primary cilia in vertebrates are dedicated to Hh signal transduction. The phenotypes caused by loss of cilia-associated proteins are syndromic, and not all patients show limb abnormalities, which suggests that cilia play an active role in mediating Hh signaling and do not simply serve as a compartment in which pathway components are concentrated.
The primary cilium is a small organelle that projects from the surface of the cell. It comprises a central structure of microtubules, called the axoneme, that functions to maintain the cilium and extends the structure by transport of particles along its length. This intraflagellar transport (IFT) mechanism transports molecules from the base to the tip of the cilium. Evidence suggests that components of the IFT machinery are involved in the Hh signaling pathway. The GLI proteins (Gli2 and Gli3) are localized at the cilia tip and trafficked along the axoneme in response to Hh signaling [65]. Thus, mutations in those proteins involved in the trafficking process often have phenotypes reminiscent of Hh signaling defects (Figure 2).
Box 2. Vertebrate Hh signaling in the cilium
The main difference between mammalian and Drosophila Hh signaling is the central role played by cilia in mammals but not in flies [6] (Figure 2, main text). Drosophila lacking cilia develop almost normally, indicating that cilia are not required for Drosophila Hh signaling [82]. In vertebrates, several steps from recognition of SHH to the processing of GLI1-3 (here referred to as GLI) in the limb involve the cilia and IFT [83]. The cilium is maintained and extended by transport of particles along the axoneme (reviewed in [60–62]). The transport of molecules toward the cilia tip, via IFT, is called 'anterograde trafficking' (kinesin motor driven), and transport down the axoneme toward the base of the cilia is referred to as 'retrograde trafficking' (dynein driven) [62,63].
Signal transduction takes place in the cilia, where PTCH1 is located in the absence of the ligand and represses the function of smoothened (SMO), which resides in the repressed state in cytoplasmic vesicles [84]. Upon activation by SHH, PTCH1 is internalized and SMO is phosphorylated by a G protein-coupled receptor kinase (GRK2). This phosphorylation promotes SMO binding to β-arrestin and Kif3a, a requirement for the trafficking of SMO into the cilium, where it activates GLI.
Full-length GLIs are present in the cilia in a complex with the anterograde IFT kinesin motor KIF7 [68]. SUFU promotes the truncation of GLI into the repressor form (GLIR), and the retrograde IFT-dynein motor enables GLIR to reach the nucleus. Activation of SMO relieves the inhibition that SUFU exerts and promotes the activator form of GLI (GLIA) [85,86]. This process is promoted by KIF7, which may also block the function of SUFU. GLIA reaches the nucleus and activates the transcription of Hh target genes, which include PTCH.
In the absence of SHH signaling, the processing of GLIs requires regulated proteolysis by the large multiprotein proteasome complex. The GLIs are sequentially phosphorylated by kinases, producing a phosphopeptide domain that is recognized by β-TrCP, which recruits an SCF E3 ubiquitin ligase complex. Ubiquitination targets Gli3 to the proteasome and initiates a limited degradation process, allowing GLIR to be transported to the nucleus, where it inhibits transcription [6].

Several congenital human disorders, called ciliopathies, result from recessive mutations in genes that have a role in the cilia or the basal body [66]. Ciliopathies are a heterogeneous group of diseases presenting with a broad spectrum of clinical phenotypes, including pre- and postaxial polydactyly. For example, Joubert syndrome (MIM# 213300), Meckel–Gruber syndrome (MIM# 249000), and Bardet–Biedl syndrome (BBS, MIM# 209900) are all associated with polydactyly. Joubert syndrome can result from mutations in at least ten genes and is characterized by a specific brain malformation with additional pathologies. A recent report highlights a mutation in the KIF7 gene [67], an ortholog of the Drosophila kinesin-encoding gene Costal2 (Cos2), which is involved in Hh signaling (Figure 2). Reduction of KIF7 leads to a decrease in the number of cells displaying primary cilia and misregulation of GLI. Alteration in the GLI3R:GLI3A ratio (as seen in GCPS) may be responsible for the polydactyly. In mice, KIF7 was shown to be a core regulator of SHH signaling and a putative ciliary motor protein [68–70]. Interestingly, Cos2 differs in that it has lost its kinesin motor function; this is in accord with the observation that Drosophila do not use cilia for developmental signaling.
Mutations that broadly affect cilia structure and function probably disrupt GLI3 processing, leading to polydactyly. BBS is a multisystem disorder that results from mutations in any one of 16 different genes, and limb defects usually appear as postaxial polydactyly. BBS is primarily a disease of the basal body [71], a microtubule-based, modified centriole located at the base of the axoneme that serves as a nucleation site for the growth of the axoneme microtubules. Thus, cilia assembly (a complex process requiring hundreds of proteins), SHH signaling and GLI3 processing are tightly amalgamated. It seems probable that polydactyly in ciliopathies arises for various reasons. Clearly, some of the disease-causing mutations block important steps in the transduction of the SHH signal. However, other defects may be more general and act to disrupt cilia architecture, thus inhibiting the signaling process [72]. Both routes would disrupt GLI3 processing, affecting the GLI3R:GLI3A ratio and creating digit abnormalities, some phenocopying GCPS or PHS.
Concluding remarks
Several mutational mechanisms alter Hh signaling at different points in the pathway, often impacting the developing limb bud. Regulatory mutations affecting Shh expression play a central role in generating preaxial polydactyly. In some cases, regulatory mutations affecting expression of the closely related molecule, IHH, override normal developmental processes to affect adversely the developing limb. It appears that these and an increasingly large number of other mutations ultimately disrupt proteolytic processing of GLI3, the prime target for Hh signaling. These other mutations include those that directly affect the structure of GLI3 and those that affect the cilia, a complex cellular structure that has a significant investment in Hh signaling. The large number of different clinical manifestations that include the polydactyly phenotype (at least 80 have been described) [3] is a hallmark of the hundreds of genes, especially those that affect ciliogenesis, involved in Hh signaling, presenting a considerable overall target for pathogenetic mutations. It is clear that further genetic analysis of limb patterning will be informative in generating insights into not only developmental biology, but also the basic biology of the cell.
References
1 Abbasi, A.A. (2011) Evolution of vertebrate appendicular structures: insight from genetic and palaeontological data. Dev. Dyn. 240, 1005–1016
2 Sun, G. et al. (2011) Twelve-year prevalence of common neonatal congenital malformations in Zhejiang Province, China. World J. Pediatr. 7, 331–336
3 Biesecker, L.G. (2011) Polydactyly: how many disorders and how many genes? 2010 update. Dev. Dyn. 240, 931–942
4 te Welscher, P. et al. (2002) Mutual genetic antagonism involving GLI3 and dHAND prepatterns the vertebrate limb bud mesenchyme prior to SHH signaling. Genes Dev. 16, 421–426
5 Saunders, J.W. and Gasseling, M.T. (1968) Ectodermal–mesenchymal interactions in the origin of limb symmetry. In Epithelial–Mesenchymal Interactions (Fleischmeyer, R. and Billingham, R.E., eds), pp. 78–97, Williams & Wilkins
6 Wilson, C.W. and Chuang, P.T. (2010) Mechanism and evolution of cytosolic Hedgehog signal transduction. Development 137, 2079–2094
7 Wang, B. et al. (2000) Hedgehog-regulated processing of GLI3 produces an anterior/posterior repressor gradient in the developing vertebrate limb. Cell 100, 423–434
8 Litingtung, Y. et al. (2002) Shh and Gli3 are dispensable for limb skeleton formation but regulate digit number and identity. Nature 418, 979–983
9 te Welscher, P. et al. (2002) Progression of vertebrate limb development through SHH-mediated counteraction of GLI3. Science 298, 827–830
10 Towers, M. and Tickle, C. (2009) Growing models of vertebrate limb development. Development 136, 179–190
11 Zhu, J. et al. (2008) Uncoupling Sonic hedgehog control of pattern and expansion of the developing limb bud. Dev. Cell 14, 624–632
12 Towers, M. et al. (2008) Integration of growth and specification in chick wing digit-patterning. Nature 452, 882–886
13 Lettice, L.A. et al. (2003) A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum. Mol. Genet. 12, 1725–1735
14 Galli, A. et al. (2010) Distinct roles of Hand2 in initiating polarity and posterior Shh expression during the onset of mouse limb bud development. PLoS Genet. 6, e1000901
Review Trends in Genetics August 2012, Vol. 28, No. 8
15 Nissim, S. et al. (2007) Characterization of a novel ectodermal signaling center regulating Tbx2 and Shh in the vertebrate limb. Dev. Biol. 304, 9–21
16 Tarchini, B. and Duboule, D. (2006) Control of Hoxd genes’ colinearity during early limb development. Dev. Cell 10, 93–103
17 Capellini, T.D. et al. (2006) Pbx1/Pbx2 requirement for distal limb patterning is mediated by the hierarchical control of Hox gene spatial distribution and Shh expression. Development 133, 2263–2273
18 Amano, T. et al. (2009) Chromosomal dynamics at the Shh locus: limb bud-specific differential regulation of competence and active transcription. Dev. Cell 16, 47–57
19 Lettice, L.A. et al. (2012) Opposing functions of the ETS factor family define Shh spatial expression in limb buds and underlie polydactyly. Dev. Cell 22, 459–467
20 Mao, J. et al. (2009) Fgf-dependent Etv4/5 activity is required for posterior restriction of Sonic Hedgehog and promoting outgrowth of the vertebrate limb. Dev. Cell 16, 600–606
21 Zhang, Z. et al. (2009) FGF-regulated Etv genes are essential for repressing Shh expression in mouse limb buds. Dev. Cell 16, 607–613
22 Zhang, Z. et al. (2010) Preaxial polydactyly: interactions among ETV, TWIST1 and HAND2 control anterior-posterior patterning of the limb. Development 137, 3417–3426
23 Fernandez-Teran, M. and Ros, M.A. (2008) The apical ectodermal ridge: morphological aspects and signaling pathways. Int. J. Dev. Biol. 52, 857–871
24 Hill, R.E. (2007) How to make a zone of polarizing activity: insights into limb development via the abnormality preaxial polydactyly. Dev. Growth Differ. 49, 439–448
25 Albuisson, J. et al. (2011) Identification of two novel mutations in Shh long-range regulator associated with familial pre-axial polydactyly. Clin. Genet. 79, 371–377
26 Gurnett, C.A. et al. (2007) Two novel point mutations in the long-range SHH enhancer in three families with triphalangeal thumb and preaxial polydactyly. Am. J. Med. Genet. A 143, 27–32
27 Klopocki, E. et al. (2008) A microduplication of the long range SHH limb regulator (ZRS) is associated with triphalangeal thumb-polysyndactyly syndrome. J. Med. Genet. 45, 370–375
28 Sun, M. et al. (2008) Triphalangeal thumb-polysyndactyly syndrome and syndactyly type IV are caused by genomic duplications involving the long range, limb-specific SHH enhancer. J. Med. Genet. 45, 589–595
29 Wieczorek, D. et al. (2010) A specific mutation in the distant sonic hedgehog (SHH) cis-regulator (ZRS) causes Werner mesomelic syndrome (WMS) while complete ZRS duplications underlie Haas type polysyndactyly and preaxial polydactyly (PPD) with or without triphalangeal thumb. Hum. Mutat. 31, 81–89
30 Furniss, D. et al. (2008) A variant in the sonic hedgehog regulatory sequence (ZRS) is associated with triphalangeal thumb and deregulates expression in the developing limb. Hum. Mol. Genet. 17, 2417–2423
31 Farooq, M. et al. (2010) Preaxial polydactyly/triphalangeal thumb is associated with changed transcription factor-binding affinity in a family with a novel point mutation in the long-range cis-regulatory element ZRS. Eur. J. Hum. Genet. 18, 733–736
32 Semerci, C.N. et al. (2009) Homozygous feature of isolated triphalangeal thumb-preaxial polydactyly linked to 7q36: no phenotypic difference between homozygotes and heterozygotes. Clin. Genet. 76, 85–90
33 Masuya, H. et al. (2007) A series of ENU-induced single-base substitutions in a long-range cis-element altering Sonic hedgehog expression in the developing mouse limb bud. Genomics 89, 207–214
34 Zhao, J. et al. (2009) HnRNP U mediates the long-range regulation of Shh expression during limb development. Hum. Mol. Genet. 18, 3090–3097
35 Lettice, L.A. et al. (2008) Point mutations in a distant sonic hedgehog cis-regulator generate a variable regulatory output responsible for preaxial polydactyly. Hum. Mol. Genet. 17, 978–985
36 Dunn, I.C. et al. (2011) The chicken polydactyly (Po) locus causes allelic imbalance and ectopic expression of Shh during limb development. Dev. Dyn. 240, 1163–1172
37 Maas, S.A. et al. (2011) Identification of spontaneous mutations within the long-range limb-specific Sonic hedgehog enhancer (ZRS) that alter Sonic hedgehog expression in the chicken limb mutants oligozeugodactyly and Silkie breed. Dev. Dyn. 240, 1212–1222
38 Dorshorst, B. et al. (2010) Genomic regions associated with dermal hyperpigmentation, polydactyly and other morphological traits in the Silkie chicken. J. Hered. 101, 339–350
39 Park, K. et al. (2008) Canine polydactyl mutations with heterogeneous origin in the conserved intronic sequence of LMBR1. Genetics 179, 2163–2172
40 Maas, S.A. and Fallon, J.F. (2005) Single base pair change in the long-range Sonic hedgehog limb-specific enhancer is a genetic basis for preaxial polydactyly. Dev. Dyn. 232, 345–348
41 Lettice, L.A. et al. (2011) Enhancer-adoption as a mechanism of human developmental disease. Hum. Mutat. 32, 1492–1499
42 Niedermaier, M. et al. (2005) An inversion involving the mouse Shh locus results in brachydactyly through dysregulation of Shh expression. J. Clin. Invest. 115, 900–909
43 Sagai, T. et al. (2005) Elimination of a long-range cis-regulatory module causes complete loss of limb-specific Shh expression and truncation of the mouse limb. Development 132, 797–803
44 Ianakiev, P. et al. (2001) Acheiropodia is caused by a genomic deletion in C7orf2, the human orthologue of the Lmbr1 gene. Am. J. Hum. Genet. 68, 38–45
45 Kronenberg, H.M. (2003) Developmental regulation of the growth plate. Nature 423, 332–336
46 Koziel, L. et al. (2005) GLI3 acts as a repressor downstream of Ihh in regulating two distinct steps of chondrocyte differentiation. Development 132, 5249–5260
47 Hellemans, J. et al. (2003) Homozygous mutations in IHH cause acrocapitofemoral dysplasia, an autosomal recessive disorder with cone-shaped epiphyses in hands and hips. Am. J. Hum. Genet. 72, 1040–1046
48 Guo, S. et al. (2010) Missense mutations in IHH impair Indian Hedgehog signaling in C3H10T1/2 cells: implications for brachydactyly type A1, and new targets for Hedgehog signaling. Cell. Mol. Biol. Lett. 15, 153–176
49 Gao, B. et al. (2001) Mutations in IHH, encoding Indian hedgehog, cause brachydactyly type A-1. Nat. Genet. 28, 386–388
50 Klopocki, E. et al. (2011) Copy-number variations involving the IHH locus are associated with syndactyly and craniosynostosis. Am. J. Hum. Genet. 88, 70–75
51 Yang, Y. et al. (1998) Evidence that preaxial polydactyly in the Doublefoot mutant is due to ectopic Indian Hedgehog signaling. Development 125, 3123–3132
52 Hayes, C. et al. (1998) Sonic hedgehog is not required for polarising activity in the Doublefoot mutant mouse limb bud. Development 125, 351–357
53 Babbs, C. et al. (2008) Polydactyly in the mouse mutant Doublefoot involves altered GLI3 processing and is caused by a large deletion in cis to Indian hedgehog. Mech. Dev. 125, 517–526
54 Hui, C.C. and Angers, S. (2011) GLI proteins in development and disease. Annu. Rev. Cell Dev. Biol. 27, 513–537
55 Shin, S.H. et al. (1999) GLI3 mutations in human disorders mimic Drosophila cubitus interruptus protein functions and localization. Proc. Natl Acad. Sci. U.S.A 96, 2880–2884
56 Johnston, J.J. et al. (2010) Molecular analysis expands the spectrum of phenotypes associated with GLI3 mutations. Hum. Mutat. 31, 1142–1154
57 Naruse, I. et al. (2010) Birth defects caused by mutations in human GLI3 and mouse Gli3 genes. Congenit. Anom. 50, 1–7
58 Biesecker, L.G. (2008) The Greig cephalopolysyndactyly syndrome. Orphanet J. Rare. Dis. 3, 10
59 Hill, P. et al. (2007) The molecular basis of Pallister Hall associated polydactyly. Hum. Mol. Genet. 16, 2089–2096
60 Hill, P. et al. (2009) A SHH-independent regulation of Gli3 is a significant determinant of anteroposterior patterning of the limb bud. Dev. Biol. 328, 506–516
61 Bettencourt-Dias, M. et al. (2011) Centrosomes and cilia in human disease. Trends Genet. 27, 307–315
62 Gerdes, J.M. et al. (2009) The vertebrate primary cilium in development, homeostasis, and disease. Cell 137, 32–45
63 Wong, S.Y. and Reiter, J.F. (2008) The primary cilium at the crossroads of mammalian hedgehog signaling. Curr. Top. Dev. Biol. 85, 225–260
64 Goetz, S.C. and Anderson, K.V. (2010) The primary cilium: a signalling centre during vertebrate development. Nat. Rev. Genet. 11, 331–344
65 Haycraft, C.J. et al. (2005) GLI2 and GLI3 localize to cilia and require the intraflagellar transport protein polaris for processing and function. PLoS Genet. 1, e53
66 Hildebrandt, F. et al. (2011) Ciliopathies. N. Engl. J. Med. 364, 1533–1543
67 Dafinger, C. et al. (2011) Mutations in KIF7 link Joubert syndrome with Sonic Hedgehog signaling and microtubule dynamics. J. Clin. Invest. 121, 2662–2667
68 Liem, K.F., Jr et al. (2009) Mouse Kif7/Costal2 is a cilia-associated protein that regulates Sonic hedgehog signaling. Proc. Natl Acad. Sci. U.S.A 106, 13377–13382
69 Endoh-Yamagami, S. et al. (2009) The mammalian Cos2 homolog Kif7 plays an essential role in modulating Hh signal transduction during development. Curr. Biol. 19, 1320–1326
70 Cheung, H.O. et al. (2009) The kinesin protein KIF7 is a critical regulator of Gli transcription factors in mammalian hedgehog signaling. Sci. Signal. 2, ra29
71 Zaghloul, N.A. and Katsanis, N. (2009) Mechanistic insights into Bardet–Biedl syndrome, a model ciliopathy. J. Clin. Invest. 119, 428–437
72 Ocbina, P.J. et al. (2011) Complex interactions between genes controlling trafficking in primary cilia. Nat. Genet. 43, 547–553
73 Gallet, A. (2011) Hedgehog morphogen: from secretion to reception. Trends Cell Biol. 21, 238–246
74 Murone, M. et al. (1999) Sonic hedgehog signaling by the patched-smoothened receptor complex. Curr. Biol. 9, 76–84
75 Aza-Blanc, P. et al. (1997) Proteolysis that is inhibited by hedgehog targets Cubitus interruptus protein to the nucleus and converts it to a repressor. Cell 89, 1043–1053
76 Hooper, J.E. and Scott, M.P. (2005) Communicating with Hedgehogs. Nat. Rev. Mol. Cell Biol. 6, 306–317
77 Jiang, J. (2006) Regulation of Hh/Gli signaling by dual ubiquitin pathways. Cell Cycle 5, 2457–2463
78 Ruel, L. and Therond, P.P. (2009) Variations in Hedgehog signaling: divergence and perpetuation in SUFU regulation of Gli. Genes Dev. 23, 1843–1848
79 Hooper, J.E. (2003) Smoothened translates Hedgehog levels into distinct responses. Development 130, 3951–3963
80 Jia, J. et al. (2003) Smoothened transduces Hedgehog signal by physically interacting with Costal2/Fused complex through its C-terminal tail. Genes Dev. 17, 2709–2720
81 Marks, S.A. and Kalderon, D. (2011) Regulation of mammalian Gli proteins by Costal 2 and PKA in Drosophila reveals Hedgehog pathway conservation. Development 138, 2533–2542
82 Basto, R. et al. (2006) Flies without centrioles. Cell 125, 1375–1386
83 Oh, E.C. and Katsanis, N. (2012) Cilia in vertebrate development and disease. Development 139, 443–448
84 Lum, L. and Beachy, P.A. (2004) The Hedgehog response network: sensors, switches, and routers. Science 304, 1755–1759
85 Chen, M.H. et al. (2009) Cilium-independent regulation of Gli protein function by Sufu in Hedgehog signaling is evolutionarily conserved. Genes Dev. 23, 1910–1928
86 Humke, E.W. et al. (2010) The output of Hedgehog signaling is controlled by the dynamic association between Suppressor of Fused and the Gli proteins. Genes Dev. 24, 670–682
87 Heus, H.C. et al. (1999) A physical and transcriptional map of the preaxial polydactyly locus on chromosome 7q36. Genomics 57, 342–351
88 Radhakrishna, U. et al. (1999) The phenotypic spectrum of GLI3 morphopathies includes autosomal dominant preaxial polydactyly type-IV and postaxial polydactyly type-A/B; no phenotype prediction from the position of GLI3 mutations. Am. J. Hum. Genet. 65, 645–655
Regulation of chromatin structure by long noncoding RNAs: focus on natural antisense transcripts
Marco Magistri1, Mohammad Ali Faghihi1, Georges St Laurent III2 and Claes Wahlestedt1
1 Department of Psychiatry and Behavioral Sciences, and Center for Therapeutic Innovation, University of Miami Miller School of Medicine, Miami, FL 33136, USA
2 St Laurent Institute, Cambridge, MA 02139, USA
Review
In the decade following the publication of the Human Genome, noncoding RNAs (ncRNAs) have reshaped our understanding of the broad landscape of genome regulation. During this period, natural antisense transcripts (NATs), which are transcribed from the opposite strand of either protein or non-protein coding genes, have vaulted to prominence. Recent findings have shown that NATs can exert their regulatory functions by acting as epigenetic regulators of gene expression and chromatin remodeling. Here, we review recent work on the mechanisms of epigenetic modifications by NATs and their emerging role as master regulators of chromatin states. Unlike other long ncRNAs, antisense RNAs usually regulate their counterpart sense mRNA in cis by bridging epigenetic effectors and regulatory complexes at specific genomic loci. Understanding the broad range of effects of NATs will shed light on the complex mechanisms that regulate chromatin remodeling and gene expression in development and disease.
Corresponding author: Wahlestedt, C. ([email protected]).
Keywords: antisense RNA; epigenetics; transcriptome; chromatin; ncRNAs; NATs.
0168-9525/$ – see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tig.2012.03.013 Trends in Genetics, August 2012, Vol. 28, No. 8

Chromatin and ncRNAs: coupling structure and dynamic information
Histone octamer proteins and their tightly associated 146 bp of DNA form the nucleosome, the structural and functional core of eukaryotic chromatin. Specific combinations of DNA and histone post-translational modification patterns lead to diverse changes in chromatin states and distinct functional genomic outputs [1,2]. DNA methylation is perhaps the best-characterized chemical modification of DNA that impacts chromatin structure and function. In mammalian cells, DNA methylation occurs on cytosine residues in CpG dinucleotides and correlates with transcriptional repression. Promoter regions have a high density of CpG dinucleotides, whose methylation state dictates the transcriptional activity of the gene. Chromatin structure and function are also regulated by post-translational modifications of histone proteins. Histone-modifying enzymes are protein complexes that dynamically recognize (read), add (write), remove (erase) or replace various chromatin modifications. Examples of writers include EZH2, the catalytic subunit of the polycomb repressive complex 2 (PRC2) which is responsible for the trimethylation of histone H3 at lysine 27 (H3K27me3), and G9a, the histone methyltransferase (HMT) that catalyzes the di- or trimethylation of histone H3 at lysine 9 (H3K9me2/3) [2,3]. ‘Erasers’, such as the demethylase LSD1, specifically remove particular histone marks [4]. ‘Readers’ function as interpreters and include effector proteins that recognize specific histone marks and transduce this information into a genomic response [5–7]. Writers, erasers and readers have to work in concert with their action tightly coordinated to produce an integrated regulatory effect. Recent discoveries of frequent interactions between ncRNAs and chromatin strongly suggest pivotal roles for ncRNAs in orchestrating the function of these protein complexes. How chromatin-modifying enzymes specifically recognize and bind to their target loci still remains mysterious. One tempting hypothesis is that local transcription of low-abundance ncRNAs might be the key event in the locus-specific recruitment of different reader, eraser and writer complexes.

Dynamic transcriptional regulation at the level of chromatin
The classic division of chromatin into two opposing states, gene-rich euchromatin versus the silenced, tightly packed heterochromatin, has been challenged by recent discoveries suggesting the existence of different chromatin states in various organisms, including humans [8–13]. The two-state chromatin model assumed that the chromatin structure was essentially an on/off switch whereby a gene was either active or repressed, without any intermediate states. By contrast, a dynamic chromatin state varies between these extremes and represents an integration of information derived from an intricate network of histone-modifying enzymes, chromatin-binding proteins, transcription factors and chromatin-associated RNA transcripts [14,15].

Globally, RNA, which is an integral structural component of chromatin, is required for the maintenance of compact chromatin fibers [16]. RNA has also been shown to be involved in the maintenance of higher-order chromatin structure at pericentric heterochromatin in mouse cells [17], highlighting the important contribution of RNA to the regulation of chromatin structure and function. Recently, a genome-wide next-generation RNA sequencing approach was used to identify the RNA content of chromatin in human fibroblasts [18]. Surprisingly, more than 70% of the sequencing reads aligned with intergenic and intronic regions of the human genome. Functional experiments on a small number of chromatin RNA transcripts imply an interaction with chromatin-modifying enzymes, which raises the possibility of a functional role of these transcripts in chromatin regulation [18].
Further support for the notion that RNA regulates chromatin comes from a small but growing number of antisense transcripts [19,20] and other long ncRNAs [21–24] that interact with epigenetic effectors to orchestrate chromatin remodeling and epigenetic changes during development and disease. Cell type-specific ncRNAs interact with ubiquitously expressed regulatory proteins to form RNA–protein complexes that can interact with histones, DNA, other RNAs, and other chromatin-modifying complexes, to dynamically coordinate changes in gene expression programs (reviewed in [25]). RNA motifs composed of primary sequence information coupled to highly diverse secondary structure elements underlie the complexity and dynamic nature of these interactions. The combination of structural and regulatory elements of the chromatin contributes to the acquisition of a specific chromatin state and is key to understanding the mechanisms governing the organization of the human genome and the regulation of gene expression.
Natural antisense transcripts (NATs)
A substantial fraction of the mammalian genome is transcribed in the form of non-protein-coding RNAs [26–29] that have important regulatory functions in development, differentiation [30–32] and human diseases [19,33–35]. Although there is no unequivocal classification of non-protein-coding transcripts found in the mammalian genome, ncRNAs can be roughly divided on the basis of size into short ncRNAs (less than 200 nt in length) and long ncRNAs (lncRNAs) [36,37]. Short ncRNAs include miRNAs, piRNA, endogenous siRNAs and snoRNAs, which have been extensively reviewed elsewhere [38–40] and therefore will not be discussed here. lncRNAs are a heterogeneous group of RNAs transcribed from intergenic [41] or intragenic regions [42], which vary in length from 200 nt to over 100 kb [37]. NATs are a class of lncRNA molecules [43] that are transcribed from the opposite DNA strand of other RNA transcripts with which they share sequence complementarity [26,44–46]. Antisense RNAs could potentially exert a regulatory function on their corresponding sense mRNA at different levels [47]. NAT regulatory mechanisms fall into four main categories: mechanisms related to transcription (including epigenetic interactions), RNA–DNA interactions, RNA–RNA interactions in the nucleus and RNA–RNA interactions in the cytoplasm [48]. Among these four mechanisms, RNA-mediated epigenetic modification has received an increasing amount of experimental support. Antisense transcripts can provide a scaffold for effector proteins to interact with DNA and chromatin in a locus specific way.

NATs: cis-acting epigenetic silencers
Unlike transcription factors, many histone-modifying enzymes lack specific DNA-binding domains [15]. Based on this important observation, it has been postulated that ncRNAs might interact with ubiquitously expressed histone-modifying enzymes providing the required level of binding specificity (Figure 1).

Figure 1. Epigenetic regulation induced by NATs. NATs regulate the epigenetic landscape of genomic loci from which they are transcribed (cis regulation). A specific secondary structure permits the NAT to interact with different chromatin-modifying enzymes (green, red and purple shapes), thereby coordinating their action and directing specific epigenetic modifications of the nearby chromatin (green and red flags). Locus specificity may be achieved through sequence-specific interactions between the NAT and the DNA.

In mammalian cells, dosage compensation offered the first characterized examples of antisense lncRNA-mediated chromatin remodeling and gene silencing [49]. One of the two mammalian female X chromosomes is inactivated via an RNA-based mechanism in which the antisense ncRNA Xist, expressed from the X chromosome, mediates the recruitment of polycomb repressive complex 2 (PRC2) that in turn catalyzes the heterochromatinization of the entire X chromosome [21,49].
A similar mechanism of RNA-based epigenetic regulation of gene expression was found to silence various imprinted mammalian alleles. Most imprinted mammalian genes associate in clusters [50], and the presence of NATs is a common feature of these loci [26,51,52]. For example, Air is an imprinted, paternally expressed lncRNA transcribed from the second intron of the mouse insulin-like growth factor 2 receptor (Igf2r) gene [53]. In mouse placenta, expression of Air induces the epigenetic silencing of both the paternal allele of Igf2r, from which Air is expressed, and neighboring upstream genes. Although the transcription unit of Air only overlaps with Igf2r, Air recognizes and binds to the promoter regions of its neighboring genes. The molecular mechanisms underlying these interactions have not been clarified and might rely on a specific secondary structure adopted by Air or on the involvement of mediator proteins. The Air ncRNA interaction with the promoter of upstream genes in the cluster results in the recruitment of the HMT G9a, which generates a repressive chromatin state [54]. The ability of Air to silence non-overlapping genes in cis is reminiscent of Xist-induced X-chromosome inactivation. In the case of Xist, epigenetic silencing spreads through the entire X chromosome, in contrast to the case of imprinted genes where epigenetic silencing spreads only to a significant portion of the locus. The extent of the spread of epigenetic silencing may be related to the presence of insulator elements in the DNA sequence and their association with the CCCTC-binding factor (CTCF) [55], a multifunctional protein that enables insulator function and facilitates higher-order chromatin interactions [56].
Another interesting example of imprinting regulation is the antisense ncRNA transcript Kcnq1ot1, which is transcribed from intron 10 of the imprinted gene Kcnq1 [57]. This paternally expressed NAT silences Kcnq1 in cis, as well as neighboring genes on the paternal chromosome, by controlling chromatin and DNA modifications at that locus [58]. Kcnq1ot1 mediates the allele-specific deposition of the repressive histone marks H3K27me3 and H3K9me3 by direct interaction with the PRC2 components Ezh2, Suz12 and the H3K9-specific HMT, G9a [58,59]. Similar to the situation with Air, the epigenetic changes caused by Kcnq1ot1 occur outside the sequence boundary of this lncRNA, emanating bidirectionally from the Kcnq1 locus. Some of the imprinted genes in this cluster, although silenced, lack Kcnq1ot1 enrichment [60].
Based on these examples, cis-acting NATs may remain linked to their transcription loci but exert their regulatory function on the neighboring genes via the recruitment of different proteins and the organization of higher-order chromatin structures. The presence or absence of insulator elements may influence the extension of chromatin alterations in each locus [61]. In this hypothetical scenario, the antisense transcript acts as a scaffold for the recruitment of chromatin-modifying enzymes, initiating events that expand in both directions to the entire chromosome, as in the case of X-chromosome inactivation, or to the entire imprinted cluster. In this model, the recruitment of chromatin-modifying complexes is dependent on antisense RNA expression, whereas the expansion of these effects depends on the subsequent involvement of DNA insulator elements.
Taken together, these imprinting studies imply that a large portion of NATs could exert their regulatory role by binding to chromatin enzymes and recruiting them in cis to their targets. In favor of this hypothesis, RNA immunoprecipitation (RIP) experiments targeting Ezh2, coupled with directional RNA sequencing (RIP-seq), revealed that the PRC2 complex associates with almost 10 000 RNAs in mouse embryonic stem cells (mESCs) [62]. Almost 3000 of these RNAs are NATs, and around 1000 are bidirectional transcripts. Interestingly, some NATs linked to disease loci were found to immunoprecipitate with Ezh2, such as Hspa1a-AS, Bgn-AS, Foxn2-AS and Malat1-AS [62], suggesting that ncRNAs target the PRC2 complex to chromatin. Unfortunately, in this study RIP-sequencing data were not integrated with ChIP-sequencing data, and the authors did not investigate the possible overlap between the genomic localization of PRC2 and the immunoprecipitated RNA transcripts. Nevertheless, the presence of NATs associated with PRC2 suggests the importance of these RNA transcripts in mediating the recruitment of chromatin-modifying complexes.
Accumulating evidence implies that the interaction of NATs with EZH2 and other HMTs is more common than previously believed, contributing to the epigenetic regulation of many autosomal loci. In addition to the finding that lncRNAs interact with histone-modifying enzymes, they have also been shown to play a role in DNA methylation. ANRIL is a NAT that overlaps with the INK4b/ARF/INK4a locus [63]. This locus encodes two cyclin-dependent kinase inhibitors, p15INK4b and p16INK4a, and a regulator of the p53 pathway, ARF [64]. The ANRIL transcript also overlaps with several polymorphisms discovered in genome-wide association studies (GWAS) that correspond to increased risk for cardiovascular disease and diabetes [65]. An initial study showed that ANRIL expression inversely correlates with p15INK4b expression in acute lymphoblastic leukemia and acute myeloid leukemia. It was demonstrated that ANRIL mediates the silencing of the tumor suppressor gene p15INK4b via DNA methylation and heterochromatin formation in a Dicer-independent manner, thus excluding the involvement of endogenous small RNAs in the process [20]. Later, it was shown that ANRIL, EZH2 and the PRC1 component CBX7 are upregulated in several prostate cancer tissue specimens with an inverse correlation to the expression of p16INK4a [19]. Moreover, ANRIL physically associates with CBX7 and colocalizes with EZH2 and CBX7 to the promoter region of p16INK4a in prostate cancer cells. Thus, the NAT ANRIL participates in the silencing of two very important tumor-suppressor genes via two distinct mechanisms, and the alteration of these regulatory circuits has been found in different types of cancer.
Evidence of a functional interaction between NATs and PRC2 comes from a study on the cyclin-dependent kinase inhibitor p21, another important tumor-suppressor gene. Bidirectional transcription at the p21 locus generates an antisense transcript and p21 mRNA. The p21 NAT represses p21 mRNA in a process involving the deposition of the repressive histone mark H3K27me3 [66]. This mechanism is AGO1-independent, further excluding involvement of endogenous small RNA mediators in the process. Thus, depending on the cellular context, an imbalanced expression of NATs can result in the silencing or activation of partner protein-coding genes, providing an interesting potential mechanism to explain the aberrant upregulation or silencing of cancer-related genes.
Among the different body tissues, the brain expresses a high abundance of ncRNAs [67]. Discovered in the developing mouse forebrain, the NAT Evf2 is transcribed from the ultra-conserved Dlx5/6 region encoding the homeodomain transcription factors DLX5 and DLX6 [68]. Evf2 forms a complex with the DLX-2 homeodomain protein to function as a transcriptional coactivator that increases Dlx5/6 enhancer activity [68]. Recently, studies of an Evf2 loss-of-function mouse revealed more complex regulatory functions of this NAT in the development of GABAergic interneurons [69]. Through antisense interference, Evf2 negatively regulates the expression of Dlx6 mRNA. Moreover, Evf2 exerts a silencing effect on Dlx5 by recruiting DLX and the methyl CpG binding protein 2 (MECP2) to the enhancer region [69]. Mutant Evf2 mice have reduced numbers of GABAergic interneurons in the dentate gyrus of the early postnatal hippocampus and reduced synaptic inhibition in the adult hippocampus [69]. This study highlights the importance of NATs in regulating gene expression during neuronal maturation and raises the possibility of a more extended role of antisense transcripts in central nervous system development.
In recent studies, repeat expansion diseases have often been characterized by bidirectional transcription overlapping the repeat region [70]. Spinocerebellar ataxia type 7 (SCA7) is a neurological disorder associated with a polyglutamine repeat (CAG) expansion in the ataxin-7 gene (ATXN7) [71]. SCAANT1 is a 1.4 kb long NAT overlapping the ATXN7 gene that is actively transcribed upon CTCF binding to target sites flanking the CAG repeat region [72]. SCAANT1 expression is associated with an increased level of the repressive H3K27me3 mark and a decreased level of the activating histone H3 acetylation mark at the ATXN7 promoter. The pathological increase of CAG expansion is accompanied by reduced expression of SCAANT1 ncRNA and increased expression of ATXN7 mRNA, showing an inverse relationship between the NAT and its partner sense transcript [72]. This study reveals an interesting NAT-based mechanism that is potentially involved in SCA7 pathogenesis.
NATs can silence gene expression in cis, making them attractive therapeutic targets to achieve specific upregulation of gene expression. It has recently been shown that brain-derived neurotrophic factor (BDNF) is under the epigenetic control of an antisense transcript, BDNF-AS [73]. Depletion of BDNF-AS can alter chromatin marks at the BDNF locus and upregulate locus-specific gene expression. This study also described NAT-mediated endogenous gene suppression of glial cell line-derived neurotrophic factor (GDNF) and ephrin B2 receptor (EPHB2), suggesting that antisense RNA-mediated transcriptional suppression is a frequent phenomenon [73]. Considering the frequency with which NATs are transcribed, these examples may represent only the tip of the iceberg, with the regulatory role of NATs in epigenetic modifications representing a more common event than previously imagined.
NATs: cis-acting epigenetic activators
The first observation that lncRNAs are involved in epigenetic gene activation stems from dosage compensation studies in Drosophila, where the imbalanced presence of X chromosomes in the sexes necessitates compensation by a twofold upregulation of all the genes on the single male X chromosome [74]. Two lncRNAs, roX1 and roX2, play a fundamental role in the correct targeting of the Dosage Compensation Complex to many different binding sites on the male X chromosome, which results in transcriptional upregulation. These and other examples provide accumulating evidence of a central role for NATs in the epigenetic activation of specific loci on a genome-wide basis, providing insight into the biological language of lncRNAs [75].
Following these initial findings in Drosophila, several other examples of ncRNAs in vertebrates have been reported. Among these, a ncRNA expression-profile study of mESC differentiation identified several ncRNAs associated with important mESC protein-coding genes [30]. Among these ncRNAs, two concordantly upregulated NATs colocalized with their sense mRNA partners during a specific step of mESC differentiation. The NATs, named Evx1as and Hoxb5/6as, are transcribed from the opposite DNA strand of Evx1 and Hoxb5/6, respectively [30]. Using RNA-ChIP, the authors found that these NATs immunoprecipitate with H3K4me3, demonstrating a physical interaction with a transcriptional activation mark [30]. Furthermore, RNA-IP experiments showed direct interaction of Evx1as and Hoxb5/6as with MLL1, the mammalian trithorax protein responsible for H3K4me3 in the promoter region of several developmental genes [30]. This finding raises the possibility that these NATs are involved in the epigenetic activation of their mRNA partners during differentiation.
In another example of epigenetic activation, the chromatin-associated ncRNA transcript termed Intergenic10, located in the region 3′ to FANK1 in the opposite orientation, overlaps with the protein-coding gene ADAM12 [18]. The expression of Intergenic10 correlates positively with the expression of the neighboring protein-coding genes. siRNA depletion of Intergenic10 resulted in the concordant downregulation of ADAM12 and FANK1 and a decrease in the levels of the active chromatin mark H3K4me2 in the promoter regions of the downregulated genes [18]. NATs may bind and recruit chromatin-modifying enzymes in cis to establish a locus-specific transcriptionally active chromatin state.
Taken together, these observations show that a chromatin-associated ncRNA can act as a chromatin remodeler in cis to regulate positively or negatively the expression of neighboring genes.
LncRNAs: trans-acting chromatin remodelers
Controversy still exists regarding the functional significance of many long and short ncRNA transcripts that are pervasively transcribed in the human genome, and particularly those originating in the proximity of the transcriptional start sites (TSSs) of many active genes. However, cell-, tissue- and developmental-specific transcription of lncRNAs argues against the simplistic assumption that these arise from transcriptional noise. Moreover, removal of these ncRNAs often correlates with functional consequences. Aside from NATs, the human genome produces many other classes of lncRNAs. For example, the analysis of chromatin signatures revealed a family of over 1000 highly conserved lncRNAs, termed large intergenic non-coding RNAs (lincRNAs), that contain sense and antisense members with many potential regulatory functions [41]. RNA-IP experiments of the PRC2 complex component EZH2 followed by hybridization to a custom exon-tiling array for 900 human lincRNAs showed that almost 30% of expressed lincRNAs physically interact with PRC2 [76]. Immunoprecipitation of lncRNAs with EZH2 is highly suggestive of functional roles of these transcripts through the PRC2 pathway. The catalog of lincRNAs encoded in the human genome as well as the understanding of their roles in mediating the function of chromatin-modifying complexes is rapidly expanding.
Unlike most NATs, lincRNAs exert their regulatory roles in trans to alter chromatin shape and gene expression at distant loci. HOTAIR is a lincRNA encoded in antisense orientation in the HOX-C cluster on chromosome 12 that is necessary for the correct expression of the HOX-D cluster of genes on chromosome 2 [23]. HOTAIR associates with the PRC2 complex to silence and maintain a large domain of heterochromatin in the HOX-D gene cluster. Genomic regions flanking HOX-D contain high levels of H3K27me3 and low levels of H3K4me2/3 [77]. It was shown in several cellular systems that HOTAIR acts as a modular scaffold for the recruitment of both PRC2 and LSD1, the catalytic subunit of the repressor complex CoREST/REST, which in turn coordinate H3K27 methylation and H3K4me2/3 demethylation, respectively, in trans at many different target genomic regions [78]. Interestingly, altered HOTAIR expression in primary breast tumors is a powerful predictor of metastasis and poor prognosis [35]. Inhibition of HOTAIR expression in cancer cells reduces invasiveness and metastatic potential, consistent with its physiological function in dictating chromatin states of fibroblasts during development [35].
A loss-of-function study in mESCs produced a functional characterization of a large number of lincRNAs [32]. It was shown that lincRNAs maintain the pluripotent state and repress lineage programs in mESCs via trans-acting mechanisms of global gene expression regulation. mESC lincRNAs associate with 12 different chromatin complexes involved in different aspects of epigenetic regulation, such as writers (Tip60/P400, Prc2, Setd8, Eset, Suv39), readers (Prc1, Cbx1, Cbx3) and erasers (Jarid1b, Jarid1c, Hdac1) [32]. Seventy-four lincRNAs associate with at least one of these complexes and several lincRNAs associate with functionally related chromatin complexes [32]. Because lincRNAs physically associate with multiple chromatin-regulatory proteins, they may serve as scaffolds to bridge together similar complexes into larger functional units.
Similar to NATs, lncRNAs can be involved in the epigenetic activation of specific loci. HOTTIP is a spliced, polyadenylated lncRNA transcribed in the opposite orientation from the 5′ end of the HOXA locus [79]. HOTTIP knockdown in fibroblasts and chick embryos resulted in decreased HOXA expression, affecting a region 40 kb downstream from the 5′ end of the HOXA locus. This repressive effect depends on the distance from the HOTTIP gene; genes in close proximity exhibit a greater decrease in expression levels [79]. These changes in gene expression correlated with a global loss of H3K4me3 and H3K4me2 across the affected region. RIP experiments demonstrated direct binding of HOTTIP with WDR5, a component of the core complex responsible for H3K4 methylation [79]. Ectopically expressed HOTTIP does not induce the expression of 5′ HOXA genes in fibroblast cells, implying a cis mechanism of action for HOTTIP. Artificial recruitment of HOTTIP RNA upstream of a silent GAL4 promoter can boost transcription in the presence of WDR5, confirming the cis effect of the HOTTIP transcript in the proximity of the target genes [79].
Mechanisms of lncRNA interactions with chromatin and chromatin-modifying enzymes
The ability of lncRNAs to function as scaffolds for the recruitment of different yet functionally related enzymes and to confer locus specificity to these enzymes raises two immediate questions: what mediates the interactions between ncRNAs and specific chromatin enzymes, and what is the language of molecular rules governing them? One of the first hints of a mechanism governing ncRNA–enzyme interactions came from studies of the X-chromosome inactivation phenomenon. It was shown that a novel ncRNA termed Repeat A (RepA) directly binds to EZH2 and functions in the recruitment of PRC2 to the X chromosome [21]. RepA is a 1.6 kb ncRNA transcribed within Xist and is composed of 7.5 tandem repeat sequences that fold into two conserved stem–loop structures crucial for EZH2 binding [21]. These initial findings were subsequently confirmed by an independent study showing that short RNAs 50–200 nt in length are transcribed from the 5′ end of polycomb target genes [80]. Interestingly, these short RNAs have stem–loop structures similar to RepA and are able to bind the PRC2 component SUZ12 [80]. Similarly, the antisense Kcnq1ot1 has a conserved RNA repeat that was shown to be necessary for the epigenetic silencing of imprinted genes [60]. These studies imply that lncRNAs assume specific secondary structures offering different docking sites for different enzymes.
In large part, how NATs bind to target genes to guide chromatin-modifying enzymes to specific loci remains unexplained (Figure 2). Two recently developed methods for profiling the genome-wide occupancy of lncRNAs have allowed high-throughput identification of RNA–DNA and RNA–protein interactions [81,82]. The application of these new techniques may represent a promising tool to explore the mechanisms governing ncRNA–chromatin interactions, as shown by the informative analysis performed on a few known lncRNAs (roX2, TERC and HOTAIR) [82]. Interestingly, among the discovered DNA binding sites of both roX2 and TERC, specific consensus DNA sequences have been observed, thus suggesting that specific DNA motifs might be important for the recruitment of these and other lncRNAs to their target genomic loci. HOTAIR binding sites contain a GA-rich polypurine motif, reminiscent of mammalian Polycomb response elements. It is notable that although the HOTAIR binding sites overlap with PRC2 and H3K27me3 chromatin regions, they are restricted to small regions of a few hundred bp, raising the possibility that HOTAIR nucleates PRC2 binding and H3K27me3 spreading [82]. These data, together with the discovery that HOTAIR binding to its genomic targets does not require EZH2, demonstrate that ncRNAs are required for specific recognition of DNA sequences as well as recruitment of polycomb proteins, which in turn modify the neighboring chromatin. This study demonstrates that locus-specific interaction between ncRNAs and chromatin takes place independently from ncRNA–enzyme interaction and points to the existence of specific RNA-targeting motifs among ncRNA target sites. These motifs may represent binding sites for structural elements within the ncRNA, in the case of direct RNA–DNA interaction, or may function as binding sites for mediator proteins that may induce HOTAIR recruitment.

Figure 2. Molecular mechanisms of NATs and chromatin interactions. Two types of interactions are necessary for any ncRNA-induced chromatin modification to take place: between an antisense RNA molecule and a chromatin-modifying enzyme (CME), and between either a CME and DNA or antisense RNA and DNA. The second type of interaction is necessary to confer sequence specificity to the chromatin modifications. Each one of these interactions (RNA–protein, RNA–DNA or DNA–protein) can either take place through sequence motifs (digital Watson–Crick base pairing) or by RNA secondary structure. NATs function as intermediates that target CMEs to locus-specific regions of the genome. The molecular mechanisms governing the interaction between NATs and chromatin remain poorly characterized. Here, we propose three different possible scenarios by which this interaction occurs. (a) Specific binding of antisense RNA to a CME as well as to a DNA region by forming a unique secondary structure. (b) The sequence motif dictates the interaction between the antisense RNA molecule and the target DNA. In this model, antisense RNA binds specifically to CMEs and to a particular DNA region. (c) Nonspecific binding of antisense RNA to a DNA sequence. In this model, local antisense transcription leads to a specific chromatin modification. The specificity in this model comes from the promoter of antisense RNA and the fact that transcription will lead to particular modifications. NATs do not physically associate with the chromatin. In this case, locus-specificity is achieved by nascent NATs that are recognized by chromatin-modifying enzymes.
Concluding remarks
Although the examples of NAT and lncRNA mechanisms described above suggest a broad continuum of function for ncRNAs in epigenetic regulation, the exact roles and mechanisms of most of these molecules remain largely unknown. NATs have emerged as powerful transducers of biological information, primarily due to their ability to bridge the interaction between proteins and DNA [83]. The information content and structural features of these ncRNAs collectively establish a dynamic interface with other macromolecules [83], thus facilitating the formation and modulation of ribonucleoprotein complexes crucial for epigenetic signaling. These unique features permit NATs and other lncRNAs to function as scaffolds to regulate epigenetic mechanisms within the cell. The key to future studies of lncRNAs will be to integrate successfully the layers of knowledge gained from multiple genomic, transcriptomic, proteomic and epigenomic approaches to create a multidimensional understanding of NATs within the existing cellular framework [84].
Acknowledgments
The authors would like to thank Dr Chiara Pastori and Roya Pedram Fatemi for helpful discussions and critical reading of the manuscript. The research on long ncRNAs in C.W.’s laboratory is supported in large part by grants from the U.S. National Institutes of Health (5R01NS063974 and 5R01MH084880). M.M.’s postdoctoral studies are supported by a fellowship from the Swiss National Science Foundation (PBGEP3-136151).
References
1 Chi, P. et al. (2010) Covalent histone modifications – miswritten, misinterpreted and mis-erased in human cancers. Nat. Rev. Cancer 10, 457–469
2 Daniel, J.A. et al. (2005) Effector proteins for methylated histones: an expanding family. Cell Cycle 4, 919–926
3 Cao, R. and Zhang, Y. (2004) The functions of E(Z)/EZH2-mediated methylation of lysine 27 in histone H3. Curr. Opin. Genet. Dev. 14, 155–164
4 Shi, Y. et al. (2004) Histone demethylation mediated by the nuclear amine oxidase homolog LSD1. Cell 119, 941–953
5 Maurer-Stroh, S. et al. (2003) The Tudor domain ‘Royal Family’: Tudor, plant Agenet, Chromo, PWWP and MBT domains. Trends Biochem. Sci. 28, 69–74
6 Mellor, J. (2006) It takes a PHD to read the histone code. Cell 126, 22–24
7 Kouzarides, T. (2007) Chromatin modifications and their function. Cell 128, 693–705
8 van Steensel, B. (2011) Chromatin: constructing the big picture. EMBO J. 30, 1885–1895
9 Schubeler, D. (2010) Chromatin in multicolor. Cell 143, 183–184
10 Filion, G.J. et al. (2010) Systematic protein location mapping reveals five principal chromatin types in Drosophila cells. Cell 143, 212–224
11 Kharchenko, P.V. et al. (2011) Comprehensive analysis of the chromatin landscape in Drosophila melanogaster. Nature 471, 480–485
12 Ernst, J. and Kellis, M. (2010) Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat. Biotechnol. 28, 817–825
13 Gerstein, M.B. et al. (2010) Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project. Science 330, 1775–1787
14 Bonasio, R. et al. (2010) Molecular signals of epigenetic states. Science 330, 612–616
15 Bernstein, E. and Allis, C.D. (2005) RNA meets chromatin. Genes Dev. 19, 1635–1655
16 Rodriguez-Campos, A. and Azorin, F. (2007) RNA is an integral component of chromatin that contributes to its structural organization. PLoS ONE 2, e1182
17 Maison, C. et al. (2002) Higher-order structure in pericentric heterochromatin involves a distinct pattern of histone modification and an RNA component. Nat. Genet. 30, 329–334
18 Mondal, T. et al. (2010) Characterization of the RNA content of chromatin. Genome Res. 20, 899–907
19 Yap, K.L. et al. (2010) Molecular interplay of the noncoding RNA ANRIL and methylated histone H3 lysine 27 by polycomb CBX7 in transcriptional silencing of INK4a. Mol. Cell 38, 662–674
20 Yu, W. et al. (2008) Epigenetic silencing of tumour suppressor gene p15 by its antisense RNA. Nature 451, 202–206
21 Zhao, J. et al. (2008) Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science 322, 750–756
22 Martianov, I. et al. (2007) Repression of the human dihydrofolate reductase gene by a non-coding interfering transcript. Nature 445, 666–670
23 Rinn, J.L. et al. (2007) Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129, 1311–1323
24 Bierhoff, H. et al. (2010) Noncoding transcripts in sense and antisense orientation regulate the epigenetic state of ribosomal RNA genes. Cold Spring Harb. Symp. Quant. Biol. 75, 357–364
25 Wang, K.C. and Chang, H.Y. (2011) Molecular mechanisms of long noncoding RNAs. Mol. Cell 43, 904–914
26 Katayama, S. et al. (2005) Antisense transcription in the mammalian transcriptome. Science 309, 1564–1566
27 Carninci, P. et al. (2005) The transcriptional landscape of the mammalian genome. Science 309, 1559–1563
28 Birney, E. et al. (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799–816
29 Clark, M.B. et al. (2011) The reality of pervasive transcription. PLoS Biol. 9, e1000625; discussion e1001102
30 Dinger, M.E. et al. (2008) Long noncoding RNAs in mouse embryonic stem cell pluripotency and differentiation. Genome Res. 18, 1433–1445
31 Ahfeldt, T. et al. (2012) Programming human pluripotent stem cells into white and brown adipocytes. Nat. Cell Biol. 14, 209–219
32 Guttman, M. et al. (2011) lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature 477, 295–300
33 Ji, P. et al. (2003) MALAT-1, a novel noncoding RNA, and thymosin beta4 predict metastasis and survival in early-stage non-small cell lung cancer. Oncogene 22, 8031–8041
34 Faghihi, M.A. et al. (2008) Expression of a noncoding RNA is elevated in Alzheimer’s disease and drives rapid feed-forward regulation of beta-secretase. Nat. Med. 14, 723–730
35 Gupta, R.A. et al. (2010) Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature 464, 1071–1076
36 Wright, M.W. and Bruford, E.A. (2011) Naming ‘junk’: human non-protein coding RNA (ncRNA) gene nomenclature. Hum. Genomics 5, 90–98
37 Kapranov, P. et al. (2007) RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 316, 1484–1488
38 Ghildiyal, M. and Zamore, P.D. (2009) Small silencing RNAs: an expanding universe. Nat. Rev. Genet. 10, 94–108
39 Malone, C.D. and Hannon, G.J. (2009) Small RNAs as guardians of the genome. Cell 136, 656–668
40 Carthew, R.W. and Sontheimer, E.J. (2009) Origins and mechanisms of miRNAs and siRNAs. Cell 136, 642–655
41 Guttman, M. et al. (2009) Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223–227
42 Nakaya, H.I. et al. (2007) Genome mapping and expression analyses of human intronic noncoding RNAs reveal tissue-specific patterns and enrichment in genes related to regulation of transcription. Genome Biol. 8, R43
43 Sun, M. et al. (2006) Evidence for variation in abundance of antisense transcripts between multicellular animals but no relationship between antisense transcription and organismic complexity. Genome Res. 16, 922–933
44 Kiyosawa, H. et al. (2003) Antisense transcripts with FANTOM2 clone set and their implications for gene regulation. Genome Res. 13, 1324–1334
45 Chen, J. et al. (2005) Human antisense genes have unusually short introns: evidence for selection for rapid transcription. Trends Genet. 21, 203–207
46 Chen, J. et al. (2005) Genome-wide analysis of coordinate expression and evolution of human cis-encoded sense–antisense transcripts. Trends Genet. 21, 326–329
47 Lapidot, M. and Pilpel, Y. (2006) Genome-wide natural antisense transcription: coupling its regulation to its different regulatory mechanisms. EMBO Rep. 7, 1216–1222
48 Faghihi, M.A. and Wahlestedt, C. (2009) Regulatory roles of natural antisense transcripts. Nat. Rev. Mol. Cell Biol. 10, 637–643
49 Lee, J.T. et al. (1999) Tsix, a gene antisense to Xist at the X-inactivation centre. Nat. Genet. 21, 400–404
50 Verona, R.I. et al. (2003) Genomic imprinting: intricacies of epigenetic regulation in clusters. Annu. Rev. Cell Dev. Biol. 19, 237–259
51 Mohammad, F. et al. (2009) Epigenetics of imprinted long noncoding RNAs. Epigenetics 4, 277–286
52 Wan, L.B. and Bartolomei, M.S. (2008) Regulation of imprinting in clusters: noncoding RNAs versus insulators. Adv. Genet. 61, 207–223
53 Sleutels, F. et al. (2002) The non-coding Air RNA is required for silencing autosomal imprinted genes. Nature 415, 810–813
54 Nagano, T. et al. (2008) The Air noncoding RNA epigenetically silences transcription by targeting G9a to chromatin. Science 322, 1717–1720
55 Kim, T.H. et al. (2007) Analysis of the vertebrate insulator protein CTCF-binding sites in the human genome. Cell 128, 1231–1245
56 Gaszner, M. and Felsenfeld, G. (2006) Insulators: exploiting transcriptional and epigenetic mechanisms. Nat. Rev. Genet. 7, 703–713
57 Smilinich, N.J. et al. (1999) A maternally methylated CpG island in KvLQT1 is associated with an antisense paternal transcript and loss of imprinting in Beckwith–Wiedemann syndrome. Proc. Natl. Acad. Sci. U.S.A. 96, 8064–8069
58 Pandey, R.R. et al. (2008) Kcnq1ot1 antisense noncoding RNA mediates lineage-specific transcriptional silencing through chromatin-level regulation. Mol. Cell 32, 232–246
59 Terranova, R. et al. (2008) Polycomb group proteins Ezh2 and Rnf2 direct genomic contraction and imprinted repression in early mouse embryos. Dev. Cell 15, 668–679
60 Kanduri, C. (2011) Kcnq1ot1: a chromatin regulatory RNA. Semin. Cell Dev. Biol. 22, 343–350
61 Ghirlando, R. et al. (2012) Chromatin domains, insulators, and the regulation of gene expression. Biochim. Biophys. Acta (http://dx.doi.org/10.1016/j.bbagrm.2012.01.016)
62 Zhao, J. et al. (2010) Genome-wide identification of polycomb-associated RNAs by RIP-seq. Mol. Cell 40, 939–953
63 Pasmant, E. et al. (2007) Characterization of a germ-line deletion, including the entire INK4/ARF locus, in a melanoma-neural system tumor family: identification of ANRIL, an antisense noncoding RNA whose expression coclusters with ARF. Cancer Res. 67, 3963–3969
64 Popov, N. and Gil, J. (2010) Epigenetic regulation of the INK4b–ARF–INK4a locus: in sickness and in health. Epigenetics 5, 685–690
65 Pasmant, E. et al. (2011) ANRIL, a long, noncoding RNA, is an unexpected major hotspot in GWAS. FASEB J. 25, 444–448
66 Morris, K.V. et al. (2008) Bidirectional transcription directs both transcriptional gene activation and suppression in human cells. PLoS Genet. 4, e1000258
67 Qureshi, I.A. et al. (2010) Long non-coding RNAs in nervous system function and disease. Brain Res. 1338, 20–35
68 Feng, J. et al. (2006) The Evf-2 noncoding RNA is transcribed from the Dlx-5/6 ultraconserved region and functions as a Dlx-2 transcriptional coactivator. Genes Dev. 20, 1470–1484
69 Bond, A.M. et al. (2009) Balanced gene regulation by an embryonic brain ncRNA is critical for adult hippocampal GABA circuitry. Nat. Neurosci. 12, 1020–1027
70 Batra, R. et al. (2010) Partners in crime: bidirectional transcription in unstable microsatellite disease. Hum. Mol. Genet. 19, R77–R82
71 Martin, J-J. (2012) Spinocerebellar ataxia type 7. In Handbook of Clinical Neurology (Vol. 103; Ataxic Disorders) (Subramony, S.H. and Dürr, A., eds), pp. 475–491, Elsevier
72 Sopher, B.L. et al. (2011) CTCF regulates ataxin-7 expression through promotion of a convergently transcribed, antisense noncoding RNA. Neuron 70, 1071–1084
73 Modarresi, F. et al. (2012) Inhibition of natural antisense transcripts in vivo results in gene-specific transcriptional upregulation. Nat. Biotechnol. (http://dx.doi.org/10.1038/nbt.2158)
74 Straub, T. and Becker, P.B. (2011) Transcription modulation chromosome-wide: universal features and principles of dosage compensation in worms and flies. Curr. Opin. Genet. Dev. 21, 147–153
75 Ilik, I. and Akhtar, A. (2009) roX RNAs: non-coding regulators of the male X chromosome in flies. RNA Biol. 6, 113–121
76 Khalil, A.M. et al. (2009) Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl. Acad. Sci. U.S.A. 106, 11667–11672
77 Fanti, L. et al. (2008) The trithorax group and Pc group proteins are differentially involved in heterochromatin formation in Drosophila. Chromosoma 117, 25–39
78 Tsai, M.C. et al. (2010) Long noncoding RNA as modular scaffold of histone modification complexes. Science 329, 689–693
79 Wang, K.C. et al. (2011) A long noncoding RNA maintains active chromatin to coordinate homeotic gene expression. Nature 472, 120–124
80 Kanhere, A. et al. (2010) Short RNAs are transcribed from repressed polycomb target genes and interact with polycomb repressive complex-2. Mol. Cell 38, 675–688
81 Simon, M.D. et al. (2011) The genomic binding sites of a noncoding RNA. Proc. Natl. Acad. Sci. U.S.A. 108, 20497–20502
82 Chu, C. et al. (2011) Genomic maps of long noncoding RNA occupancy reveal principles of RNA-chromatin interactions. Mol. Cell 44, 667–678
83 St Laurent, G., 3rd and Wahlestedt, C. (2007) Noncoding RNAs: couplers of analog and digital information in nervous system function? Trends Neurosci. 30, 612–621
84 Hawkins, R.D. et al. (2010) Next-generation genomics: an integrative approach. Nat. Rev. Genet. 11, 476–486
Genetic basis of blood pressure and hypertension
Sandosh Padmanabhan1, Christopher Newton-Cheh2 and Anna F. Dominiczak1
1 Institute of Cardiovascular and Medical Sciences, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G12 8TA, UK
2 Harvard Medical School, Massachusetts General Hospital, Broad Institute of Harvard University and Massachusetts Institute of Technology, Boston, MA 02114, USA
Review
Blood pressure (BP) is a complex trait regulated by an intricate network of physiological pathways involving extracellular fluid volume homeostasis, cardiac contractility and vascular tone through renal, neural or endocrine systems. Untreated high BP, or hypertension (HTN), is associated with increased mortality, and thus a better understanding of the pathophysiological and genetic underpinnings of BP regulation will have a major impact on public health. However, identifying genes that contribute to BP and HTN has proved challenging. In this review we describe our current understanding of the genetic architecture of BP and HTN, which has accelerated over the past five years primarily owing to genome-wide association studies (GWAS) and the continuing progress in uncovering rare gene mutations, epigenetic markers and regulatory pathways involved in the physiology of BP. We also look ahead to future studies characterizing novel pathways that affect BP and HTN and discuss strategies for translating current findings to the clinic.
Corresponding author: Dominiczak, A.F. ([email protected]).
Keywords: hypertension; blood pressure; genetics; sodium; artery; kidney.
0168-9525/$ – see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tig.2012.04.001 Trends in Genetics, August 2012, Vol. 28, No. 8

The complexity of BP and HTN
BP is a quantitative trait that is distributed normally in the general population. In adults there is a continuous, incremental risk of cardiovascular disease, stroke and renal disease associated with high BP. HTN is defined based on a cut-off at the upper end of the distribution of BP ‘at which the benefits of action (i.e., therapeutic intervention) exceed those of inaction’ [1]. Based on this definition, there are over 1 billion people with HTN worldwide, and the World Health Organization suggests this will rise to 1.5 billion by 2020 [2]. The high prevalence of HTN and its consequent significant adverse economic impact on the individual and population highlight the importance of primary prevention of HTN. Thus, there is a pressing need for a greater understanding of the pathophysiological and genetic underpinnings of BP regulation and dysregulation. Studies have demonstrated that BP is a genetically determined trait, with estimates of heritability ranging from 31% to 68% [3,4]. The BP/HTN phenotype poses unique challenges for genetic dissection that have made progress slow (Box 1). BP levels are determined by cardiac output and peripheral vascular resistance, and these in turn are regulated by a complex network of interacting physiological pathways involving extracellular fluid volume homeostasis, cardiac contractility and vascular tone through renal, neural or endocrine systems. Perturbations in any of these physiological pathways can arise from environmental factors (for example, salt intake), genetic factors or a combination of both, and result in high or low BP. Rare monogenic BP syndromes are characterized by a major gene defect affecting a single pathway commonly involving renal electrolyte balance (Box 2). The phenotypic heterogeneity is further complicated by intra-individual BP variability caused by a large number of factors including measurement technique, instrument error and patient factors such as anxiety and activity level [5]. All these genotypic and phenotypic complexities have resulted in both false-positive and false-negative studies in the past. The search for BP genes initially focused on genome-wide linkage studies that were successful in uncovering genes for monogenic forms of high and low blood pressure but turned out to be largely unsuccessful in explaining the polygenic BP phenotype. The recent successes of GWAS [6–12] are testament to greater rigor in phenotypic characterization and statistical design. The limitations and potential of GWAS in the dissection of hypertension are highlighted in a recent debate [13,14].

In this review we describe the explosion in our understanding of the genetic architecture of BP and HTN that has occurred over the past five years and the continuing progress in uncovering new gene mutations causing rare inherited forms of HTN and hypotension. We review the road ahead, highlighting the novel pathways identified, both common and rare, and discuss future strategies to uncover novel mechanisms and achieve clinical translation. Table 1 summarizes the genomic and functional context of all the monogenic and GWAS BP/HTN loci discovered to date.

Common variants
GWAS use dense sets of single nucleotide polymorphisms (SNPs; usually ~500 000–1 000 000) and rely on linkage disequilibrium (LD) or correlation patterns of typed (or imputed) SNPs with functional variants. This means the identified SNPs are usually proxies of ungenotyped functional variants. Although these SNPs show unequivocal association with BP, the functional dissection of these signals is not straightforward. Associations detected by GWAS between BP/HTN and SNPs, with 50 kb flanking segments, are shown in Figure 1 (GWAS SNPs were selected from studies with sample sizes greater than 20 000
Box 1. Phenotypic complexity
Variation in extracellular fluid volume, the contractile state of the heart and vascular tone contribute to variation in BP level. Other determinants of BP include age, weight, ethnicity and diet. Systolic BP (SBP) increases linearly from age 30 to 84 years together with mean arterial pressure [a weighted average of SBP and diastolic pressure (DBP)], but DBP increases linearly up to age 50–60, after which it begins to decline with a steep increase in pulse pressure (the difference between SBP and DBP) [52,53]. The late decline of DBP after age 60 and the continuous rise in systolic BP reflect the increased large artery stiffness in older age. The odds of progression to HTN increase by 20–30% for every 5% gain in body weight.
At all ages, HTN is more common in African Americans than in whites; in all ethnic and racial groups it is more common in those with lower socioeconomic status. Interestingly, HTN is more prevalent in the African-American population in the USA than in either Afro-Caribbean or native black African populations [54–56]. In some societies, BP shows only a small age-related increase, which may be related in part to their agrarian, more rural lifestyle as well as the high-potassium, low-sodium diet of the hunter-gatherer and a lower consumption of food [57–60]. From an evolutionary perspective, essential HTN is a disease of civilization – with its abundance of processed foods and long lifespans – and could be an undesirable pleiotropic effect of a genotype that may have optimized fitness in an ancient environment [61]. The rates of HTN and sodium sensitivity are generally higher in individuals carrying the ancestral alleles of sodium-conserving genes, which show strong latitudinal clines with the ancestral sodium-conserving alleles more prevalent in African populations and less so in the northern regions [62–64].
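The two derived quantities named in Box 1 can be written out as a short calculation. The sketch below is illustrative only: the box states that mean arterial pressure is a weighted average of SBP and DBP without giving the weights, so the one-third systolic weighting used here is the common clinical approximation and is an assumption, as are the function names and the example readings.

```python
def pulse_pressure(sbp: float, dbp: float) -> float:
    """Pulse pressure: the difference between systolic and diastolic BP (mmHg)."""
    return sbp - dbp


def mean_arterial_pressure(sbp: float, dbp: float, systolic_weight: float = 1 / 3) -> float:
    """Weighted average of SBP and DBP (mmHg).

    The default weight of 1/3 on SBP is an assumed textbook approximation,
    not a value taken from the review.
    """
    return systolic_weight * sbp + (1 - systolic_weight) * dbp


if __name__ == "__main__":
    # Hypothetical readings chosen only to mimic the late-life pattern the box
    # describes: SBP keeps rising while DBP falls, so pulse pressure widens.
    for label, sbp, dbp in [("age 50", 130, 85), ("age 75", 150, 75)]:
        print(label, pulse_pressure(sbp, dbp), round(mean_arterial_pressure(sbp, dbp), 1))
```

With these assumed weights, a widening pulse pressure can coexist with a nearly unchanged mean arterial pressure, which is why the two quantities are reported separately.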
and SNPs attaining a P value < 5×10⁻⁸ for significant association). Many of the SNPs from GWAS that attained genome-wide significance also show similar strong associations for other traits (pleiotropy) (Figure 1a) – for example, rs13333226 shows independent association with HTN and chronic kidney disease [11,15,16]; rs3184504 shows significant association with chronic kidney disease, celiac disease, type 1 diabetes, coronary artery disease, cholesterol, hemoglobin, retinal vascular caliber, plasma eosinophil count and rheumatoid arthritis [6,9,16–24]; rs1799945 is also associated with serum iron concentration and hemoglobin levels in addition to its association with blood

Box 2. Qualitative or quantitative phenotype
From a genetic perspective, whether BP is considered as a quantitative trait or a dichotomous disease phenotype has major implications for studies of genetic causation, and this was recognized very early. In the 1950s a technical controversy about the unimodal or bimodal distribution of blood pressure led to the famous Platt–Pickering debate [65], and this is a useful platform to understand the assumptions that have driven genetic research of BP/HTN so far (Figure I). Platt measured BP in normotensive and hypertensive probands and their relatives and found a bimodal distribution of BP values (Figure Ia). This led him to argue that HTN was a simple mendelian disease caused by a single dominant genetic mutation. By contrast, Pickering studied BP distributions from the second to eighth decades in first-degree relatives of normotensive and hypertensive probands and instead found a unimodal distribution of BP (Figure Ib). Pickering concluded that the continuous Gaussian distribution of BP values indicated that BP was inherited as a ‘graded character’ and is hence a polygenic non-mendelian trait [65,66]. Thus, according to Platt, hypertensive individuals were a distinct sub-population, whereas according to Pickering, hypertension was only the upper portion of a continuous distribution curve of BP. The debate dragged on through the 1960s with much resistance to accepting Pickering’s quantitative model; this only changed with mounting evidence from epidemiological studies indicating that high BP was a risk factor for cardiovascular disease and intervention trials of antihypertensive therapy all showing similar benefits of reducing BP. Further support comes from large-scale GWAS for BP that have mapped common variants at 29 loci with small effects [6,8–11,67]. However, Platt’s model cannot be discounted entirely because there are rare mendelian forms of HTN and hypotension that are caused by highly penetrant rare genetic variants with large effects [40]. Pickering’s concession at the end of the long debate aptly summarizes the current understanding of HTN genetics – ‘I never denied the possibility that there may be a group in what we now call “essential hypertension” characterized by single-gene inheritance.’

[Figure I panels: population frequency plotted against blood pressure. (a) Platt: separate curves for gene mutation absent and gene mutation present, with the hypertension range marked. (b) Pickering: a single curve shaped by gene (0-n) and environment (0-n), with the hypertension range marked.]
Figure I. The Platt–Pickering debate about the quantitative or qualitative nature of HTN. (a) Platt argued that HTN occurred in a discrete subpopulation and was caused by a single, heritable genetic mutation. (b) Pickering suggested that there was a range of BP levels in the population, and that there was no clear dividing line between hypertension and normotension; instead, HTN represented the end of a continuum and was therefore polygenic in origin.
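The contrast at the heart of the Platt–Pickering debate in Box 2 can be made concrete with a small simulation. The sketch below is a toy model under stated assumptions, not a reconstruction of either investigator's data: every parameter (carrier probability, effect sizes, number of loci, baseline pressures) is hypothetical, and the bimodality coefficient is used only as a rough heuristic, with values above 5/9 conventionally read as a hint of bimodality.

```python
import random
import statistics


def bimodality_coefficient(values):
    """Sarle's bimodality coefficient from population moments: (skew^2 + 1) / kurtosis."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # non-excess kurtosis (3 for a Gaussian)
    return (skew ** 2 + 1) / kurtosis


def simulate_platt(n, rng, carrier_prob=0.3, shift=40.0, baseline=120.0, sd=8.0):
    """Platt-style model: one dominant mutation adds a large, fixed BP increment."""
    return [
        rng.gauss(baseline, sd) + (shift if rng.random() < carrier_prob else 0.0)
        for _ in range(n)
    ]


def simulate_pickering(n, rng, n_loci=50, allele_freq=0.5, effect=1.0,
                       baseline=70.0, env_sd=8.0):
    """Pickering-style model: many loci of small additive effect plus environment."""
    values = []
    for _ in range(n):
        # Two alleles per locus, each independently BP-increasing with prob allele_freq.
        risk_alleles = sum(1 for _ in range(2 * n_loci) if rng.random() < allele_freq)
        values.append(baseline + effect * risk_alleles + rng.gauss(0.0, env_sd))
    return values


if __name__ == "__main__":
    rng = random.Random(2012)  # fixed seed so the run is repeatable
    for name, sample in [("Platt", simulate_platt(5000, rng)),
                         ("Pickering", simulate_pickering(5000, rng))]:
        print(name, round(bimodality_coefficient(sample), 3))
```

Under these assumptions the single-gene model yields two overlapping humps, whereas the sum of many small independent effects converges on a single bell-shaped curve, which is the statistical core of Pickering's argument.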
Table 1. Genetic loci associated with monogenic BP syndromes and identified through GWAS (a)
CHR | Gene/nearest gene | Monogenic syndrome | GWAS (b) | Pathway | Notes
1p36.13 | CLCNKB | Bartter syndrome, type 3; OMIM #607364 | - | Renal electrolyte balance | Autosomal recessive. Impaired chloride reabsorption in the thick ascending loop of Henle leads to impaired sodium reabsorption. Low/normal BP
1p36.13 | SDHB | Paragangliomas 4; OMIM #115310 | - | Sympathetic system | Multiple catecholamine-secreting head and neck paragangliomas and retroperitoneal pheochromocytomas
1p36.2 | MTHFR (NPPA, NPPB) | - | CHARGE, GBPG, AGEN, ICBP, Gene-centric | Renal electrolyte balance | Methylene-tetrahydrofolate reductase; has been associated with changes in plasma homocysteine levels and pre-eclampsia. Atrial natriuretic and brain natriuretic peptides genes have been associated with hypertension
1q23.3 | SDHC | Paragangliomas 3; OMIM #605373 | - | Sympathetic system | Tumors or extra-adrenal paraganglia-associated pheochromocytoma
1q42.2 | AGT | - | Gene-centric | Renal electrolyte balance, vascular function | The cleaved products angiotensin I, angiotensin II and angiotensin III are known regulators of BP and sodium homeostasis
2q36.2 | CUL3 | Pseudohypoaldosteronism type IIE; OMIM *603136 | - | Renal electrolyte balance | Modulation of renal salt, K+ and H+ handling in response to physiological challenge
3p25.3 | VHL | von Hippel–Lindau syndrome; OMIM #193300 | - | Sympathetic system | Autosomal dominant. Associated with retinal, cerebellar, and spinal hemangioblastoma, renal cell carcinoma (RCC), pheochromocytoma, and pancreatic tumors
3q22.1 | ULK4 | - | CHARGE, GBPG, ICBP | - | Serine-threonine kinase of unknown function
3q26.2 | MECOM (MDS1) | - | GBPG, ICBP | - | Myelodysplasia syndrome protein 1
4q21.2 | FGF5 | - | GBPG, AGEN, ICBP | - | Fibroblast growth factor 5; stimulates cell growth and proliferation and is associated with angiogenesis
4q31.2 | NR3C2 | Hypertension exacerbation in pregnancy; OMIM #605115 | - | Renal electrolyte balance | Autosomal dominant. Missense mutation (S810L) in the mineralocorticoid receptor. Low-renin, low-aldosterone, hypokalemia
4q31.2 | NR3C2 | Pseudohypoaldosteronism type I; OMIM #177735 | - | Renal electrolyte balance | Autosomal dominant. Renal unresponsiveness to mineralocorticoids
5p15.3 | SDHA | Paragangliomas 5; OMIM #614165 | - | Sympathetic system | Tumors or extra-adrenal paraganglia-associated pheochromocytoma
5p13.3 | NPR3 | - | AGEN, ICBP, Gene-centric | Renal electrolyte balance | Natriuretic peptide clearance receptor
5q31.2 | KLHL3 | Pseudohypoaldosteronism type IID; OMIM #614495 | - | Renal electrolyte balance | Modulation of renal salt, K+ and H+ handling in response to physiological challenge
6p22.2 | HFE | Hemochromatosis; OMIM #235200 | ICBP, Gene-centric | - | Autosomal recessive. Iron metabolism
7p22 | - | Familial hyperaldosteronism type 2; OMIM #605635 | - | Steroid/aldosterone synthesis | Autosomal dominant. Hyperaldosteronism due to adrenocortical hyperplasia not suppressed by dexamethasone
7q36.1 | NOS3 | Pregnancy-induced hypertension; OMIM +163729 | HYPERGENES, Gene-centric | Endothelial function | Nitric oxide plays an important role in the maintenance of cardiovascular and renal homeostasis
8q24.3 | CYP11B1, CYP11B2 | Familial hyperaldosteronism type 1; glucocorticoid-remediable aldosteronism (GRA); OMIM #103900 | - | Steroid/aldosterone synthesis | Autosomal dominant. Chimeric gene. Plasma and urinary aldosterone responsive to ACTH; dexamethasone suppressible within 48 h
8q24.3 | CYP11B2 | Corticosterone methyloxidase II deficiency; OMIM #61060 | - | Steroid/aldosterone synthesis | Autosomal recessive. Enzymatic defect results in decreased aldosterone and salt-wasting
8q24.3 | CYP11B1 | Steroid 11β-hydroxylase deficiency; OMIM #202010 | - | Steroid/aldosterone synthesis | Enzyme dysfunction leads to increased levels of MR-activating hormones
10p12.3 | CACNB2 | - | CHARGE, ICBP | ?Vascular/cardiac function | Subunit of voltage-gated calcium channel expressed in heart
10q11.2 | RET | Multiple endocrine neoplasia type IIA; OMIM #171400 | - | Sympathetic system | Autosomal dominant. Associated with multiple endocrine neoplasms, including medullary thyroid carcinoma, pheochromocytoma, and parathyroid adenomas
10q24.3 | CYP17A1 | 17α-hydroxylase and/or 17,20-lyase deficiency; OMIM *609300 | CHARGE, GBPG, AGEN-BP, ICBP | Steroid/aldosterone synthesis | Cytochrome p450 enzyme mediating the first step in mineralocorticoid and glucocorticoid synthesis. Enzyme dysfunction leads to increased levels of MR-activating hormones. Also involved in sex steroid synthesis
11p15.1 | PLEKHA7 | - | CHARGE, ICBP | - | Pleckstrin-homology domain-containing family member expressed in zona adherens of epithelial cells
11p15.2 | SOX6 | - | Gene-centric | Transcription | Required for normal development of the central nervous system, chondrogenesis, and maintenance of cardiac and skeletal muscle cells
11p15.5 | LSP1/TNNT3 | - | Gene-centric | ?Endothelial function | Expressed in leukocytes and endothelial cells. Involved in signaling, regulating the cytoskeletal architecture and neutrophil migration
11q12.2 | SDHAF2 | Paragangliomas 2; OMIM #601650 | - | Sympathetic system | Tumors or extra-adrenal paraganglia-associated pheochromocytoma
11q23.1 | SDHD | Paragangliomas 1; OMIM #16800 | - | Sympathetic system | Tumors or extra-adrenal paraganglia-associated pheochromocytoma
11q24.3 | KCNJ1 | Bartter syndrome, antenatal, type 2; OMIM #241200 | - | Renal electrolyte balance | Reduced potassium recycling leads to impaired sodium reabsorption
12p12.2 | - | Hypertension with brachydactyly; Bilginturan syndrome; OMIM %112410 | - | - | Inversion, deletion, and reinsertion at 12p12.2 to p11.2. No specific biochemical findings
12p12.3 | WNK1 | Pseudohypoaldosteronism type IIC; Gordon’s syndrome; OMIM #614492 | - | Renal electrolyte balance | Autosomal dominant. Gain-of-function mutations in WNK1. Low plasma renin, normal or elevated K+
12q21.3 | ATP2B1 | - | CHARGE, GBPG, AGEN, ICBP, Gene-centric | ?Vascular function | Encodes plasma membrane calcium- or calmodulin-dependent ATPase expressed in endothelium
12q24.1 | SH2B3 | - | CHARGE, GBPG, ICBP | ?Endothelial function | Also known as lymphocyte-specific adaptor protein (LNK), may regulate hematopoietic progenitors and inflammatory signaling pathways in endothelium
12q24.2 | TBX5–TBX3 | - | CHARGE, ICBP | - | T-box genes involved in regulation of developmental processes
15q21.1 | SLC12A1 | Bartter syndrome, antenatal, type 1; OMIM #601678 | - | Renal electrolyte balance | Homozygous or compound heterozygous mutation in the sodium-potassium-chloride cotransporter-2 gene
15q24.1 | CSK | - | CHARGE, GBPG, AGEN-BP, ICBP | Vascular function | Cytoplasmic tyrosine kinase involved in angiotensin II-dependent vascular smooth muscle cell contraction
16p12.2 | SCNN1B, SCNN1G | Liddle syndrome; OMIM #177200 | - | Renal electrolyte balance | Autosomal dominant. Constitutive activation of epithelial sodium transporter, ENaC. Low plasma renin, low or normal K+; negligible urinary aldosterone
16p12.3 | UMOD | - | BP-Extremes | ?Renal electrolyte balance, ?Renal function | Uromodulin; Tamm–Horsfall protein. Specifically expressed in the thick ascending limb of the loop of Henle where 25% of sodium reabsorption in the kidney occurs
16q13 | SLC12A3 | Gitelman syndrome; OMIM #263800 | - | Renal electrolyte balance | Low BP. Loss-of-function mutation leads to lower sodium reabsorption
16q22.1 | HSD11B2 | Apparent mineralocorticoid excess; OMIM #218030 | - | Steroid/aldosterone synthesis | Autosomal recessive. Increased plasma ACTH and secretory rates of all corticosteroids
17q21.3 | ZNF652 | - | GBPG, AGEN-BP, ICBP | - | Zinc-finger protein 652
17q21.3 | WNK4 | Pseudohypoaldosteronism type IIB; Gordon’s syndrome; OMIM #614491 | - | Renal electrolyte balance | Autosomal dominant. Loss-of-function mutations in WNK4. Low plasma renin, normal or elevated K+
20q13 | GNAS–EDN3 | - | GBPG, ICBP | Vascular function | GNAS encodes the α subunit of the G protein mediating β-receptor signal transduction. EDN3 encodes endothelin 3, the precursor for the ligand of the endothelin B receptor
(a) The key genes at each locus are shown with their known or potential role in BP regulation. The grey shaded rows indicate genes implicated in monogenic syndromes of high/low BP. The Pathway column is color-coded according to the pathway involved.
(b) GWAS studies: AGEN [7], BP-Extremes [10], CHARGE [8], GBPG [9], Gene-centric [6], HYPERGENES [11], and ICBP [5].
pressure [6,25,26]; and proxies for rs1004467 show genome-wide significant associations with coronary artery disease, schizophrenia, intracranial aneurysm and parkinsonism [6,8–10,24,27–30]. This illustrates the challenges ahead when attempting to design studies to functionally dissect these signals. Figure 1a also shows genes that are associated with monogenic syndromes from Online Mendelian Inheritance in Man (OMIM) that occur in the GWAS-related DNA segments shown. The only genes known to be associated with monogenic forms of high blood pressure that have also been identified by GWAS are cytochrome P450, family 17, subfamily A, polypeptide 1 (CYP17A1) and nitric oxide synthase 3 (NOS3). Even once a SNP has been identified that is associated with HTN, it is difficult to identify the gene involved. For example, Figure 1b shows the genes within 50 kb on either side of a blood pressure GWAS SNP, and among these only a few genes (NPPA, NOS3 and UMOD) have been clearly linked to the GWAS SNP [11,12,31]. Furthermore, 50 kb is by no means the limit of the zone of influence of a SNP because the risk-genes implicated by the GWAS SNPs may lie within the region of linkage disequilibrium around the SNP, or even more distantly because SNPs can influence the regulation of remote genes. GWAS loci are often rich in copy-number variants, insertion/deletion variants (Figure 1b), microRNA targets and transcription factor binding sites (Figure 1c) that may influence the genotype–phenotype association and offer another avenue for molecular and functional experiments to elucidate the causal pathways or more simply to identify which risk-gene the GWAS SNP implicates.
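Because the argument above rests on linkage disequilibrium between a typed SNP and an unobserved functional variant, it may help to see how the usual pairwise LD measure, r², is computed. The sketch below assumes phased haplotype counts for two biallelic SNPs; the function names and the example counts are hypothetical, and the 0.8 cut-off simply mirrors the proxy threshold quoted for Figure 1.

```python
def ld_r_squared(n11: int, n10: int, n01: int, n00: int) -> float:
    """Pairwise LD r^2 from haplotype counts.

    n11: haplotypes carrying allele 1 at both SNPs
    n10: allele 1 at SNP A, allele 0 at SNP B
    n01: allele 0 at SNP A, allele 1 at SNP B
    n00: allele 0 at both SNPs
    """
    total = n11 + n10 + n01 + n00
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_a = (n11 + n10) / total      # frequency of allele 1 at SNP A
    p_b = (n11 + n01) / total      # frequency of allele 1 at SNP B
    d = n11 / total - p_a * p_b    # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


def is_proxy(r_squared: float, threshold: float = 0.8) -> bool:
    """True if the SNP pair would count as proxies at the given r^2 threshold."""
    return r_squared > threshold


if __name__ == "__main__":
    # Hypothetical counts: the two SNPs usually, but not always, travel together.
    r2 = ld_r_squared(45, 5, 5, 45)
    print(round(r2, 2), is_proxy(r2))
```

A typed SNP with high r² to a functional variant carries most of its association signal, which is why a GWAS hit localizes a region rather than a specific gene.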
Figure 1. Phenotypic, genetic and regulatory context of GWAS signals for blood pressure and hypertension. (a) Phenotypic landscape of GWAS signals in BP/HTN GWAS. The strongest SNPs for BP and HTN also show very little overlap with genes involved in monogenic BP syndromes. Only CYP17A1 and NOS3 are associated with monogenic BP syndromes and occur within 50 kb of BP GWAS SNPs. The strongest BP GWAS SNPs and their proxies are not associated exclusively with BP phenotypes but show pleiotropy with non-BP traits that can either point to plausible underlying pathways (for example UMOD and its association with HTN and kidney function) or novel common pathways or may be independent associations. The rings from outer to inner represent: (1) chromosomal segments with GWAS SNPs (including 50 kb flanking region); (2) GWAS SNPs; (3) black markers on chromosomal segments – SNP proxy locations for the index SNP in the region (r²>0.8); (4) genes implicated in monogenic syndromes from OMIM present in the chromosomal regions; (5) non-BP phenotypes that showed genome-wide significance within these loci. (b) Genetic landscape of GWAS signals in BP/HTN GWAS. Only a few genes (NPPA, NOS3, UMOD) have been clearly linked to the strongest GWAS SNP, whereas many of the SNPs lie in gene-rich regions, highlighting the challenges ahead in fine-mapping and identifying the causative gene/variant. It is very likely that GWAS SNPs may influence the regulation of distant genes outside the 50 kb regions shown in this figure. Furthermore, the GWAS loci are also rich in copy-number variants and insertion/deletion variants that will need to be considered in the functional dissection of GWAS signals. The rings from outer to inner represent 1–5 as in (a); (6) shows structural variations present within the chromosomal regions. (c) Regulatory landscape of GWAS signals in BP/HTN GWAS. Bioinformatic analysis of GWAS BP SNP loci shows microRNA targets, conserved transcription factor binding sites, and epigenetic loci, that may influence the genotype–phenotype association and offer another avenue for the design of molecular and functional experiments to elucidate the causal pathways. The SNP positions are indicated by red bars on the chromosome and are the same SNPs as shown in (a) and (b). The rings from outer to inner represent: (1) microRNA targets and associated genes; (2) chromosomal segments with GWAS SNPs (including 50 kb flanking region); (3) transcription factor binding sites conserved in the human/mouse/rat alignment in the chromosomal regions using TFBS Conserved (tfbsConsSites) in UCSC Browser showing those transcription factors with score >800; (4) DNase hypersensitive areas assayed in a large collection of cell types; (5) predicted CpG islands. The height of the line indicates the length of the segment; (a–c) were generated using Circos [68] with the Feb 2009 (GRCh37/hg19) assembly data from UCSC Genome Browser (http://genome.ucsc.edu/) [69].

Finally, the collective effect of all BP loci identified through GWAS explains only a small fraction (approximately 2%) of BP heritability. Thus, similarly to other common traits, BP shares the same missing heritability conundrum [32], and efforts are now directed toward identifying additional common variants of small effect and rare variants of greater effect. Although GWAS use SNPs selected to provide genome-wide coverage, they provide limited coverage of genes with plausible biological relevance (‘candidate genes’) particularly in relation to lower-frequency genetic variants (such as those with minor allele frequencies of 1–5%). Large-scale gene-centric analysis of BP using a customized gene array enriched with common, low-frequency variants in approximately 2100 candidate cardiovascular genes reflecting a wide variety of biological pathways in over 80 000 individuals identified NPR3, HFE, NOS3, SOX6, LSP1/TNNT3, MTHFR, AGT and ATP2B1, with some overlap with large GWAS meta-analyses [7]. Among the single candidate genes studied, NPPA-NPPB [31] and SCNN1G [33] showed evidence of association with replication, but only the former showed strong concordant signals in GWAS.
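The small share of BP variance attributed to GWAS loci in this section follows from simple arithmetic once per-allele effects are small. The sketch below uses the standard additive-model expression for the variance explained by one SNP, 2p(1−p)β²/σ², and assumes Hardy–Weinberg proportions and independent loci. The effect sizes, allele frequencies and trait standard deviation in the example are hypothetical and are not taken from the studies cited.

```python
def variance_explained(beta_mmhg: float, maf: float, trait_sd_mmhg: float) -> float:
    """Fraction of trait variance explained by one SNP under an additive model.

    beta_mmhg: per-allele effect on BP (mmHg)
    maf: minor allele frequency (Hardy-Weinberg proportions assumed)
    trait_sd_mmhg: phenotypic standard deviation of BP (mmHg)
    """
    return 2 * maf * (1 - maf) * beta_mmhg ** 2 / trait_sd_mmhg ** 2


def total_variance_explained(variants, trait_sd_mmhg: float) -> float:
    """Sum over independent SNPs given as (beta_mmhg, maf) pairs."""
    return sum(variance_explained(b, p, trait_sd_mmhg) for b, p in variants)


if __name__ == "__main__":
    # Hypothetical panel: 29 common variants, each shifting BP by 0.5 mmHg per allele.
    panel = [(0.5, 0.3)] * 29
    share = total_variance_explained(panel, trait_sd_mmhg=17.0)
    print(f"{100 * share:.2f}% of variance")
```

Because each term scales with β², even dozens of loci with sub-mmHg effects add up to only a percent or two of the variance under these assumptions, which is the pattern the missing heritability discussion describes.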
One striking result of the BP GWAS is that the genes from highly plausible pathways are not represented near the identified SNPs (Figure 1). Figure 2 applies the GRAIL text-mining algorithm (Gene Relationships Across Implicated Loci [34]) to search for connectivity between genes near the associated SNPs, based on existing literature (published before 2006 – before the explosion of GWAS publications). Of the 41 BP GWAS loci, 14 showed underlying genes with significant relatedness, as defined by the degree of similarity in the text describing them within article abstracts, implying these connected genes are involved in a common cellular process or pathway. These regions of GRAIL connectivity show the expected connection between NPPA/B and NPR3 but, in cases when the GWAS SNPs lie in gene-rich regions, also reveal connections that point to specific novel genes for follow-up studies. This is highlighted by rs805303, present in a very gene-rich locus, where a connection between NOTCH4 and Jagged 1 (JAG1 – in a different locus – rs1327235) highlights the NOTCH signaling pathway that has been shown to be important in the developing cardiovascular system and congenital human cardiovascular diseases. Their connection with BP regulation is not intuitive, but this association should prioritize this pathway and these genes for functional dissection.

Figure 2. Representation of the connections between 41 BP GWAS SNPs and their corresponding genes using the GRAIL literature-based text-mining algorithm (Gene Relationships Across Implicated Loci [34]). This searches for connectivity between genes near the associated SNPs, based on existing literature (we selected published literature before 2006 – before the surge of GWAS publications). The thickness of the red lines indicates the strength of the literature-based connectivity between the genes. This type of analysis supports known interactions but also suggests new connections that are worth following up in future studies.

Despite the increasing pace of discovery of variants associated with BP and HTN, the limited predictive utility of these variants either singly or as part of a composite risk score is striking. When SNPs have broadly similar allele frequencies, the number of BP-increasing alleles carried is approximately normally distributed in the population because each SNP is inherited independently, and hence the number of individuals in the population that are expected to carry all harmful risk alleles would be vanishingly small. As an example, using the BP extreme case–control cohort [11], the probability density functions of the number of BP-increasing alleles (from 35 genome-wide significant GWAS SNPs) in hypercontrols and the extreme hypertensive cases are shown in Figure 3, illustrating the significant overlap of cases and controls by genetic risk score despite extremeness of the phenotypic ascertainment. Using genetic risk scores constructed from up to 13 GWAS BP SNPs, a novel longitudinal study showed that individuals with the highest combination risk score had significantly higher diastolic BP at the age of nine years, and the effect was persistent from childhood through adult age [35]. Genetic risk scores, including many non-genome-wide significant SNPs, explained more of the variance than scores based only on very significant SNPs in adults and children (1.2–1.7% in adults and 0.8–1.4% in children) [36].

[Figure 3 plot: frequency (y axis, 0.00–0.12) against number of BP-increasing alleles (x axis, 30–50) for two groups. Hypertensive population: age <63 years, BP >160/100. ‘Hypercontrols’: age >50 years, BP <120/80, no prevalent CVD or incident CVD during 10 year follow-up.]
Figure 3. For the prediction of complex diseases, genotypes at multiple SNPs are often combined into scores (for example, scores are calculated according to the number of risk alleles carried). The frequency distribution of the number of BP-increasing alleles carried in the general population would be normally distributed because each allele is inherited independently. The frequency distributions of 35 BP-increasing alleles from GWAS SNPs in populations selected from the extremes of BP distribution (top 9% and the bottom 2%) [11] show a large overlap of scores, and the majority of the individuals from both phenotypic extremes lie in the middle of the distribution. This illustrates the fallacy of using risk scores from GWAS SNPs to identify individuals at high risk for hypertension. Abbreviation: CVD, cardiovascular diseases.
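The claim that almost nobody carries every risk allele, and that allele counts cluster in the middle of the range, can be checked with an exact calculation. The sketch below builds the distribution of the number of BP-increasing alleles across independent SNPs by convolving per-SNP genotype probabilities. Independence and Hardy–Weinberg proportions are assumed, and the uniform allele frequency of 0.5 in the example is a placeholder, since the actual frequencies of the 35 SNPs are not given here.

```python
def risk_allele_count_pmf(risk_allele_freqs):
    """Exact distribution of the total number of risk alleles carried.

    Each SNP contributes 0, 1 or 2 risk alleles with Hardy-Weinberg
    probabilities; SNPs are assumed independent. Returns a list where
    entry k is P(total risk alleles == k).
    """
    pmf = [1.0]  # with no SNPs, the count is 0 with certainty
    for p in risk_allele_freqs:
        genotype = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
        updated = [0.0] * (len(pmf) + 2)
        for count, prob in enumerate(pmf):
            for extra, geno_prob in enumerate(genotype):
                updated[count + extra] += prob * geno_prob
        pmf = updated
    return pmf


def mean_and_sd(pmf):
    """Mean and standard deviation of a discrete distribution given as a list."""
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, var ** 0.5


if __name__ == "__main__":
    # Placeholder frequencies: 35 SNPs, risk allele frequency 0.5 at each.
    pmf = risk_allele_count_pmf([0.5] * 35)
    mean, sd = mean_and_sd(pmf)
    print(f"mean {mean:.1f}, sd {sd:.2f}, P(all 70 risk alleles) = {pmf[-1]:.1e}")
```

A design note: the convolution gives exact tail probabilities, which a normal approximation would misstate badly at the extremes, and it accepts a different frequency for each SNP if real values are available.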
Novel pathways uncovered by GWAS
Highly correlated SNPs (r²>0.9) in the 5′ end of UMOD have been independently identified in large GWAS of blood pressure extremes and kidney function [11,16]. The UMOD gene [expressed primarily in the thick ascending limb (TAL) of the loop of Henle] encodes the Tamm–Horsfall protein [THP/uromodulin (UMOD)], an extracellular protein anchored by a glycosyl phosphatidylinositol (GPI) functional group at the luminal face of tubular epithelia and released into the urine by proteolytic cleavage. It is the most abundant tubule protein in the urine. In the HTN study, the minor G allele of rs13333226 at the 5′ end of the UMOD gene is associated with a lower risk of HTN [OR (95% CI): 0.87 (0.84;0.91); P = 3.6×10⁻¹¹], 0.49 mmHg lower SBP (P = 2.6×10⁻⁵) and 0.30 mmHg lower DBP (P = 1.5×10⁻⁵), increased estimated glomerular filtration rate (eGFR) (3.6 ml/min/minor-allele, P = 0.012), reduced urinary UMOD excretion and lower fractional excretion of endogenous lithium. In addition, the genotype association between rs13333226 and urinary UMOD excretion was more pronounced with low salt intake and blunted with high salt intake, indicating a possible gene–environment interaction [11]. Adjustment for eGFR in the HTN GWAS did not alter the association between rs13333226 and HTN. Mutations in UMOD cause medullary cystic kidney disease type 2 (MCKD2), familial juvenile hyperuricemic nephropathy (FJHN) and glomerulocystic kidney disease (GCKD), but these only lead to HTN during the later stages of renal failure. A single mechanism that could explain all the observations involving the minor G allele of rs13333226 is a decreased sensitivity of the macula densa to luminal Cl−. The decreased sensitivity of the macula densa may be mediated either through the reduced UMOD excretion associated with the G allele or through other mechanisms. Under this model, decreased macula densa sensitivity activates tubuloglomerular feedback and increases GFR, explaining the increased proximal tubular Na+ reabsorption. The lifetime effect of the elevated GFR would explain the reduced BP and potentially the age-related effect of the variant. There may be other possible mechanisms, for example ROMK function is activated by UMOD, and thus reduced ROMK activity might explain renal salt wasting in THP knockout mice and patients with Bartter syndrome [37].
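As a rough consistency check on the association statistics quoted for rs13333226, an odds ratio and its 95% confidence interval can be converted back into an approximate z statistic and two-sided P value. The sketch below assumes a normal approximation on the log-odds scale and a symmetric interval; the function name is hypothetical. Because the published bounds are rounded to two decimals, the recovered P value should only be expected to land in the same general range as the reported one, not to match it.

```python
import math


def z_and_p_from_or_ci(odds_ratio: float, ci_low: float, ci_high: float,
                       z_crit: float = 1.959964):
    """Approximate z statistic and two-sided P value from an OR and its CI.

    Assumes the CI is symmetric on the log scale: log(OR) +/- z_crit * SE,
    with z_crit = 1.959964 for a 95% interval.
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    z, p = z_and_p_from_or_ci(0.87, 0.84, 0.91)
    print(f"z = {z:.2f}, two-sided P = {p:.1e}")
    print("below genome-wide threshold 5e-8:", p < 5e-8)
```

The same helper can be applied to any association reported as an odds ratio with a confidence interval, which is useful when a paper omits the test statistic.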
Rare variants
Genes involved in monogenic hypertension are summarized in Table 1. Pseudohypoaldosteronism type II (Gordon’s syndrome; familial hyperkalemia; OMIM #145260), an autosomal dominant form of HTN associated with hyperkalemia, non-anion gap metabolic acidosis and increased salt reabsorption by the kidney, is caused by either gain-of-function mutations in WNK1 or loss-of-function mutations in WNK4. Recently, exome sequencing has been used to identify mutations in kelch-like 3 (KLHL3) or cullin 3 (CUL3) in pseudohypoaldosteronism II (PHAII) patients from 41 unrelated families [38].
Conversely, mutations that reduce salt retention, such as those associated with Bartter (SLC12A1, KCNJ1, CLCNKB, BSND, CaSR, ClCK-A) and Gitelman (SLC12A3) syndromes, tend to lower BP and protect against the development of HTN [39,40]. Resequencing three candidate genes (SLC12A3, SLC12A1, KCNJ1) involved in Bartter or Gitelman syndromes in the Framingham Heart Study population identified 30 distinct potentially deleterious rare mutations present in 49 subjects. In the heterozygous state, these variants were associated with 5.7 mmHg lower BP at age 40, and 9.0 mmHg lower BP at age 60, and in aggregate reduce the risk of HTN by 60% at age 60 [39]. This is the first indication that rare variants can produce clinically significant BP reduction in the general population and supports the rare-variant–common-disease hypothesis [41]. There are currently exome sequencing projects involving individuals at BP extremes to replicate this finding and discover more rare variants with a large effect on blood pressure. The hypothesis for these studies is that there will be an abundance of low-frequency variants of large effect that will explain most of the missing heritability of BP. Although the clinical applications of these findings will be limited given the very low frequency of these variants in the population, these studies should uncover novel pathways and provide a deeper understanding of the genetic architecture of blood pressure.
Epigenetics
Not all features of gene regulation are encoded in genes or contained in the DNA sequence. MicroRNAs (miRs), histone modifications and DNA methylation have all been investigated with regard to their role in BP gene regulation. The potential role of miRs in vascular smooth-muscle biology and blood pressure is just beginning to be appreciated. Mice lacking miR-143 and miR-145 develop significant reductions in BP resulting from modulation of actin dynamics [42]. Intrarenal expression of miR-200a, miR-200b, miR-141, miR-429, miR-205 and miR-192 was found to be increased in hypertensive nephrosclerosis, and the degree of upregulation correlated with disease severity. There are significant correlations between miR species and proteinuria and GFR, suggesting a dose–response type of relationship between intrarenal miR expression and the severity of hypertensive nephrosclerosis [43]. Renin gene expression appears to be regulated by miR-181a and miR-663 [44]. The identification of these miRs may lead to the elucidation of pathways involved in HTN causation and novel therapeutics. Recently, an observational study showed that human cytomegalovirus (HCMV) seropositivity and titers are positively associated with essential hypertension independently of other HTN risk factors [45]. The HCMV-encoded miR hcmv-miR-UL112 was highly expressed in hypertensive patients, pointing to a potentially novel pathway involved in HTN. There is support from an animal study showing that infection of mice with mouse cytomegalovirus can alone elevate blood pressure [46]. Although this is an observational finding, it highlights the prospect of an abundance of pathways and risk factors that lead to the final common BP phenotype and may have implications for the discovery of new treatments.
Recently, renal sympathetic denervation has shown considerable promise in treating refractory HTN [47]. The sympathetic innervation of the kidney is implicated in the pathogenesis of HTN by increasing plasma renin activity that leads to sodium and water retention and reduces renal blood flow (RBF). The procedure involves radiofrequency ablation of the renal sympathetic nerves, and has shown remarkable reductions in BP, but the underlying mechanism is unclear. Recently, histone modification has been shown to play an important role in the epigenetic modulation of WNK4 transcription in the development of salt-sensitive HTN. Isoproterenol-induced transcriptional suppression of WNK4 was shown to be mediated via inhibition of histone deacetylase-8 (HDAC8) activity at the WNK4 promoter [48], which in turn can stimulate the thiazide-sensitive Na+-Cl− cotransporter (NCC/SLC12A3), implying that sympathetic nerve activity can increase BP partly by activating NCC. The evidence that isoproterenol induces transcriptional suppression of WNK4 and leads to activation of NCC offers an opportunity to combine genomics, epigenomics and NCC detection in urinary exosomes to test the salt–sympathetic-system–BP axis [48,49].
There are indications of epigenetic regulation involving interactions between a disruptor of telomeric-silencing alternative splice variant a (Dot1a) and the ALL-1 fused gene from chromosome 9 (Af9) to produce a nuclear repressor complex that targets histone H3 Lys-79 methylation in the promoter region of the sodium channel SCNN1A and suppresses its transcriptional activity [50]. Aldosterone can disrupt this nuclear complex, which results in histone H3 Lys-79 hypomethylation at specific subregions and derepression of the SCNN1A promoter. This adds a novel epigenetic dimension to the complex transcriptional and post-transcriptional regulation of the epithelial sodium channel by aldosterone.
Concluding remarks
Unraveling the genetic basis of BP regulation and HTN has been more difficult than might be suggested by their high heritability, but the progress in cataloging common variants using GWAS is comparable to other common traits. Ongoing studies include the GWAS meta-analysis of BP extremes and exome sequencing of BP extremes to identify more sequence variants that are associated with BP. Indeed, it has been estimated that further increasing the GWAS sample size will identify 116 common variants for BP that have similar effect sizes to those found already, but these will collectively explain only about 2.2% of the phenotypic variance [6].
Some important issues to be addressed in future studies investigating BP as a quantitative trait are to model BP more accurately in subjects on antihypertensive treatments by taking into account the number of drugs, drug dosage and compliance metrics, or to make use of longitudinal BP data – for example long-term average and BP variability (both visit-to-visit or 24 h intra-individual variability). Novel strategies and orthogonal study designs are needed to discover causal and clinically useful genetic markers efficiently. This would require a move from pure BP quantitative traits in larger and larger cohorts to detailed studies of subjects selected on informative intermediate traits derived from the extensive interventional studies for high BP. SNPs near the genes encoding uromodulin and natriuretic peptides show allelic association with urinary uromodulin and plasma natriuretic peptides respectively [11,31] and offer the opportunity for salt-intervention trials to dissect the underlying mechanisms further. Randomized clinical trials with stored DNA samples offer a readily available resource to study not only drug response but also to dissect pathways of HTN based on interindividual differences in response to drugs that target specific pathways.
The limited predictive utility of common variants that have emerged from most GWAS would suggest that to build better predictive models it will be necessary to identify orthogonal (i.e., uncorrelated) genetic variants that are associated with new pathways as suggested for biomarkers [51]. The current despondency over poor prediction is probably related to the early discovery of low-hanging fruit that are perhaps more correlated with known pathways. The next level of discovery will be more challenging
because the molecular and functional dissection of the novel variants will require more detailed low-throughput science in contrast to the high-throughput screening methods applied so far.
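The argument for orthogonal (uncorrelated) variants can be illustrated with the standard expression for the variance explained by two standardized predictors in a linear model. The following is a minimal sketch, not an analysis from this review: the function name and the correlation values are hypothetical, chosen only to show that two markers correlated with each other explain less jointly than two uncorrelated markers with the same individual correlations to the trait.

```python
def r_squared_two_predictors(r1, r2, r12):
    """Variance explained (R^2) by a linear model with two standardized predictors.

    r1, r2: correlation of each predictor with the trait.
    r12:    correlation between the two predictors.
    """
    if abs(r12) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    return (r1**2 + r2**2 - 2.0 * r1 * r2 * r12) / (1.0 - r12**2)


# Hypothetical values: each marker correlates 0.2 with the trait.
uncorrelated = r_squared_two_predictors(0.2, 0.2, 0.0)  # orthogonal markers
correlated = r_squared_two_predictors(0.2, 0.2, 0.8)    # markers tagging one pathway

print(f"orthogonal markers: R^2 = {uncorrelated:.3f}")
print(f"correlated markers: R^2 = {correlated:.3f}")
```

Under these assumed values the orthogonal pair explains roughly twice as much variance as the correlated pair, which is the intuition behind seeking variants in new pathways.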
References
1 Evans, J.G. and Rose, G. (1971) Hypertension. Br. Med. Bull. 27, 37–42
2 Kearney, P.M. et al. (2005) Global burden of hypertension: analysis of worldwide data. Lancet 365, 217–223
3 Hottenga, J.J. et al. (2005) Heritability and stability of resting blood pressure. Twin Res. Hum. Genet. 8, 499–508
4 Kupper, N. et al. (2005) Heritability of daytime ambulatory blood pressure in an extended twin design. Hypertension 45, 80–85
5 Padmanabhan, S. et al. (2008) Hypertension and genome-wide association studies: combining high fidelity phenotyping and hypercontrols. J. Hypertens. 26, 1275–1281
6 Ehret, G.B. et al. (2011) Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature 478, 103–109
7 Johnson, T. et al. (2011) Blood pressure loci identified with a gene-centric array. Am. J. Hum. Genet. 89, 688–700
8 Kato, N. et al. (2011) Meta-analysis of genome-wide association studies identifies common variants associated with blood pressure variation in East Asians. Nat. Genet. 43, 531–538
9 Levy, D. et al. (2009) Genome-wide association study of blood pressure and hypertension. Nat. Genet. 41, 677–687
10 Newton-Cheh, C. et al. (2009) Genome-wide association study identifies eight loci associated with blood pressure. Nat. Genet. 41, 666–676
11 Padmanabhan, S. et al. (2010) Genome-wide association study of blood pressure extremes identifies variant near UMOD associated with hypertension. PLoS Genet. 6, e1001177
12 Salvi, E. et al. (2012) Genomewide association study using a high-density single nucleotide polymorphism array and case–control design identifies a novel essential hypertension susceptibility locus in the promoter region of endothelial NO synthase. Hypertension 59, 248–255
13 Dominiczak, A.F. and Munroe, P.B. (2010) Genome-wide association studies will unlock the genetic basis of hypertension: pro side of the argument. Hypertension 56, 1017–1020
14 Kurtz, T.W. (2010) Genome-wide association studies will unlock the genetic basis of hypertension: con side of the argument. Hypertension 56, 1021–1025
15 Gudbjartsson, D.F. et al. (2010) Association of variants at UMOD with chronic kidney disease and kidney stones-role of age and comorbid diseases. PLoS Genet. 6, e1001039
16 Kottgen, A. et al. (2009) Multiple loci associated with indices of renal function and chronic kidney disease. Nat. Genet. 41, 712–717
17 Gudbjartsson, D.F. et al. (2009) Sequence variants affecting eosinophil numbers associate with asthma and myocardial infarction. Nat. Genet. 41, 342–347
18 Schunkert, H. et al. (2011) Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat. Genet. 43, 333–338
19 Stahl, E.A. et al. (2010) Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci. Nat. Genet. 42, 508–514
20 Dubois, P.C. et al. (2010) Multiple common variants for celiac disease influencing immune gene expression. Nat. Genet. 42, 295–302
21 Ganesh, S.K. et al. (2009) Multiple loci influence erythrocyte phenotypes in the CHARGE Consortium. Nat. Genet. 41, 1191–1198
22 Ikram, M.K. et al. (2010) Four novel Loci (19q13, 6q24, 12q24, and 5q14) influence the microcirculation in vivo. PLoS Genet. 6, e1001184
23 Teslovich, T.M. et al. (2010) Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466, 707–713
24 Wain, L.V. et al. (2011) Genome-wide association study identifies six new loci influencing pulse pressure and mean arterial pressure. Nat. Genet. 43, 1005–1011
25 Pichler, I. et al. (2011) Identification of a common variant in the TFR2 gene implicated in the physiological regulation of serum iron levels. Hum. Mol. Genet. 20, 1232–1240
26 Chambers, J.C. et al. (2009) Genome-wide association study identifies variants in TMPRSS6 associated with hemoglobin levels. Nat. Genet. 41, 1170–1172
27 Coronary Artery Disease (C4D) Genetics Consortium (2011) A genome-wide association study in Europeans and South Asians identifies five new loci for coronary artery disease. Nat. Genet. 43, 339–344
28 Ripke, S. et al. (2011) Genome-wide association study identifies five new schizophrenia loci. Nat. Genet. 43, 969–976
29 Simon-Sanchez, J. et al. (2009) Genome-wide association study reveals genetic risk underlying Parkinson’s disease. Nat. Genet. 41, 1308–1312
30 Yasuno, K. et al. (2010) Genome-wide association study of intracranial aneurysm identifies three new risk loci. Nat. Genet. 42, 420–425
31 Newton-Cheh, C. et al. (2009) Association of common variants in NPPA and NPPB with circulating natriuretic peptides and blood pressure. Nat. Genet. 41, 348–353
32 Manolio, T.A. et al. (2009) Finding the missing heritability of complex diseases. Nature 461, 747–753
33 Busst, C.J. et al. (2011) The epithelial sodium channel gamma-subunit gene and blood pressure: family based association, renal gene expression, and physiological analyses. Hypertension 58, 1073–1078
34 Raychaudhuri, S. et al. (2009) Identifying relationships among genomic disease regions: predicting genes at pathogenic SNP associations and rare deletions. PLoS Genet. 5, e1000534
35 Oikonen, M. et al. (2011) Genetic variants and blood pressure in a population-based cohort: the Cardiovascular Risk in Young Finns study. Hypertension 58, 1079–1085
36 Taal, H.R. et al. (2012) Genome-wide profiling of blood pressure in adults and children. Hypertension 59, 241–247
37 Renigunta, A. et al. (2011) Tamm–Horsfall glycoprotein interacts with renal outer medullary potassium channel ROMK2 and regulates its function. J. Biol. Chem. 286, 2224–2235
38 Boyden, L.M. et al. (2012) Mutations in kelch-like 3 and cullin 3 cause hypertension and electrolyte abnormalities. Nature 482, 98–102
39 Ji, W. et al. (2008) Rare independent mutations in renal salt handling genes contribute to blood pressure variation. Nat. Genet. 40, 592–599
40 Lifton, R.P. et al. (2001) Molecular mechanisms of human hypertension. Cell 104, 545–556
41 Eyre-Walker, A. (2010) Evolution in health and medicine Sackler colloquium: genetic architecture of a complex trait and its implications for fitness and genome-wide association studies. Proc. Natl. Acad. Sci. U.S.A. 107 (Suppl. 1), 1752–1756
42 Xin, M. et al. (2009) MicroRNAs miR-143 and miR-145 modulate cytoskeletal dynamics and responsiveness of smooth muscle cells to injury. Genes Dev. 23, 2166–2178
43 Wang, G. et al. (2010) Intrarenal expression of miRNAs in patients with hypertensive nephrosclerosis. Am. J. Hypertens. 23, 78–84
44 Marques, F.Z. et al. (2011) Gene expression profiling reveals renin mRNA overexpression in human hypertensive kidneys and a role for microRNAs. Hypertension 58, 1093–1098
45 Li, S. et al. (2011) Signature microRNA expression profile of essential hypertension and its novel link to human cytomegalovirus infection. Circulation 124, 175–184
46 Cheng, J. et al. (2009) Cytomegalovirus infection causes an increase of arterial blood pressure. PLoS Pathog. 5, e1000427
47 Esler, M.D. et al. (2010) Renal sympathetic denervation in patients with treatment-resistant hypertension (The Symplicity HTN-2 Trial): a randomised controlled trial. Lancet 376, 1903–1909
48 Mu, S. et al. (2011) Epigenetic modulation of the renal beta-adrenergic–WNK4 pathway in salt-sensitive hypertension. Nat. Med. 17, 573–580
49 Ellison, D.H. and Brooks, V.L. (2011) Renal nerves, WNK4, glucocorticoids, and salt transport. Cell Metab. 13, 619–620
50 Zhang, D. et al. (2009) Epigenetics and the control of epithelial sodium channel expression in collecting duct. Kidney Int. 75, 260–267
51 Gerszten, R.E. and Wang, T.J. (2008) The search for new cardiovascular biomarkers. Nature 451, 949–952
52 Leitschuh, M. et al. (1991) High-normal blood pressure progression to hypertension in the Framingham Heart Study. Hypertension 17, 22–27
53 Franklin, S.S. et al. (1997) Hemodynamic patterns of age-related changes in blood pressure. The Framingham Heart Study. Circulation 96, 308–315
54 Burt, V.L. et al. (1995) Prevalence of hypertension in the US adult population. Results from the Third National Health and Nutrition Examination Survey, 1988–1991. Hypertension 25, 305–313
55 Kaminer, B. and Lutz, W.P. (1960) Blood pressure in Bushmen of the Kalahari Desert. Circulation 22, 289–295
56 Truswell, A.S. et al. (1972) Blood pressures of Kung bushmen in Northern Botswana. Am. Heart J. 84, 5–12
57 Poulter, N.R. et al. (1990) The Kenyan Luo migration study: observations on the initiation of a rise in blood pressure. BMJ 300, 967–972
58 Crews, D.E. and Mancilha-Carvalho, J.J. (1993) Correlates of blood pressure in Yanomami Indians of northwestern Brazil. Ethn. Dis. 3, 362–371
59 Carvalho, J.J. et al. (1989) Blood pressure in four remote populations in the INTERSALT Study. Hypertension 14, 238–246
60 Laville, M. et al. (1994) Epidemiological profile of hypertensive disease and renal risk factors in black Africa. J. Hypertens. 12, 839–843
61 Neel, J.V. (1962) Diabetes mellitus: a ‘thrifty’ genotype rendered detrimental by ‘progress’? Am. J. Hum. Genet. 14, 353–362
62 Nakajima, T. et al. (2004) Natural selection and population history in the human angiotensinogen gene (AGT): 736 complete AGT sequences in chromosomes from around the world. Am. J. Hum. Genet. 74, 898–916
63 Weder, A.B. (2007) Evolution and hypertension. Hypertension 49, 260–265
64 Young, J.H. et al. (2005) Differential susceptibility to hypertension is due to selection during the out-of-Africa expansion. PLoS Genet. 1, e82
65 Pickering, G.W. (1955) The genetic factor in essential hypertension. Ann. Intern. Med. 43, 457–464
66 Oldham, P.D. et al. (1960) The nature of essential hypertension. Lancet 1, 1085–1093
67 Adeyemo, A. et al. (2009) A genome-wide association study of hypertension and blood pressure in African Americans. PLoS Genet. 5, e1000564
68 Krzywinski, M. et al. (2009) Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645
69 Kent, W.J. et al. (2002) The human genome browser at UCSC. Genome Res. 12, 996–1006
Cover: During conjugation, members of the ciliate genus Oxytricha inherit a genome that looks like typical eukaryotic chromatin but is replete with fragmented and scrambled genes. The subsequent developmental process produces a rearranged somatic genome containing on the order of twenty million of the shortest known telomere-bearing chromosomes. On pages 382–388, Aaron Goldman and Laura Landweber describe recent progress toward understanding Oxytricha’s genomic dimorphism and discuss its various implications for our understanding of ancient genome evolution and early life. The cover shows an SEM image of Oxytricha, false-colored with Photoshop, courtesy of Bob Hammersmith.
August 2012 Volume 28, Number 8 pp. 361–418
Reviews
Human limb abnormalities caused by disruption of hedgehog signaling. Eve Anderson, Silvia Peluso, Laura A. Lettice and Robert E. Hill
Replication timing and its emergence from stochastic processes. John Bechhoefer and Nicholas Rhind
Oxytricha as a modern analog of ancient genome evolution. Aaron David Goldman and Laura F. Landweber
Regulation of chromatin structure by long noncoding RNAs: focus on natural antisense transcripts. Marco Magistri, Mohammad Ali Faghihi, Georges St Laurent III and Claes Wahlestedt
Genetic basis of blood pressure and hypertension. Sandosh Padmanabhan, Christopher Newton-Cheh and Anna F. Dominiczak
Mechanisms of transcriptional precision in animal development. Mounia Lagha, Jacques P. Bothma and Michael Levine
Letter
Is ‘forward’ the same as ‘plus’?…and other adventures in SNP allele nomenclature. Sarah C. Nelson, Kimberly F. Doheny, Cathy C. Laurie and Daniel B. Mirel
Erratum
Corrigendum: Human evolutionary genomics: ethical and interpretive issues. [Trends in Genetics 28 (2012) 137–145]. Joseph J. Vitti, Mildred K. Cho, Sarah A. Tishkoff and Pardis C. Sabeti
Mechanisms of transcriptional precision in animal development
Mounia Lagha1, Jacques P. Bothma2 and Michael Levine1
1 Center for Integrative Genomics, Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
2 Biophysics Graduate Group, Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
Review
Glossary
Canalization: a measure of the ability of a population to produce the same
phenotype regardless of fluctuations in its environment, genotype or other
sources of variability. Our use of the term ‘robustness’ conveys the same
essential meaning.
Enhancer: the predominant regulatory DNA for controlling gene expression. It
has the defining property of driving reporter expression in transgenic assays
from a heterologous promoter.
Gene regulatory network: interacting genes and their associated regulatory
DNAs that are responsible for a specific developmental process such as the
specification of gut or muscle.
Paused polymerase: RNA Pol II that has initiated transcription, but arrests after
producing a small nascent RNA of 30–50 nt. The Pol II is ‘ready to go’ but needs
additional regulators to undergo elongation.
Pioneer factor: a specialized TF (sequence-specific) that binds to nucleosomal
DNA and prepares enhancers for rapid and timely deployment.
Poising: preparing genes for rapid and timely transcription. This can be
achieved by priming the promoter, the enhancer, or both.
Redundancy: two genes are considered to be redundant if they play similar
functions and are able to replace one another. This can be extended to a
genetic interaction or enhancers or binding sites within enhancers. However,
we do not believe in true redundancy. Instead, genes or regulatory DNAs might
appear to possess redundant, or overlapping, activities in the laboratory, but
not in natural populations subject to stress.
Shadow enhancer: an enhancer that is sometimes located far from the gene it
regulates. The term ‘shadow’ is a metaphor which reflects that, historically,
these distal enhancers tended to be discovered after the proximal/primary
enhancer and in unexpected locations such as in the introns of neighboring
genes.
We review recently identified mechanisms of transcriptional control that ensure reliable and reproducible patterns of gene expression in natural populations of developing embryos, despite inherent fluctuations in gene regulatory processes, variations in genetic backgrounds and exposure to diverse environmental conditions. These mechanisms are not responsible for switching genes on and off. Instead, they control the fine-tuning of gene expression and ensure regulatory precision. Several such mechanisms are discussed, including redundant binding sites within transcriptional enhancers, shadow enhancers, and ‘poised’ enhancers and promoters, as well as the role of ‘redundant’ gene interactions within regulatory networks. We propose that such regulatory mechanisms provide population fitness and ‘fine-tune’ the spatial and temporal control of gene expression.
Transcriptional precision
The basic mechanisms for switching genes on and off during development were intensively studied in the 1980s and 1990s. The enhancer was shown to play a key role in integrating complex regulatory information to generate cell-specific patterns of gene expression [1]. However, in natural populations enhancer–promoter interactions can be affected by changes in temperature and variations in genetic background, but the developmental program remains unperturbed. What is the basis for this stability in developmental programming?
Our central premise is that the mechanisms used to provide stability in gene expression in natural populations also produce greater precision in developmental patterning mechanisms. By transcriptional precision we refer to the formation of sharp borders of gene expression, the exact timing of gene activation, coordinate expression of groups of genes within a developing tissue, and homogenous expression of a given gene across a field of coordinately developing cells. The advent of whole-genome technologies and improved imaging methods has provided recent insights into more subtle aspects of differential gene activity, namely the reproducible deployment of developmental programs in natural populations.
Corresponding author: Levine, M.
Keywords: enhancer; paused polymerase; pioneer factors; robustness; gene regulatory networks
0168-9525/$ – see front matter. Published by Elsevier Ltd. doi:10.1016/j.tig.2012.03.006. Trends in Genetics, August 2012, Vol. 28, No. 8
Redundant genetic interactions
Genetic analysis of Drosophila embryogenesis led to a conceptual breakthrough in our understanding of animal development [2]. The subdivision of the embryo into a series of body segments was first envisaged to be a regulatory cascade or genetic pathway, with maternal determinants, such as Bicoid, that establish sequential patterns of gap gene expression, pair-rule stripes and ultimately, segment-polarity stripes of gene expression (e.g. [3]). This view of a sequential pathway gave way to one of gene networks, whereby both maternal and zygotic activators and repressors interact with complex enhancers to produce localized stripes of gene expression [4]. In recent years, gene networks have been visualized as complex wiring-diagrams [5].
Such networks often contain seemingly redundant interactions. Moreover, two related transcription factors are sometimes seen to activate the expression of downstream target genes in the same cells at the same time. Removal of one copy of the regulatory gene often fails to produce an obvious or fully penetrant phenotype. Nonetheless, the gene might augment population fitness, which is what natural selection ultimately acts on.
[Figure 1 diagram, panels (a) Wildtype and (b) End-3 mutant. Node labels in both panels: skn-1, med-1/2, end-3, end-1, elt-2. Outcome labels: ‘Intestinal differentiation’ in (a); ‘Variable intestinal differentiation’ in (b), where two components are marked ‘(Variable)’.]
Figure 1. Redundant interactions in gene regulatory networks. Summary of the genetic cascade governing intestinal cell specification in C. elegans (see ref. [6]). (a) Wild-type network. skn-1 is maternally deposited and, in concert with other maternal and zygotic factors, activates the expression of transcription factors end-3 and end-1, both of which activate elt-2, the key regulator of intestine differentiation. (b) In end-3 mutants, end-1 can compensate and intestine differentiation is essentially normal. However, end-1 expression becomes significantly more variable, resulting in erratic expression of elt-2 and abnormal intestine differentiation in some individuals.
Redundant interactions in gene regulatory networks have been suggested to provide stability and precision in metazoan development [5]. An illustrative example is seen for gut specification in Caenorhabditis elegans [6]. The intestine is composed of 20 cells that arise from a single progenitor in the early embryo. Intestinal identity is specified by a simple regulatory network, beginning with the maternal deposition of skn-1 transcripts and culminating in the expression of elt-2, which activates hundreds of ‘target’ genes required for gut differentiation (Figure 1a).
The activation of elt-2 depends on two related transcription factors, end-1 and end-3, which function in a largely redundant fashion. The consequences of disrupting either end-1 or end-3 gene activity have been examined, and evidence was obtained for increased ‘noise’ in gut specification from measurements of single mRNAs in individual embryos [6]. In particular, end-3 mutants show variability in both the timing and levels of elt-2 expression (Figure 1b), which might explain why 5% of end-3 mutants lack intestinal cells. Similarly, the overlapping activities of two T-box transcription factors, tbx-8 and tbx-9, appear to buffer stochastic variations in muscle differentiation [7].
These results suggest that redundant gene interactions within developmental networks can stabilize gene expression in natural populations. Such redundancy might also play a key role in ensuring transcriptional precision. That is, the combination of end-1 plus end-3 might ensure the precise timing and exact levels of elt-2 expression. There are numerous examples of potential redundancies in gene networks (e.g. [5]). Are these required for developmental patterning, or do they represent a means to stabilize complex processes despite genetic and environmental variations? These are not mutually exclusive concepts.
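One common way to put a number on the ‘noise’ discussed above is the coefficient of variation (standard deviation divided by the mean) of transcript counts across embryos. The sketch below is illustrative only: the function name and the per-embryo elt-2 counts are hypothetical and are not data from [6], and the original study may well have used a different statistic.

```python
from statistics import mean, pstdev


def coefficient_of_variation(counts):
    """Return the standard deviation of per-embryo counts divided by their mean."""
    mu = mean(counts)
    if mu == 0:
        raise ValueError("mean count is zero; the coefficient of variation is undefined")
    return pstdev(counts) / mu


# Hypothetical elt-2 mRNA counts per embryo (illustrative numbers only).
wild_type = [200, 210, 195, 205, 190]
end3_mutant = [200, 40, 310, 0, 150]  # includes one embryo with no expression

cv_wt = coefficient_of_variation(wild_type)
cv_mut = coefficient_of_variation(end3_mutant)
print(f"wild type CV = {cv_wt:.2f}, end-3 mutant CV = {cv_mut:.2f}")
```

With these made-up counts the mutant shows a much larger coefficient of variation, which is the kind of signature that would be read as a loss of buffering when one of two redundant activators is removed.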
Intra-enhancer redundancy
A typical developmental enhancer is several hundred bp in length and contains multiple binding sites for two or more sequence-specific transcription factors (reviewed in [1]). Some of the binding sites appear to be redundant, in that mutations in a subset of the sites do not qualitatively alter the expression patterns produced by the modified enhancers (e.g. [8]). What is the purpose of these ‘extra’ sites, which are often highly conserved? Evidence is gathering that in some cases they ensure robustness or stability in response to genetic and environmental variation. A recent analysis of the eve stripe 2 enhancer provides a particularly compelling example [9] (Figure 2).
The full-length eve stripe 2 enhancer is over 700 bp in length and contains several binding sites for each of four key regulators: Bicoid, Hunchback, Krüppel, and Giant [10,11]. It produces a robust and authentic stripe 2 pattern when attached to a reporter gene and expressed in transgenic Drosophila embryos. Removal of ~200 bp from the 3′ end of the enhancer, which contains several TF binding sites, diminishes the levels of expression, but the resulting ~500 bp minimal enhancer produces an essentially normal pattern of expression [11]. BAC transgenesis and genetic complementation assays have been used to examine the contributions of the minimal enhancer and 3′ ‘extension’ [9].
Removal of the ~500 bp minimal enhancer from a ‘rescuing’ BAC transgene results in lethality due to a severely diminished stripe 2 pattern. Mutant eve–/eve– embryos carrying this BAC fail to hatch due to defects in the first thoracic segment. Interestingly, removal of the ~200 bp 3′ extension does not cause lethality under optimal culture conditions, and the viability of these flies is comparable to that of wild-type flies. These results suggest that the minimal ~500 bp eve stripe 2 enhancer is sufficient for segmentation, at least in the absence of environmental stress. However, there is a breakdown in the function of the minimal enhancer at elevated temperatures and in ‘sensitized’ genetic backgrounds. Thus, binding sites in the 3′ extension are not redundant under all conditions, but instead they appear to ensure reliable expression of eve stripe 2 under stress (Figure 2). This is likely to be a general mechanism of robustness or ‘canalization’ in development ([9,12,13] for a definition of canalization). So-called redundant binding sites in developmental enhancers are probably used in natural populations to cope with variability.
[Figure 2 diagram, panels (a) to (c). Region labels: eve stripe 2 enhancer, minimal enhancer, extension, redundant binding sites; size labels: 480 bp, 211 bp, 554 bp. Outcome labels: eve BAC, normal; eve BAC with no minimal stripe 2, non-viable; eve BAC with no extension, normal at optimal conditions and reduced viability under stress.]
Figure 2. Importance of redundant binding sites for robustness. (a) Diagram of a BAC transgene containing the entire eve locus, including 5′ and 3′ stripe enhancers. Only the stripe 2 regulatory region is shown. The ‘full-length’ enhancer contains both the minimal ~500 bp enhancer (green) and ~200 bp 3′ extension (blue). The yellow ovals represent a subset of the TF binding sites in the stripe 2 regulatory DNA. (b) Removal of the minimal eve stripe 2 enhancer results in lethality, and embryos die with defects in the thorax (derived from the region of stripe 2 expression). (c) Removal of the 3′ extension does not impair embryogenesis under optimal culturing conditions, and normal adult flies are obtained. However, under genetic stress, only 5% of the flies survive. Thus, ‘redundant’ binding sites in the 3′ extension are required for robustness.
Shadow enhancersA related mechanism for ensuring robustness is the use ofmultiple enhancers for a single pattern of gene expression. Avariety of recently developed whole-genome assays (Box 1)permit the systematic identification of developmental
Box 1. Whole-genome identification of enhancers
During the past 10 years a variety of ‘post-genome’ methods have
been devised for the systematic identification of enhancers
(which can exist both 50 or 30 of the gene or within the transcription
unit). Transgenic assays are required to confirm their identities.
Putative enhancers are attached to a minimal promoter and reporter
gene, and introduced (via injection or electroporation) into a
developing embryo. Either stable or transient transgenic embryos
are assayed for reporter gene expression. Below we provide a brief
review of some post-genome methods for identifying putative
enhancers.
Computational methods: enhancers often contain a high density of
transcription factor binding sites, typically one for every 30–50 bp
across the length of the enhancer (200–300 bp or more). Algorithms
have been developed for identifying high-density clusters of putative
binding sites [58,59]. These methods work, but typically only 10–30%
of ‘hits’ represent authentic enhancers when tested in transgenic
embryos.
ChIP-Seq: permits the genome-wide identification of binding sites
for sequence-specific transcription factors, or histone modifications
(e.g. [40,41]). ChIP-Seq using antibodies against early Drosophila
patterning determinants (e.g. Dorsal, Twist and Snail) led to the
identification of shadow enhancers for a number of genes engaged in
enhancers (e.g. [14–16]). Such approaches suggest thatmany of the crucial developmental patterning genes inDrosophila are regulated by multiple enhancers that directextensively overlapping patterns of gene expression andemploy a similar regulatory ‘logic’ (e.g. [17]). The newlyidentified enhancers are sometimes termed ‘shadow enhan-cers’ because they map to more remote locations than the‘classical’ or primary enhancers situated close to the gene[18,19]. Several examples are discussed below.
The shavenbaby locus (also known as ovo) is importantfor the specification of dorsal hairs in the cuticle of embryosand larvae [20]. It is regulated by a complex array ofenhancers with extensively overlapping activities. It is
dorsal–ventral patterning [17]. In some systems it has been possible
to identify active enhancers on a genome-wide scale for a given tissue
by identifying particular histone modifications, or the enzymes
responsible for these modifications (e.g. [16]).
Chromosome conformation capture (3C) assays: can identify the
sequences in a genome that interact with specific promoters. It relies
on the stabilization of transient ‘loops’ of distal enhancers to target
promoters using formaldehyde cross-linking, similar to the chromatin
cross-linking used for ChIP-Seq assays. 4C (chromosome conforma-
tion capture-on-chip) methods were used to identify multiple and
overlapping enhancers for the regulation of Hoxd genes in the mouse
limb bud [25]. 3C and 4C assays provide an estimate of the overall
interactions that occur in vivo but do not reveal the dynamics of these
long-range interactions.
MNase-Seq and FAIRE assays: micrococcal nuclease (MNase)
induces double-strand breaks within nucleosome linker regions
and single-strand nicks within the nucleosome and can be used to
identify ‘nucleosome-free’ regions. In some cases, these regions
coincide with ‘poised’ enhancers due to the binding of pioneer
transcription factors (e.g. [60]). FAIRE (formaldehyde-assisted isola-
tion of regulatory elements) also identifies nucleosome-free regions,
or ‘open’ chromatin [61].
Figure 3. Model for enhancer synergy. (a) Schematic showing that the primary and shadow enhancers (green boxes) possess the same regulatory logic (TF binding sites are
illustrated by colored circles). (b) To activate transcription, an enhancer loops to its cognate promoter. This interaction has a typical failure rate of 10%. In the presence of
two enhancers regulating the same gene at the same time (primary and shadow), the combined failure rate is 1% (10% x 10% = 1%). This assumes that the two enhancers
work independently of one another.
Review Trends in Genetics August 2012, Vol. 28, No. 8
possible to remove some of these enhancers and still obtain essentially normal cuticle patterns at optimal temperatures. However, these patterns are disrupted when the embryos are grown at either low (15 °C) or elevated (30 °C) temperatures. Moreover, normal embryos are resilient to genetic changes, such as reductions in the levels of Wingless, but produce abnormal cuticles upon removal of shavenbaby ‘shadow’ enhancers. Thus, the shadow enhancers ensure reliable expression when embryos are subject to genetic and environmental variation.
A similar situation is seen for the regulation of snail, which encodes a zinc finger transcription factor that establishes the boundary between the presumptive mesoderm and neurogenic ectoderm [12]. The snail gene is regulated by a proximal enhancer located near the transcription start site, as well as by a recently identified shadow enhancer located 5 kb upstream of the start site within the first intron of a neighboring gene. Quantitative imaging assays and genetic complementation experiments suggest that the two enhancers ensure reliable and uniform activation of snail expression in embryos containing only one maternal dose of Dorsal, or when subject to high temperatures (30 °C) [12,21]. Removal of either enhancer, particularly the distal shadow enhancer [21], causes defects in gastrulation under adverse conditions.
Shadow enhancers have also been implicated in vertebrate developmental processes. For example, the neurogenic regulatory gene, ATOH7 (Math5), is essential for the development of the mammalian retina [22]. A genetic disease causing blindness at birth (nonsyndromic congenital retinal nonattachment) results from the deletion of a remote ‘shadow’ enhancer located more than 20 kb away from the ATOH7 transcription unit [23]. The shadow enhancer directs a spatiotemporal pattern of gene expression very similar to that of the ‘primary’ proximal enhancer in the developing retina of a mouse. This result suggests that the primary enhancer alone cannot sustain sufficient levels of ATOH7 expression for normal development in the absence of the shadow enhancer. Thus, the two enhancers seem to be redundant in terms of the location and timing of the expression patterns they direct, but both are required to reinforce ATOH7 expression and achieve correct levels of expression during crucial stages of eye development.
There are additional examples of multiple enhancers for key vertebrate patterning genes. For example, deletion of a
limb enhancer of the paired-box homeodomain transcription factor Prx has no obvious effect on Prx expression levels or on limb development in mice [24], suggesting the existence of additional, shadow enhancers. More recently, 4C assays (Box 1) identified multiple putative enhancers for Hoxd13 expression within a distal gene ‘desert’ that contains known regulatory elements, GCR and Prox [25]. Deletions of GCR and Prox have little effect on Hoxd13 expression in digits, thereby suggesting the occurrence of redundant regulatory elements. Indeed, complete abolition of Hoxd13 expression in digits is achieved only when the gene desert, together with the GCR and Prox regions, is completely deleted (830 kb deletion).
The preceding examples suggest that multiple enhancers represent a simple means for improving the reliability of gene expression. The underlying mechanism is uncertain, but they might increase the probability of gene activation at any given time during critical windows of development and make it more robust to perturbation. For example, if a typical enhancer fails to loop and engage its target promoter 10% of the time, and if the proximal and distal enhancers function more or less independently of one another, then there is a combined failure rate of only 1% (e.g. [12]). That is, two enhancers function in an inherently multiplicative manner to activate gene expression (Figure 3). Such a mechanism also provides robustness. For example, if the failure rate of each individual enhancer increases to 30% due to stress, then the combined failure rate is only 9%.
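The arithmetic behind this argument can be checked with a minimal sketch. It assumes, as the text does, that enhancers fail independently, so the probability that all of them fail at once is the product of their individual failure rates. The function name combined_failure_rate is illustrative and does not come from the cited work.

```python
from math import prod


def combined_failure_rate(failure_rates):
    """Probability that every enhancer fails simultaneously.

    Assumes the enhancers act independently (the assumption stated in
    the text), so the individual failure probabilities multiply.
    """
    rates = list(failure_rates)
    for p in rates:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"failure rate must lie in [0, 1], got {p}")
    return prod(rates)


# The two scenarios described in the text: normal and stressed conditions.
for label, p in (("normal", 0.10), ("stress", 0.30)):
    both = combined_failure_rate([p, p])
    print(f"{label}: single enhancer fails {p:.0%}, both fail {both:.0%}")
```

If the enhancers were not independent (for example, if both depend on the same limiting activator), the product would understate the true joint failure rate.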
An alternative explanation is that multiple enhancers ensure high levels of expression above a minimal threshold required for genetic function (as suggested in the case of ATOH7 regulation). In reality, multiple enhancers could be important both for the reliable activation of gene expression and for maintaining high levels of expression. We still do not understand the details of how an enhancer switches on a gene and affects levels of expression, and therefore this is very much an open question. The source of shadow enhancers is uncertain, but it has been proposed that they might arise from ‘cryptic’ duplication events [18].
Rendering genes ‘poised’ for activation
Timing is crucial in development, and recent studies have identified several mechanisms that ensure faithful activation of gene expression upon receipt of key inducing signals.
Figure 4. Summary of mechanisms of transcriptional priming. Gene transcription depends on enhancers (blue) and promoters (purple). The transcription start site (TSS) is
indicated by an arrow labeled +1. The promoter can be primed or ‘poised’ for transcription by the recruitment of Pol II before gene expression. This ‘promoter pausing’
generates a small mRNA (around 30–50 nt) and then elongation is blocked by the binding of negative elongation factors such as Nelf and DSIF. The enhancer can be
‘prepared’ for activation by the binding of pioneer factors (represented by gray boxes), by recruitment of Pol II, or by the modification of the chromatin landscape
(positioned nucleosomes and associated histone marks). These three features at enhancers may be linked, but for simplicity we illustrate them sequentially. Nucleosomes
are represented by hexagons and histone marks with colored flags. A simplified scheme of a paused promoter is represented in the gray box.
We consider mechanisms that optimize induction of distal enhancers and the core promoter (Figure 4). In some cases, both are ‘primed’ for efficient activation.
Paused promoters
Many metazoan genes contain paused RNA polymerase II (Pol II) prior to their activation [26–28]. This paused Pol II
Box 2. Methods for identifying paused promoters
Many developmental patterning genes contain paused Pol II before
their activation during Drosophila embryogenesis (reviewed in [27]).
There is also evidence that a significant number of inactive or weakly
expressed genes contain paused Pol II in mammalian tissues,
including embryonic stem cells. Several different methods have been
used to identify paused genes, as summarized below.
Pol II ChIP-Seq assays: the simplest method is the genome-wide
identification of Pol II binding. This is typically done with a mixture of
antibodies recognizing different isoforms of Pol II (e.g. nonphosphorylated,
ser-5P, ser-2P). Active genes contain Pol II signals extending
across the length of their transcription units. Inactive genes
fall into two classes: those completely lacking Pol II and those
containing Pol II near the +1 transcription start site (e.g. [26]). These
latter genes can be regarded as stalled or ‘provisionally’ paused.
However, it is unclear whether Pol II has engaged the DNA template
and undergone promoter escape, or if the signals detected in the
promoter region represent an equilibrium of unstable Pol II associating
and dissociating from the template. Additional methods are
required to determine whether Pol II is truly paused, that is, activated
polymerase containing a capped nascent transcript and arresting
~30–50 bp downstream of +1.
Permanganate protection assays: stably paused Pol II is associated
with a ‘transcription bubble’ of ~20 bp due to the local denaturation
is an active form of the enzyme that halts ~30–50 bp downstream of the +1 transcription start site (Figure 4; Box 2). It is present in ~30% of all genes in embryonic stem cells and about 15% of genes in the early Drosophila embryo [26,29,30]. The purpose is uncertain, but many developmental patterning genes contain paused Pol II. It has been suggested that it fosters rapid and synchronous
of the double helix by the active polymerase. It is possible to detect
the bubble by the modification of exposed, single-stranded thymidine
residues with potassium permanganate. This method has been used
to identify transcription bubbles for a number of genes containing
stalled Pol II in Drosophila embryos and cultured S2 cells [62].
Direct sequencing: small nuclear RNAs containing 5′ caps are
isolated, cloned, and then subjected to deep sequencing [63]. This
method identified +34 as a common site of paused Pol II, with the DPE
(downstream promoter element) or PB (pause button) motifs being
the last nucleotides transcribed before arrest. A significant fraction of
paused genes contain GAGA, INR, and DPE/PB motifs within or near
their core promoters.
Gro-Seq assays: this has emerged as the method of choice for the
systematic identification of paused Pol II [30,64]. However, it is not for
the faint of heart. The method is a whole-genome nuclear run-on assay.
Nuclei are harvested from embryos, tissues, or cultured cells, and
treated with Sarkosyl to block de novo binding of Pol II. A modified
nucleotide (e.g. bromouridine) is added along with a mixture of ATP
and other agents to permit the elongation of pre-existing polymerases
already engaged on DNA templates. These polymerases are allowed to
extend ~50–100 nucleotides; the RNAs are then isolated using
anti-bromo antibodies and subjected to deep sequencing. The resulting
sequence information provides the exact locations of paused Pol II.
activation of gene expression [31]. The idea is that regulating Pol II release, rather than recruitment, permits rapid induction of gene expression. This hypothesis has been explored using detailed mathematical modeling of transcription [32], but it still remains to be tested experimentally.
A nonexclusive alternative view is that paused Pol II is involved in recruiting chromatin-modifying enzymes that expedite transcription. For example, the chromatin landscape of the Hsp70 locus (the prototypic paused gene in Drosophila) is rapidly altered following heat shock, through a mechanism independent of transcription [33]. This rapid change is key to the effective activation of Hsp70 expression upon heat shock. Moreover, there is an inverse correlation between paused Pol II and positioned nucleosomes at the core promoter [34,35]. An increase in positioned nucleosomes has been observed upon destabilization of paused Pol II (e.g. NelfE knockdown in S2 cells) [34]. Conversely, diminished levels of the Polycomb repressor (in esc mutant embryos) correlate with augmented levels of paused Pol II [35]. It would appear that the promoter regions of developmentally regulated genes contain either paused Pol II or positioned nucleosomes, but the basis for this regulatory switch is uncertain.
These studies raise the possibility that paused Pol II might prepare genes for activation by establishing an ‘open’ configuration at the promoter. However, this possibility has not yet been critically tested.
Poised enhancers
There is also evidence that enhancers can be prepared for rapid deployment before gene activation (Figure 4). For example, the forkhead transcription factor FoxA binds to the Albumin enhancer in the primitive endoderm of mouse embryos where it is inactive (reviewed in [36]). FoxA is an example of a ‘pioneer’ factor [37]; it binds to inactive enhancers and renders them ‘poised’ for rapid induction upon the appearance of key activators, such as those mediating cell signaling.
To bind inactive enhancers, pioneer factors have the defining property of binding to nucleosomal DNA and compact chromatin, and remain bound even during mitosis. Since the initial discovery of FoxA and GATA factors as pioneer factors in the liver differentiation program, additional examples have been described [38,39].
Zelda is a maternal zinc finger transcription factor that is essential for the activation of ~100 genes 2–3 h after fertilization during Drosophila embryogenesis (maternal to zygotic transition) [40–42]. It binds to the enhancer regions of many or most developmental control genes before their activation. Disrupting Zelda binding sites can delay the onset of expression, or cause sporadic patterns of activation [40,41]. Thus, Zelda renders developmental enhancers poised for activation by maternal determinants such as Bicoid and Dorsal, and may function as a pioneer factor. It might also help ensure reliable patterns of gene activation in natural populations under stress, but this idea has not yet been tested.
The mechanisms by which pioneer factors prepare enhancers for efficient activation are not known. It has been suggested that they can displace nucleosomes and
thereby render adjacent binding sites available for occupancy [36,38]. A nonexclusive possibility is that pioneer factors recruit chromatin-modifying enzymes that ‘mark’ enhancers for rapid deployment. For example, inactive liver and pancreas enhancers exhibit ‘active’ chromatin modifications in the mouse foregut endoderm where they are inactive [36]. This suggests ‘pre-patterning’ of the enhancers in progenitor tissues before their induction in the liver and pancreas. The P300 histone acetyltransferase and the EZH2 histone methyltransferase have been implicated in these modifications [43]. It is conceivable that such modifications are not strictly required for gene expression, but might improve the precision and stability of gene expression in natural populations.
More recently it has been suggested that histone modifications and Pol II help to prime distal enhancers [44] (Figure 4). In this study, whole-genome ChIP-Seq assays were performed on isolated tissues obtained from staged Drosophila embryos. The timing of gene expression correlated with Pol II binding and two types of chromatin marks in enhancers. Pol II occupancy at enhancers is counterintuitive, but multiple studies, in human ES cells [45] and mice [45,46], suggest that enhancers can be bound by Pol II and are sometimes transcribed. Additional members of the general transcription machinery, such as the TATA-binding-protein-associated factor TAF3 [47], are also seen at particular enhancers. It was suggested that these factors might foster looping interactions between distal enhancers and promoters, but it is currently unclear how Pol II and associated factors might render enhancers poised for activation. It is possible that they are recruited to enhancers by pioneer TFs, but this idea awaits further studies.
When stochastic expression is ‘purposeful’
Many developmental patterning genes in Drosophila contain paused Pol II, shadow enhancers, or both. We have discussed how these mechanisms might foster the precision and stability of gene expression in development. However, there are examples of developmental control genes that exhibit sporadic or stochastic patterns of expression. Some might exhibit such expression because there is no selective pressure for them to be expressed in a precise and synchronous manner. However, there are cases where stochastic expression is used as a purposeful strategy for generating regulatory diversity among the cells of a population [48]. One of the most striking examples is seen in the eye of the adult fly [49–51].
Color vision depends on the differential expression of rhodopsin-3 (Rh3) and Rh4 in the R7 photoreceptor cells and the differential expression of Rh5 and Rh6 in the R8 photoreceptor cells. These differential patterns depend on stochastic expression of spineless, which encodes a bHLH-PAS transcription factor that activates Rh4 in R7 [52]. Approximately 70% of the ommatidia express spineless, but the patterns of activation differ among adult flies. When spineless is expressed, Rh4 is activated in R7; if not, Rh3 is expressed instead. The identity of these distinct classes of R7 cells dictates the identities of the underlying R8 cells. When spineless and Rh4 are expressed in R7, then Rh6 is expressed in the associated R8 cell. Conversely, when spineless is absent and Rh3 is expressed in R7, then
Box 3. Outstanding questions
• How do multiple enhancers provide precision in gene expression:
do they increase the levels or probability of expression?
• Are genes with multiple enhancers more or less ‘evolvable’? Do
shadow enhancers increase the probability of evolving novel gene
activities?
• How do pioneer factors prime enhancers?
• How does paused Pol II prime the promoter?
• When are imprecise, stochastic modes of gene activation
advantageous in development?
Rh5 is expressed in the associated R8 cell. Thus, diverse patterns of rhodopsin expression are achieved by the stochastic expression of spineless. The underlying mechanism is uncertain.
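The decision rule described above can be summarized in a small simulation. This is only a sketch: it treats each ommatidium as an independent coin flip with a 70% chance of expressing spineless, which is an assumption made for illustration (the text states that the underlying mechanism is uncertain). The function names and the default of 800 ommatidia are likewise illustrative.

```python
import random


def ommatidium_fate(spineless_on):
    """Map the spineless state of an R7 cell to the rhodopsins described in the text."""
    if spineless_on:
        return {"R7": "Rh4", "R8": "Rh6"}
    return {"R7": "Rh3", "R8": "Rh5"}


def simulate_eye(n_ommatidia=800, p_spineless=0.70, seed=None):
    """Draw an independent spineless on/off state for each ommatidium.

    Independence between ommatidia is an assumption of this sketch,
    not a claim about the biological mechanism.
    """
    rng = random.Random(seed)
    return [ommatidium_fate(rng.random() < p_spineless) for _ in range(n_ommatidia)]


# Two "flies" with different seeds: similar proportions, different mosaics.
for seed in (1, 2):
    eye = simulate_eye(seed=seed)
    fraction_rh4 = sum(o["R7"] == "Rh4" for o in eye) / len(eye)
    print(f"fly {seed}: {fraction_rh4:.1%} of R7 cells express Rh4")
```

Different seeds give roughly the same proportion of Rh4-expressing R7 cells but a different spatial arrangement, which mirrors the observation that the patterns of activation differ among adult flies.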
There are other examples of the importance of stochastic expression in the control of developmental genes. Notably, Nanog, one of the key determinants of pluripotent stem cells, exhibits stochastic expression in cultured ES cells and in early mouse embryos [53,54]. There is a correlation between elevated levels of Nanog expression and self-renewal of pluripotent stem cells in culture [55,56]. By contrast, low levels correlate with a propensity for the cells to differentiate.
Concluding remarks
The preceding examples of stochastic expression are probably exceptional. We believe that most regulatory genes are ‘primed’ for rapid and precise deployment during development. Several mechanisms were discussed, including redundancies in gene networks and developmental enhancers, shadow enhancers, and primed promoters and enhancers (via paused Pol II and pioneer TFs). There is little doubt that additional mechanisms await discovery (Box 3).
There is something of a chicken and egg issue that we have skirted. Namely, what is the source of these mechanisms of developmental precision? It is conceivable that they arose from the demands of natural populations, namely, to stabilize complex developmental processes in response to inherent (genetic) and extrinsic (environmental) fluctuations. Alternatively, they might have arisen from the demands of the embryo, to produce timely and dynamic on/off patterns of gene expression underlying cell specification processes. These are not mutually exclusive concepts. A regulatory mechanism selected to provide stability in natural populations (e.g. shadow enhancer) might be incorporated into the core patterning process to produce sharper borders of gene expression [12] or homogeneous patterns of activation [57]. Conversely, a mechanism selected for developmental precision (e.g. paused Pol II) might foster robustness of expression in natural populations. We suggest that the dynamic interplay between the demands of natural populations and the embryo has produced the exquisite patterning processes that underlie animal development.
References
1 Levine, M. (2010) Transcriptional enhancers in animal development and evolution. Curr. Biol. 20, R754–R763
2 Nüsslein-Volhard, C. and Wieschaus, E. (1980) Mutations affecting segment number and polarity in Drosophila. Nature 287, 795–801
3 Nüsslein-Volhard, C. and Roth, S. (1989) Axis determination in insect embryos. Ciba Found. Symp. 144, 37–55
4 Ip, Y.T. et al. (1992) The bicoid and dorsal morphogens use a similar strategy to make stripes in the Drosophila embryo. J. Cell Sci. 16 (Suppl.), 33–38
5 Davidson, E.H. (2009) Network design principles from the sea urchin embryo. Curr. Opin. Genet. Dev. 19, 535–540
6 Raj, A. et al. (2010) Variability in gene expression underlies incomplete penetrance. Nature 463, 913–918
7 Burga, A. et al. (2011) Predicting mutation outcome from early stochastic variation in genetic interaction partners. Nature 480, 250–253
8 Arnosti, D.N. et al. (1996) The eve stripe 2 enhancer employs multiple modes of transcriptional synergy. Development 122, 205–214
9 Ludwig, M.Z. et al. (2011) Consequences of eukaryotic enhancer architecture for gene expression dynamics, development, and fitness. PLoS Genet. 7, e1002364
10 Stanojevic, D. et al. (1991) Regulation of a segmentation stripe by overlapping activators and repressors in the Drosophila embryo. Science 254, 1385–1387
11 Small, S. et al. (1992) Regulation of even-skipped stripe 2 in the Drosophila embryo. EMBO J. 11, 4047–4057
12 Perry, M.W. et al. (2010) Shadow enhancers foster robustness of Drosophila gastrulation. Curr. Biol. 20, 1562–1567
13 Waddington, C.H. (1942) Canalization of development and the inheritance of acquired characters. Nature 150, 563–565
14 Zinzen, R.P. et al. (2009) Combinatorial binding predicts spatio-temporal cis-regulatory activity. Nature 462, 65–70
15 He, Q. et al. (2011) High conservation of transcription factor binding and evidence for combinatorial regulation across six Drosophila species. Nat. Genet. 43, 414–420
16 May, D. et al. (2011) Large-scale discovery of enhancers from human heart tissue. Nat. Genet. 44, 89–93
17 Zeitlinger, J. et al. (2007) Whole-genome ChIP-chip analysis of Dorsal, Twist, and Snail suggests integration of diverse patterning processes in the Drosophila embryo. Genes Dev. 21, 385–390
18 Hong, J.W. et al. (2008) Shadow enhancers as a source of evolutionary novelty. Science 321, 1314
19 Barolo, S. (2011) Shadow enhancers: frequently asked questions about distributed cis-regulatory information and enhancer redundancy. BioEssays 34, 135–141
20 Frankel, N. et al. (2010) Phenotypic robustness conferred by apparently redundant transcriptional enhancers. Nature 466, 490–493
21 Dunipace, L. et al. (2011) Complex interactions between cis-regulatory modules in native conformation are critical for Drosophila snail expression. Development 138, 4075–4084
22 Riesenberg, A.N. et al. (2009) Rbpj cell autonomous regulation of retinal ganglion cell and cone photoreceptor fates in the mouse retina. J. Neurosci. 29, 12865–12877
23 Ghiasvand, N.M. et al. (2011) Deletion of a remote enhancer near ATOH7 disrupts retinal neurogenesis, causing NCRNA disease. Nat. Neurosci. 14, 578–586
24 Cretekos, C.J. et al. (2008) Regulatory divergence modifies limb length between mammals. Genes Dev. 22, 141–151
25 Montavon, T. et al. (2011) A regulatory archipelago controls Hox genes transcription in digits. Cell 147, 1132–1145
26 Zeitlinger, J. et al. (2007) RNA polymerase stalling at developmental control genes in the Drosophila melanogaster embryo. Nat. Genet. 39, 1512–1516
27 Levine, M. (2011) Paused RNA polymerase II as a developmental checkpoint. Cell 145, 502–511
28 Li, J. and Gilmour, D.S. (2011) Promoter proximal pausing and the control of gene expression. Curr. Opin. Genet. Dev. 21, 231–235
29 Guenther, M.G. et al. (2007) A chromatin landmark and transcription initiation at most promoters in human cells. Cell 130, 77–88
30 Min, I.M. et al. (2011) Regulating RNA polymerase pausing and transcription elongation in embryonic stem cells. Genes Dev. 25, 742–754
31 Boettiger, A.N. and Levine, M. (2009) Synchronous and stochastic patterns of gene activation in the Drosophila embryo. Science 325, 471–473
32 Boettiger, A.N. et al. (2011) Transcriptional regulation: effects of promoter proximal pausing on speed, synchrony and reliability. PLoS Comput. Biol. 7, e1001136
33 Petesch, S.J. and Lis, J.T. (2008) Rapid, transcription-independent loss of nucleosomes over a large chromatin domain at Hsp70 loci. Cell 134, 74–84
34 Gilchrist, D.A. et al. (2010) Pausing of RNA polymerase II disrupts DNA-specified nucleosome organization to enable precise gene regulation. Cell 143, 540–551
35 Chopra, V.S. et al. (2011) The Polycomb group mutant esc leads to augmented levels of paused Pol II in the Drosophila embryo. Mol. Cell 42, 837–844
36 Zaret, K.S. and Carroll, J.S. (2011) Pioneer transcription factors: establishing competence for gene expression. Genes Dev. 25, 2227–2241
37 Watts, J.A. et al. (2011) Study of FoxA pioneer factor at silent genes reveals Rfx-repressed enhancer at Cdx2 and a potential indicator of esophageal adenocarcinoma development. PLoS Genet. 7, e1002277
38 Magnani, L. et al. (2011) Pioneer factors: directing transcriptional regulators within the chromatin environment. Trends Genet. 27, 465–474
39 Fakhouri, T.H.I. et al. (2010) Dynamic chromatin organization during foregut development mediated by the organ selector gene pha-4/FoxA. PLoS Genet. 6, e1001060
40 Liang, H.L. et al. (2008) The zinc-finger protein Zelda is a key activator of the early zygotic genome in Drosophila. Nature 456, 400–403
41 Nien, C.Y. et al. (2011) Temporal coordination of gene networks by Zelda in the early Drosophila embryo. PLoS Genet. 7, e1002339
42 Harrison, M.M. et al. (2011) Zelda binding in the early Drosophila melanogaster embryo marks regions subsequently activated at the maternal-to-zygotic transition. PLoS Genet. 7, e1002266
43 Xu, C.R. et al. (2011) Chromatin ‘prepattern’ and histone modifiers in a fate choice for liver and pancreas. Science 332, 963–966
44 Bonn, S. et al. (2012) Tissue-specific analysis of chromatin state identifies temporal signatures of enhancer activity during embryonic development. Nat. Genet. 44, 148–156
45 Rada-Iglesias, A. et al. (2011) A unique chromatin signature uncovers early developmental enhancers in humans. Nature 470, 279–283
46 De Santa, F. et al. (2010) A large fraction of extragenic RNA pol II transcription sites overlap enhancers. PLoS Biol. 8, e1000384
47 Liu, Z. et al. (2011) Control of embryonic stem cell lineage commitment by core promoter factor, TAF3. Cell 146, 720–731
48 Eldar, A. and Elowitz, M.B. (2010) Functional roles for noise in genetic circuits. Nature 467, 167–173
49 Vasiliauskas, D. et al. (2011) Feedback from rhodopsin controls rhodopsin exclusion in Drosophila photoreceptors. Nature 479, 108–112
50 Johnston, R.J. et al. (2011) Interlocked feedforward loops control cell-type-specific rhodopsin expression in the Drosophila eye. Cell 145, 956–968
51 Jukam, D. and Desplan, C. (2010) Binary fate decisions in differentiating neurons. Curr. Opin. Neurobiol. 20, 6–13
52 Wernet, M.F. et al. (2006) Stochastic spineless expression creates the retinal mosaic for colour vision. Nature 440, 174–180
53 Dietrich, J.E. and Hiiragi, T. (2007) Stochastic patterning in the mouse pre-implantation embryo. Development 134, 4219–4231
54 Silva, J. and Smith, A. (2008) Capturing pluripotency. Cell 132, 532–536
55 Kalmar, T. et al. (2009) Regulated fluctuations in nanog expression mediate cell fate decisions in embryonic stem cells. PLoS Biol. 7, e1000149
56 Glauche, I. et al. (2010) Nanog variability and pluripotency regulation of embryonic stem cells – insights from a mathematical model analysis. PLoS ONE 5, e11238
57 Perry, M.W. et al. (2011) Multiple enhancers ensure precision of gap gene-expression patterns in the Drosophila embryo. Proc. Natl. Acad. Sci. U.S.A. 108, 13570–13575
58 Berman, B.P. et al. (2002) Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc. Natl. Acad. Sci. U.S.A. 99, 757–762
59 Markstein, M. et al. (2002) Genome-wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo. Proc. Natl. Acad. Sci. U.S.A. 99, 763–768
60 Valouev, A. et al. (2011) Determinants of nucleosome organization in primary human cells. Nature 474, 516–520
61 Giresi, P.G. and Lieb, J.D. (2009) Isolation of active regulatory elements from eukaryotic chromatin using FAIRE (formaldehyde assisted isolation of regulatory elements). Methods 48, 233–239
62 Gilmour, D.S. and Fan, R. (2009) Detecting transcriptionally engaged RNA polymerase in eukaryotic cells with permanganate genomic footprinting. Methods 48, 368–374
63 Nechaev, S. et al. (2010) Global analysis of short RNAs reveals widespread promoter-proximal stalling and arrest of Pol II in Drosophila. Science 327, 335–338
64 Core, L.J. et al. (2008) Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science 322, 1845–1848
Is ‘forward’ the same as ‘plus’?... and other adventures in SNP allele nomenclature
Sarah C. Nelson1, Kimberly F. Doheny2, Cathy C. Laurie1 and Daniel B. Mirel3
1 Genetics Coordinating Center, Department of Biostatistics, University of Washington, Seattle, WA, USA
2 Center for Inherited Disease Research, Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
3 Broad Institute (Massachusetts Institute of Technology/Harvard), Cambridge, MA, USA
Letter
In the accelerating and expanding field of research on genetic variation, it has become standard practice to work with a combination of datasets generated by multiple research groups at different times and by different methods. Synthesizing these data is important for genotype imputation, meta-analysis, and other applications, but may be difficult because alleles are typically observed and recorded on only one of the two DNA strands in genotyping and sequencing experiments. Different nomenclatures have arisen to designate strand orientation when reporting single nucleotide polymorphism (SNP) genotypes, but they are neither widely understood nor uniformly applied. Here we define the most common allele strand orientation nomenclatures and provide guidance in achieving strand consistency.
The majority of SNPs are ‘strand unambiguous’, such that genotypes called on different strands are readily identifiable (e.g., A/G alleles on one strand are T/C alleles on the opposite strand). However, determining strand orientation is more complicated at ‘strand ambiguous’ SNPs, where alleles are symmetrical across strands (A/T and C/G). It is assumed that all researchers, as a minimum for consistency, report the two alleles of a biallelic SNP on the same strand. It is the choice and the definition of which strand is used that leads to ambiguity. Generally, SNP alleles are reported for a single strand designated in one of four strand naming conventions: ‘probe/target’, ‘plus/minus’, ‘TOP/BOT’, and ‘forward/reverse’, defined as follows.
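The distinction between strand-unambiguous and strand-ambiguous SNPs can be illustrated with a minimal sketch. It assumes biallelic SNPs written as a pair of single-letter nucleotides; the function names flip_strand and is_strand_ambiguous are illustrative and are not taken from any genotyping toolkit.

```python
# Watson-Crick complements used to translate alleles to the opposite strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def _validate(alleles):
    """Check that the input is a pair of two distinct nucleotides."""
    a1, a2 = alleles
    if a1 not in COMPLEMENT or a2 not in COMPLEMENT or a1 == a2:
        raise ValueError(f"expected two distinct nucleotides, got {alleles!r}")
    return a1, a2


def flip_strand(alleles):
    """Report the same two alleles as read on the opposite strand."""
    a1, a2 = _validate(alleles)
    return (COMPLEMENT[a1], COMPLEMENT[a2])


def is_strand_ambiguous(alleles):
    """True for A/T and C/G SNPs, whose allele pair looks identical on both strands."""
    a1, a2 = _validate(alleles)
    return COMPLEMENT[a1] == a2


for snp in (("A", "G"), ("A", "T"), ("C", "G")):
    print(snp, "->", flip_strand(snp), "ambiguous:", is_strand_ambiguous(snp))
```

For an unambiguous SNP, the flipped pair shares no letters with the original, so a strand mismatch between two datasets is detectable from the alleles alone. For A/T and C/G SNPs the flipped pair contains the same two letters, so additional information about strand is required.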
Probe/target
When SNPs are assayed with a site-specific probe, one of the two strands corresponds to (i.e., is collinear with) the probe sequence itself, and the other to the complementary genomic target sequence that flanks or spans the SNP site. Sometimes the probe strand is called the ‘design’ strand (in reference to assay design). Although the specifics vary between platforms, alternative alleles at a SNP site are often initially represented using the generic letter codes A and B. In the following, an italicized A refers to this generic allele designation and not to adenine. In Illumina annotation each SNP is defined with design allele nucleotides, and these occur on the same strand as the probe sequence; the order in which the alternative alleles are given specifies the generic A and B allele designations [1]. To illustrate, for a SNP defined as [T/G], the A allele is T and the B allele is
Corresponding author: Nelson, S.C. ([email protected]).
Keywords: allele; strand translation; genotype; nomenclature; genome-wide association study; meta-analysis.
G. In Affymetrix allele-specific hybridization technology,the letter codes A and B are assigned differently and couldtherefore occur on either the probe or target strand [2].
Plus (+)/minus (–)
In all human reference chromosomes, as for other eukaryotes [3], the plus (+) strand is defined as the strand with its 5′ end at the tip of the short arm [4,5] (Genome Reference Consortium, personal communication, March 27, 2012). SNP alleles reported on the same strand as the (+) strand are called ‘plus’ alleles and those on the (–) strand are called ‘minus’ alleles. Providing SNP alleles on the plus genomic strand is the convention in publicly available SNP datasets such as the HapMap (www.hapmap.org) and 1000 Genomes Projects (www.1000genomes.org).
Although the plus/minus designation is anchored at the telomeres of each chromosome, the orientation of intervening sequences may change between genome builds as gaps are filled in and sequences are refined. Thus when reporting plus/minus strand, one must specify a genome build. The fluid nature of plus/minus orientation has partly motivated the development of alternative nomenclatures.
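Because plus/minus orientation is defined relative to a particular build, a check of a SNP and its flanking bases against the reference might look like the following sketch. The reference string here is a short made-up toy sequence, not real genome sequence, and `plus_or_minus` is a hypothetical helper; in practice the reference would be the plus-strand sequence of a named genome build.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def plus_or_minus(flank_5p, alleles, flank_3p, reference_plus):
    """Report '+' or '-' for a SNP-with-flanks sequence, or None if unclear.

    Each allele is tried in turn because the reference carries only one
    of them. None is returned when nothing matches or both strands match.
    """
    ref = reference_plus.upper()
    hits = set()
    for allele in alleles:
        query = (flank_5p + allele + flank_3p).upper()
        if query in ref:
            hits.add("+")
        if reverse_complement(query) in ref:
            hits.add("-")
    return hits.pop() if len(hits) == 1 else None


# Toy plus-strand reference (an assumption for illustration only).
TOY_REFERENCE = "GGTTACATCCCATGCACAGGAT"
```

With this toy reference, the flanking sequence CATCCC[A/C]TGCACA matches directly, while its reverse complement TGTGCA[T/G]GGGATG matches only after being flipped. Repeating the check against a different build could in principle give a different answer, which is why the build must be recorded.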
Illumina TOP/BOT strand
The TOP/BOT strand naming convention, developed by Illumina and subsequently adopted by dbSNP, has been thoroughly defined elsewhere [1]. In brief, Illumina strand designation is determined by either the SNP alternative nucleotides or its flanking sequence. For unambiguous SNPs the TOP strand is defined as the one that contains an A nucleotide allele. The A is designated generically as allele A, whereas the alternative allele on the TOP strand is designated as allele B. For ambiguous SNPs the strand designation and allele A/B assignments are determined by flanking sequence in a similar manner. This strand definition is ‘local’ to a SNP in that alleles reported on the TOP strand for two neighboring SNPs may be on different physical strands of DNA [6]. Furthermore, the TOP/BOT strand definition is intended to be independent of any genome build or design strand. Another key feature of this naming system is that allele A for a TOP strand probe is the base pair complement of allele A for a BOT strand probe, such that the generic A/B genotype coding remains consistent regardless of which strand is probe or target. This nomenclature offers relative stability in the face of changing human genome assemblies and SNP databases.
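For the unambiguous case, the rule described above can be written as a short sketch. The function name is hypothetical, and the ambiguous case (A/T and C/G), which depends on flanking sequence as defined in the Illumina technical note [1], is deliberately left out rather than guessed at.

```python
def top_bot_unambiguous(allele1, allele2):
    """Return (strand, allele_A, allele_B) for a strand-unambiguous SNP.

    The strand on which the alleles include adenine is TOP. On TOP the
    adenine is generic allele A; on BOT allele A is its complement, T.
    """
    pair = {allele1.upper(), allele2.upper()}
    if len(pair) != 2 or not pair <= set("ACGT"):
        raise ValueError("expected two distinct nucleotides")
    if pair in ({"A", "T"}, {"C", "G"}):
        raise ValueError("strand-ambiguous SNP: flanking sequence is needed")
    strand = "TOP" if "A" in pair else "BOT"
    allele_a = "A" if strand == "TOP" else "T"
    (allele_b,) = pair - {allele_a}
    return strand, allele_a, allele_b
```

Applied to the [T/G] example used earlier, this gives BOT with allele A = T and allele B = G; the complementary [A/C] pair gives TOP with allele A = A and allele B = C, so the A/B coding agrees across the two strands.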
Box 1. An example of allele conversion using Illumina annotation
Here we use Illumina-provided annotation for an example SNP (rs216614) in Table I to derive a set of allele call conversions in Table II. In Table I, ‘SNP’ gives alternative alleles on the probe sequence strand, ‘IlmnStrand’ gives the TOP/BOT status of the probe sequence strand, ‘TopGenomicSeq’ gives the sequence surrounding the SNP on the TOP strand, ‘RefStrand’ gives the plus/minus status of the probe sequence strand, and ‘IlmnID’ encodes the correspondence between TOP/BOT and forward/reverse (dbSNP) strands. The ‘design’ alleles (on the probe sequence strand) are given directly by ‘SNP’ = [T/G] and, following the Illumina convention, the first nucleotide corresponds to allele A and the second to allele B. The TOP strand alleles are given in brackets in ‘TopGenomicSeq’. The ‘B_R’ in ‘IlmnID’ specifies that the dbSNP reverse strand corresponds with the BOT strand. The corresponding SNP assay is depicted in Figure I.
Table I. Excerpt from Illumina HumanOmni1-Quad_v1-0_C annotation file (build 37)
IlmnID                       Name      IlmnStrand  SNP    TopGenomicSeq           RefStrand
rs216614-131_B_R_1865662557  rs216614  BOT         [T/G]  ...CATCCC[A/C]TGCACA...  –
Figure I. A simplified schematic of the SNP probe, where the probe sequence is in blue and the target sequence in black text. The ‘design’ alleles (T or G) are the fluorescently labeled nucleotides recruited to the allele probe in this two-color primer-extension assay. Adapted from materials available on the Illumina website (www.illumina.org).
Table II. rs216614 allele-mapping table
AB  TOP  Design  Forward  Plus
A   A    T       A        A
B   C    G       C        C
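The derivation of Table II from the Table I row can be sketched in Python as follows. This is an illustration under stated assumptions, not Illumina software: `allele_map` is a hypothetical helper, the row is passed as a plain dictionary, and the IlmnID is assumed to end in `_<T|B>_<F|R>_<number>`, generalizing from the single ‘B_R’ example described in this box.

```python
import re

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(alleles):
    """Read a pair of alleles on the opposite strand, keeping their order."""
    return tuple(COMPLEMENT[a] for a in alleles)


def allele_map(row):
    """Map generic A/B alleles to TOP, design, forward and plus alleles."""
    # 'SNP' lists design-strand alleles; first is allele A, second is B.
    design = re.fullmatch(r"\[([ACGT])/([ACGT])\]", row["SNP"]).groups()
    top = design if row["IlmnStrand"] == "TOP" else complement(design)
    # Cross-check against the bracketed alleles in 'TopGenomicSeq'.
    in_seq = re.search(r"\[([ACGT])/([ACGT])\]", row["TopGenomicSeq"]).groups()
    if set(in_seq) != set(top):
        raise ValueError("TopGenomicSeq disagrees with SNP and IlmnStrand")
    # Assumed IlmnID layout: <name>_<T|B>_<F|R>_<number>.
    _, top_bot, fwd_rev, _ = row["IlmnID"].rsplit("_", 3)
    forward_is_top = (top_bot, fwd_rev) in {("T", "F"), ("B", "R")}
    forward = top if forward_is_top else complement(top)
    # 'RefStrand' is the plus/minus status of the design (probe) strand.
    plus = design if row["RefStrand"] == "+" else complement(design)
    return {
        ab: {"TOP": t, "Design": d, "Forward": f, "Plus": p}
        for ab, t, d, f, p in zip("AB", top, design, forward, plus)
    }


# The Table I row, with the minus strand written as an ASCII hyphen.
EXAMPLE_ROW = {
    "IlmnID": "rs216614-131_B_R_1865662557",
    "Name": "rs216614",
    "IlmnStrand": "BOT",
    "SNP": "[T/G]",
    "TopGenomicSeq": "...CATCCC[A/C]TGCACA...",
    "RefStrand": "-",
}
```

Calling `allele_map(EXAMPLE_ROW)` reproduces the two rows of Table II: allele A maps to A (TOP), T (design), A (forward), and A (plus), and allele B maps to C, G, C, and C.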
Letter Trends in Genetics August 2012, Vol. 28, No. 8
Forward/reverse
The dbSNP resource of the US National Center for Biotechnology Information (NCBI) contains detailed information for each SNP in its database. Each refSNP (or ‘rs’) entry consists of one or more submitted SNP (or ‘ss’) records, each submitted by individual laboratories. Each dbSNP record shows a flanking DNA sequence, which is simply taken from the submission with the longest flanking sequence [6,7]. SNP alleles reported on the same strand as this exemplar sequence in dbSNP are called ‘forward’ alleles. Conversely, alleles on the opposite strand are called ‘reverse’ alleles. Note that the dbSNP meaning of ‘forward’ is easily confused with the (+) genomic strand, which has been referred to as the ‘forward’ strand by the HapMap project [8,9].
Achieving strand consistency
The most basic level of strand consistency requires only that genotypes are reported on the same DNA strand across datasets. At strand-unambiguous SNPs, discrepant nucleotides are sufficient to identify strand inconsistencies (e.g., A/C in one dataset and T/G in another). However, harmonizing strand-ambiguous SNPs requires converting allele calls to a specific strand, according to one of the strand naming conventions described above. Given a nucleotide sequence with a SNP and its flanking bases (e.g., CATCCC[A/C]TGCACA) one can determine whether the strand of that sequence is (i) plus or minus, by sequence matching with the genomic reference sequence; (ii) TOP or BOT, from the SNP itself or its flanking sequence [1]; and (iii) forward or reverse, from the ‘ss’ sequence record in dbSNP. Determination of probe or target strand requires additional information about assay design. In practice, genotyping assay vendors generally supply annotations that can be used to make strand conversions. Box 1 gives an example of how to interpret Illumina annotation to create a table of allele call conversions. Figure I shows a simplified schematic of the genotyping probe at this example SNP. However, SNP annotations are not infallible and further checks on strand consistency are useful. Commonly used checks are comparisons of minor allele frequency and patterns of linkage disequilibrium between the datasets to be harmonized [10,11].
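The two kinds of check described above, comparing nucleotides at unambiguous SNPs and comparing allele frequencies at ambiguous ones, might be sketched as follows. Both function names and the `margin` threshold are assumptions made for illustration; they are not taken from BEAGLE, IMPUTE2, or any other tool, and the frequency heuristic is uninformative when allele frequencies are near 0.5.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def compare_alleles(ref_alleles, other_alleles):
    """Classify one SNP as 'same', 'flip', 'ambiguous' or 'mismatch'."""
    ref = frozenset(a.upper() for a in ref_alleles)
    other = frozenset(a.upper() for a in other_alleles)
    flipped = frozenset(COMPLEMENT[a] for a in other)
    if ref == other == flipped:
        return "ambiguous"  # A/T or C/G: nucleotides cannot decide
    if ref == other:
        return "same"
    if ref == flipped:
        return "flip"       # other dataset is on the opposite strand
    return "mismatch"       # alleles disagree even after flipping


def frequency_suggests(ref_freq, other_freq, margin=0.1):
    """Guess strand agreement at an ambiguous SNP from allele frequency.

    Both arguments are the frequency of the same nucleotide letter in
    each dataset. Returns 'same', 'flip', or None when too close to call.
    """
    same_diff = abs(ref_freq - other_freq)
    flip_diff = abs(ref_freq - (1.0 - other_freq))
    if abs(same_diff - flip_diff) < margin:
        return None
    return "flip" if flip_diff < same_diff else "same"
```

For instance, `compare_alleles("AC", "TG")` reports a flip, matching the A/C versus T/G example above, while an A/T SNP is reported as ambiguous and would need the frequency check, linkage disequilibrium patterns, or vendor annotation to resolve.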
Our intent is not to advocate one allele nomenclature above all others because the universal adoption of one naming system is both unlikely and unnecessary. Instead, our aim is to explain the different nomenclatures and the need for precise documentation of allele designations for each dataset. Increased understanding and documentation will facilitate continued data sharing and collaboration within the genetics research community.
Acknowledgments
This work was supported in part by the following National Institutes of Health grants: GENEVA Coordinating Center (U01 HG004446); GARNET Coordinating Center (U01 HG005157); Center for Inherited Disease Research (U01HG004438, NIH contract numbers HHSN268200782096C and HHSN268201100011I); and Broad Center for Genotyping and Analysis (U01HG04424).
References
1 Illumina Inc. (2006) ‘TOP/BOT’ strand and ‘A/B’ allele (Technical Note). http://www.illumina.com/documents/products/technotes/technote_topbot.pdf
2 Affymetrix Inc. (2012) Affymetrix genotyping glossary. http://www.affymetrix.com/support/help/genotyping_glossary/index.affx
3 Cherry, J.M. et al. (1998) SGD: Saccharomyces genome database. Nucleic Acids Res. 26, 73–79
4 Dunham, I. et al. (1999) The DNA sequence of human chromosome 22. Nature 402, 489–495
5 Cartwright, R.A. and Graur, D. (2011) The multiple personalities of Watson and Crick strands. Biol. Direct 6, 7
6 National Center for Biotechnology Information (2005) Sequence formatting in dbSNP reports. http://www.ncbi.nlm.nih.gov/books/NBK44414
7 Kitts, A.K. and Sherry, S. (2002) The single nucleotide polymorphism database (dbSNP) of nucleotide sequence variation. In The NCBI Handbook (McEntyre, J. and Ostell, J., eds), National Center for Biotechnology Information (Chap. 5). http://www.ncbi.nlm.nih.gov/books/NBK21101/
8 Frazer, K.A. et al. (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–861
9 Altshuler, D.M. et al. (2010) Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58
10 Browning, S.R. (2009–2011) Strand-switching utility for BEAGLE. http://faculty.washington.edu/sguy/beagle/strand_switching/strand_switching.html
11 Howie, B. and Marchini, J. (2009–2012) IMPUTE2 strand alignment options. http://mathgen.stats.ox.ac.uk/impute/strand_alignment_options.html
0168-9525/$ – see front matter © 2012 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.tig.2012.05.002 Trends in Genetics, August 2012, Vol. 28, No. 8
Corrigendum: Human evolutionary genomics: ethical and interpretive issues [Trends in Genetics 28 (2012) 137–145]
Joseph J. Vitti1,2, Mildred K. Cho3, Sarah A. Tishkoff4 and Pardis C. Sabeti1,2
1 Broad Institute of MIT and Harvard, Cambridge, Massachusetts
2 Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts
3 Stanford Center for Biomedical Ethics, Stanford University, Palo Alto, California
4 Departments of Genetics and Biology, University of Pennsylvania, Philadelphia, Pennsylvania
Erratum
In Figure 1 the genes involved in pigmentation are shown as “SLC24A5, SLC42A2”. They should be “SLC24A5, SLC45A2”. Similarly, in the legend for Figure 1, line 8, it reads:
“In European populations, genes that affect skin pigmentation (SLC24A5 and SLC42A2) have undergone positive selection.”
It should read:
“In European populations, genes that affect skin pigmentation (SLC24A5 and SLC45A2) have undergone positive selection.”
We apologize to the readers of this article for this error.
§ DOI of original article: 10.1016/j.tig.2011.12.001.
Corresponding author: Vitti, J.J. ([email protected]).
0168-9525/$ – see front matter © 2012 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.tig.2012.05.003 Trends in Genetics, August 2012, Vol. 28, No. 8