TiG-8-2012

Oxytricha as a modern analog of ancient genome evolution

Aaron David Goldman and Laura F. Landweber

Department of Ecology and Evolutionary Biology, Princeton University, Guyot Hall, Princeton, NJ 08544, USA

Review

Several independent lines of evidence suggest that the modern genetic system was preceded by the ‘RNA world’ in which RNA genes encoded RNA catalysts. Current gaps in our conceptual framework of early genetic systems make it difficult to imagine how a stable RNA genome may have functioned and how the transition to a DNA genome could have taken place. Here we use the single-celled ciliate, Oxytricha, as an analog to some of the genetic and genomic traits that may have been present in organisms before and during the establishment of a DNA genome. Oxytricha and its close relatives have a unique genome architecture involving two differentiated nuclei, one of which encodes the genome on small, linear nanochromosomes. While its unique genomic characteristics are relatively modern, some physiological processes related to the genomes and nuclei of Oxytricha may exemplify primitive states of the developing genetic system.

Early genome evolution

The modern genetic system requires the synthesis and functional orchestration of three distinct biopolymers: DNA, RNA, and proteins. This complex system was likely preceded by a stage in which RNA played a central role both in information storage and as the only genetically-encoded catalyst (Figure 1) [1,2]. The early prominence of RNA is substantiated by its ability to store genetic information, as in mRNA, and to impart catalysis, as demonstrated by the abundance of catalytic RNAs present in nature and produced in laboratories [3]. The primacy of functional RNAs in the process of protein translation (transfer and ribosomal RNAs and other functional RNAs that modify them), coupled to the ubiquity of those RNAs across all extant life, suggests that the translation system emerged from an RNA-catalyzed metabolism [4]. The central role of nucleotide-derived cofactors (such as ATP, NADH, and CoA) in metabolism is consistent with a scenario in which those functions were previously catalyzed by ribozymes [5].

The catalytic range of RNA is limited and a ribozyme-based metabolic system probably remained dependent on the background chemistry from which it emerged. Protein translation may have evolved as a mechanism to bring this crucial chemistry under the control of genetically-encoded enzymes [6]. Deoxyribonucleotides were probably unavailable until the evolution of ribonucleotide reductase proteins [7], implying that the development of the DNA genome was not even possible until substantial evolution of protein enzymes had taken place. By this point, the translation system seems to have reached a moderate level of its modern sophistication and the range of protein fold architectures encoded by early genomes had significantly expanded [8].

Corresponding author: Goldman, A.D. ([email protected]).

0168-9525/$ – see front matter © 2012 Elsevier Ltd. All rights reserved.

The transition from an RNA genome to a DNA genome is not well understood. Many protein fold architectures seem to have evolved before this stage, because modern ribonucleotide reductase enzymes fall into three distinct classes that share no noticeable similarity in amino acid sequence but appear to be homologous when their active site amino acids are compared in 3D structure alignments [9]. Although DNA-processing functions are similar across the tree of life, no ancient core of enzymes can be detected by sequence comparison (Figure 2) [10]. Six distinct families of DNA polymerase are known, but only those with specific functions related to excision repair have a universal taxonomic distribution [11]. Two distinct families of DNA primase, one bacterial and one archaeal/eukaryotic, are observed in modern life. A similar phylogenetic pattern is observed in DNA ligases. This lack of a universal DNA metabolism may imply that a complete protein-catalyzed DNA-processing system was not present in the last universal common ancestor (LUCA) or that ancient non-orthologous gene displacements [12], in either the ancestor of Bacteria or the ancestor of Archaea and Eukarya, erased the phylogenetic evidence of most DNA processes in LUCA.

Given the current inability to reconstruct early genome-related metabolism through bioinformatics, some researchers have used features of modern biological systems as analogs to traits of ancient organisms and their genomes. For example, it has been argued that viruses provide an ideal evolutionary platform to acquire a DNA genome in an RNA world and to distribute this trait to cellular life [13]. A similar approach compares the notion of early genomes to a ciliate macronucleus in which genes are encoded on small linear chromosomes [14]. Here, we expand the latter idea and use the remarkable genetic system of the ciliate genus Oxytricha [15] to improve our understanding of the early transition from RNA to DNA genomes.

Oxytricha

Oxytricha is a genus of single-celled ciliated protists. They are predatory, mitochondrion-bearing, free-living organisms that inhabit freshwater environments. The lineage diverged approximately 1 Gya from the common ancestor of Tetrahymena and Paramecium [16]. Oxytricha spp., like most ciliates, have two types of nuclei, a micronucleus and a macronucleus (reviewed in [17]). The macronucleus is transcriptionally active during vegetative growth, whereas the micronucleus is almost always transcriptionally silent. However, only the micronucleus is exchanged during the ciliate sexual cycle, after which a new macronucleus and macronuclear genome are formed from micronuclear DNA. Although these general traits are common throughout the phylum Ciliophora, the architectures of the macronuclear and micronuclear genomes, as well as the process of macronuclear development, differ among ciliate taxa.

doi:10.1016/j.tig.2012.03.010 Trends in Genetics, August 2012, Vol. 28, No. 8

Figure 1. The development of the modern genetic system from an RNA-dominated precursor genetic system. (a) The first genetic system probably involved informational RNAs encoding ribozymes which facilitated the replication of those informational RNAs [1]. Given the narrow catalytic range of ribozymes, this system probably relied on substantial networks of prebiotic chemistry to provide activated nucleotides [6]. (b) Protein synthesis by translation most likely arose from this RNA-based system [7] and rapidly developed into a highly processive, high-fidelity system [8]. Appropriately, the translation system is dominated by functional RNAs, including the ribosome itself, which has a ribozyme active site in its highly conserved core [57,58]. (c) The DNA genome probably arose from an RNA–protein precursor system. Deoxyribonucleotides seem to have been unavailable until the evolution of the ribonucleotide reductase protein enzymes [7]. Unlike translation, DNA replication and processing are dominated by protein functions rather than RNA functions, and core DNA-related functions do not appear to be universally conserved [10,11]. In the absence of significant bioinformatic evidence, the transition from an RNA genome to a DNA genome remains enigmatic.

Figure 1 diagram labels, panels (a) to (c): informational RNAs, functional RNAs, prebiotic reaction networks, functional proteins, metabolic networks, DNA genome.

The Oxytricha micronuclear genome contains approximately 1 Gb of sequence, while the macronuclear genome contains approximately 50 Mb of sequence, representing a 95% reduction in genome content during development [16].

In addition, thousands of micronuclear genes are scrambled with respect to their macronuclear counterparts, with segments of micronuclear genes present in a permuted or inverted order relative to their order in the macronucleus (Figure 3) (reviewed in [18]). Following sexual exchange of haploid micronuclei, the macronuclear genome assembles from dispersed segments of micronuclear DNA through a process of genome rearrangement that is guided by macronuclear RNA templates (Figure 3) [19]. It is likely that these RNA templates represent a transient cache of the entire macronuclear genome during this developmental stage.
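The logic of template-guided unscrambling can be illustrated with a toy model. The sketch below is not a description of the cellular mechanism. It assumes that each retained micronuclear segment is a plain DNA string, that the template can be represented by the finished macronuclear sequence, and that segments match the template exactly. The function names and the example sequences are hypothetical and chosen only for illustration. The example mirrors the scrambled order 2 4 1 3, with segment 3 inverted, that Figure 3 uses.

```python
# Toy model of template-guided unscrambling (illustrative only).
# All names and sequences here are hypothetical, not taken from the article.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def unscramble(segments, template):
    """Order (and, where needed, invert) segments so that they tile the template.

    segments: DNA strings in scrambled, possibly inverted order.
    template: the target sequence, standing in for the RNA template.
    Returns the segments in template order; raises ValueError if tiling fails.
    """
    remaining = list(segments)
    ordered = []
    pos = 0
    while pos < len(template):
        match = None
        for seg in remaining:
            # Try the segment as stored, then in inverted orientation.
            for candidate in (seg, reverse_complement(seg)):
                if candidate and template.startswith(candidate, pos):
                    match = (seg, candidate)
                    break
            if match:
                break
        if match is None:
            raise ValueError(f"no segment matches the template at position {pos}")
        remaining.remove(match[0])
        ordered.append(match[1])
        pos += len(match[1])
    return ordered


# Segments 1-4 of a made-up gene, stored in the order 2 4 1 3 with segment 3 inverted.
s1, s2, s3, s4 = "ATGGC", "CAAAT", "TTGGG", "CCCTAA"
template = s1 + s2 + s3 + s4
scrambled = [s2, s4, s1, reverse_complement(s3)]

ordered = unscramble(scrambled, template)
print(ordered)
```

The greedy search is adequate for this example because no segment is a prefix of another. A realistic model would also have to handle mismatches, repeated sequence, and the deleted DNA between segments.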

The roles of RNA may surpass those of DNA in regulating the information in the genome of Oxytricha at three levels. At the first level, RNA transcripts of complete nanochromosomes from the previous generation can program the pattern of DNA rearrangements during macronuclear development [19]. The microinjection of synthetic RNA molecules into Oxytricha cells can introduce an alternative order of micronuclear DNA segments in the resulting progeny [18,19]. These new DNA rearrangement patterns can transfer to the sexual offspring of those progeny and even their progeny’s progeny. Given that the micronuclear DNA remains unchanged, the inheritance of altered rearrangement patterns in Oxytricha appears to be a transgenerational RNA-mediated epigenetic phenomenon.

Figure 2. A phylogenetic distribution of key enzymes involved in DNA synthesis. Unlike the protein translation system, very few features of DNA synthesis and processing are universally conserved. Ribonucleotide reductase is an enzyme required to produce deoxyribonucleotides from ribonucleotides. It is found in three distinct classes, I, II, and III, although ancient homology between them can be inferred from structural and mechanistic similarity. Six distinct families of DNA polymerases are known. None of the four standard DNA polymerase families (A, B, C, and D) has a universal taxonomic distribution. DNA polymerase families X and Y are universally distributed, but impart functions that are related to excision repair rather than DNA replication. The DNA polymerase X family catalyzes non-template-dependent DNA synthesis, while the DNA polymerase Y family polymerizes short segments across lesions. Bacteria use an NAD+-dependent DNA ligase that is unrelated to the ATP-dependent DNA ligase used by Eukarya and Archaea. Similarly, Bacteria use a helicase-associated DNA primase, whereas Archaea and Eukarya use a DNA polymerase α-associated DNA primase. The lack of a universally distributed set of enzymes involved in DNA synthesis suggests that modern pathways were still in the process of forming during the time of the last universal common ancestor (LUCA). Alternatively, DNA-related pathways may simply be more evolutionarily malleable than, for example, translation pathways, and this property would obscure their ancient phylogenetic signatures. The universal phylogenetic tree was previously generated in [59] and is based on 31 universal gene sequences from 191 genomes. The tree image was produced using the Interactive Tree of Life web server [60]. Clades representing groups of 25–40% similarity were collapsed to conserve space. Taxonomic distributions of ribonucleotide reductase enzymes were identified from the RNR database [61]. Taxonomic distributions of DNA polymerase families, DNA ligases, and DNA primases are extrapolated from [10,11], and do not represent a resolution capable of illustrating horizontal gene transfer. Ciliates are members of the Alveolata.

Figure 2 column headings: ribonucleotide reductase classes (I, II, III); DNA polymerase families (A, B, C, D, X, Y); DNA ligases (ATP, NAD); DNA primases (Hel, Polα). Rows are archaeal, eukaryotic, and bacterial taxa from the universal tree.

Figure 3. A model for development of the Oxytricha macronuclear genome following conjugation. During conjugation (center) the micronucleus undergoes meiosis to produce four haploid nuclei, two of which exchange between partnering cells to form a new diploid micronucleus. During this process, the old macronucleus degrades and a new macronucleus differentiates from one copy of the new micronucleus [22]. The outer panels depict the process of macronuclear genome development by DNA rearrangement. (a) Conjugation triggers transcription of old macronuclear chromosomes into RNA. (b) The old macronucleus becomes dismantled, while the RNA transcripts of the chromosomes are retained and transported to the developing macronucleus. (c) The micronucleus replicates by mitosis and one micronucleus undergoes DNA amplification to produce material for the macronuclear genome. (d) Segments of micronuclear DNA (numbered 1–4) are reorganized using the macronuclear transcripts as a template for RNA-guided DNA rearrangement (including inversion of segment 3). Red bars indicate telomeres at the ends of nanochromosomes. Orange rectangles indicate deleted micronuclear DNA that separates DNA segments retained in the macronucleus.

Figure 3 diagram labels: micronuclear chromosomes (segment order 2 4 1 3); macronuclear nanochromosomes (segment order 1 2 3 4); old macronucleus; transcription; dsDNA; ssRNA; micronuclear meiosis; new micronucleus; developing macronucleus.

At the second level, point substitutions can also transfer from the RNA template to the macronuclear DNA [19], particularly near regions where junctions form between macronuclear segments. These point substitutions can also transfer to the sexual progeny and their progeny’s progeny. Given that the micronuclear DNA does not share these point substitutions [19], this observation implicates a role for RNA-templated DNA repair [20] in DNA rearrangement. These somatically acquired point mutations represent another level at which epigenetically-inherited RNA molecules instruct the sequence and interpretation of the DNA genome.

At the third level, the RNA macronuclear genome cache also appears to be responsible for determining the copy number of macronuclear chromosomes. Artificially increasing or decreasing the available levels of RNA chromosome templates by microinjection or RNAi, respectively, leads to a relative increase or decrease in the copy number of the corresponding DNA molecules in the next generation. This effect also lasts at least two sexual generations [21], demonstrating a further example of RNA-mediated transgenerational epigenetic inheritance in Oxytricha.

Apart from their unique sequence features, ciliate micronuclear genomes have a normal eukaryotic structure. Their genome architecture is in the form of large chromosomes with telomeres and a centromere, and micronuclei reproduce via mitosis during cell division. During the sexual cycle the diploid genome undergoes meiosis to produce haploid gametes, one of which is retained and the other of which passes to the mating partner (Figure 3) [22,23].

The macronucleus is very different. The macronuclear genome contains on the order of 20 million small DNA chromosomes, or ‘nanochromosomes’, most of which encode a single protein-coding gene or functional RNA. In fact, the lack of a centromere has led some to argue that the term ‘chromosome’ is inappropriate for macronuclear DNA [16]. The extraordinary number of DNA molecules in the macronucleus results from approximately 20 000 unique nanochromosomes averaging roughly 1000 copies per macronucleus. Their average length is approximately 2.7 kb [24]. These unusual properties of the Oxytricha macronuclear genome and macronucleus, and the powerful role of RNA in sculpting these genomes, offer a compelling system within which to consider possible transitions from simple RNA genomes to complex DNA genomes.
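The quantities quoted above can be checked against one another with simple arithmetic. The sketch below uses only figures stated in the text (about 20 000 unique nanochromosomes, roughly 1000 copies each, an average length of about 2.7 kb, a micronuclear genome of about 1 Gb, and a macronuclear genome of about 50 Mb). The variable names are ours, and because the inputs are rounded estimates the results are order-of-magnitude checks only.

```python
# Back-of-the-envelope consistency check using only figures quoted in the text.
unique_nanochromosomes = 20_000   # approximate number of distinct nanochromosomes
mean_copies = 1_000               # approximate copies of each per macronucleus
mean_length_kb = 2.7              # approximate average nanochromosome length
micronuclear_mb = 1_000.0         # ~1 Gb micronuclear genome
macronuclear_mb = 50.0            # ~50 Mb macronuclear genome

# Total DNA molecules in one macronucleus (the "20 million" in the text).
total_molecules = unique_nanochromosomes * mean_copies

# Unique macronuclear sequence implied by count x average length, in Mb.
implied_unique_mb = unique_nanochromosomes * mean_length_kb / 1_000

# Fraction of micronuclear sequence eliminated during development.
reduction = 1 - macronuclear_mb / micronuclear_mb

print(f"total molecules: {total_molecules:,}")
print(f"implied unique sequence: {implied_unique_mb:.0f} Mb")
print(f"sequence reduction: {reduction:.0%}")
```

The implied unique sequence of roughly 54 Mb agrees with the quoted macronuclear genome size of approximately 50 Mb to within the rounding of the inputs.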

Oxytricha and early genome replication

Small, single-gene chromosomes, such as those in the Oxytricha macronucleus, represent one of the simplest possible states of a genome and thus were probably predecessors to more complex genome architectures. A genome of small linear chromosomes would have presented less of a challenge to primitive polymerases [14], which probably copied nucleic acids with low fidelity and were unable to process long sequences. The nature of these primitive DNA polymerases is unknown. None of the four families of standard DNA polymerases has a universal distribution [11], although the sliding clamp function of the DnaN polymerase in E. coli and the 5′–3′ exonuclease function of the Pol1-A polymerase in E. coli appear to have been present in LUCA [25,26]. Three subunits of DNA-dependent RNA polymerases appear to be universal as well [25]. Structural and functional comparisons of DNA-dependent RNA polymerases suggest that they may share a multi-subunit ancestor with proofreading capabilities that was present in LUCA [27].

It is generally assumed that the RNA-only stage in the development of the genetic system would have required an RNA-dependent RNA polymerase ribozyme to have replicated the genome. Although no such enzyme has been found in extant biology, several have been produced synthetically through laboratory evolution techniques [24,28–30]. So far, all of these ribozymes are over a hundred nucleotides long and exhibit very tight constraints on sequence space, making it difficult to imagine how similar ribozymes could have evolved de novo in an RNA world scenario. In addition, even the most capable of these laboratory-generated polymerase ribozymes is not able to sustain the processivity required to replicate RNA molecules of its own size or larger.

During the process of Oxytricha genome rearrangement, segments of DNA from the micronuclear genome assemble according to RNA templates of the macronuclear genome (Figure 3). This process represents a unique scenario in extant biology in which a complete copy of a genome is produced, not by polymerizing a complementary strand one nucleotide at a time, but by recycling DNA polymers from a precursor genome. It is likely that these pieces of micronuclear DNA ligate together after assembling on the complementary RNA template, although there is also evidence that gaps or errors between the DNA segments are repaired by the activity of an RNA-dependent DNA polymerase [19].

A similar mode of replication would have conferred several benefits to early life and perhaps created a viable selection regime in which polymerases with high fidelity and processivity might have evolved. In contrast to ribozyme polymerases, several ribozyme ligases are present in modern organisms [3] and more have been synthesized by directed evolution [31,32]. Polymerases are in fact a specialized kind of ligase in which one of the ligated partners is a single nucleotide [31]. It follows, then, that the central challenge to a polymerase is not the catalytic step of ligation, but the ability to perform that step repeatedly over the full length of a gene-sized molecule, a limitation that is borne out by the difficulty of producing a highly processive ribozyme polymerase [24,29,30].

If early nanochromosomes replicated in an Oxytricha-like fashion, the number of catalytic ligation steps would be much smaller than that in a complete polymerase-dependent replication. The source of these DNA segments in a primitive system is not clear. Perhaps if the GC% was very high or very low, the sequence complexity of the nanochromosomes would also be low, and short abiotically synthesized segments with random sequences [33,34] would provide enough matches to the template to permit assembly of most of the genome from these small, modular pieces [35]. The need to fill or repair small gaps between segments would create a selective environment for the evolution of a weakly processive polymerase into the ancestor of a modern, highly processive polymerase. This model of early genome replication is consistent with the observation that the only universally conserved DNA polymerase families are involved in excision repair (Figure 2). Once a high-fidelity, high-processivity polymerase became available, genome replication could move towards its current polymerase-dependent form and longer chromosome lengths would be possible.
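The difference in the number of catalytic steps can be made concrete with a rough count. The sketch below assumes a strand of 2700 nucleotides, matching the average nanochromosome length quoted earlier, and a segment length of 50 nucleotides. The segment length is an arbitrary value chosen only for illustration and is not an estimate from the literature. The function names are hypothetical, and the count ignores gap filling and repair.

```python
# Rough step count: segment ligation versus nucleotide-by-nucleotide synthesis.
# The segment length is a hypothetical value chosen only for illustration.


def polymerase_steps(length_nt: int) -> int:
    """Bond-forming steps needed to build a strand one nucleotide at a time."""
    return max(length_nt - 1, 0)


def ligation_steps(length_nt: int, segment_nt: int) -> int:
    """Joins needed to assemble the same strand from preformed segments."""
    if segment_nt <= 0:
        raise ValueError("segment_nt must be positive")
    n_segments = (length_nt + segment_nt - 1) // segment_nt  # ceiling division
    return max(n_segments - 1, 0)


chromosome_nt = 2700  # average nanochromosome length quoted in the text
segment_nt = 50       # assumed segment length (illustrative only)

print(polymerase_steps(chromosome_nt))            # 2699
print(ligation_steps(chromosome_nt, segment_nt))  # 53
```

Under these assumptions, assembly from segments needs about fifty times fewer bond-forming steps, and each step can be carried out independently rather than processively.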

Oxytricha and early cell division

In most eukaryotes, cell division is orchestrated by the complex process of mitosis, wherein duplicate chromosomes segregate evenly between the dividing cells. The process is controlled by dynamic motor complexes that pull chromosomes along organized microtubules [36,37]. Functionally analogous but non-homologous processes are thought to take place in Bacteria [38–40] and Archaea [41,42]. It is difficult to imagine that such a complex system was present in early life forms. Early cell division probably involved uncontrolled membrane division with chromosomes segregating at random.

Similarly, the Oxytricha macronucleus does not divide by way of mitosis. The approximately 20 million nanochromosome molecules probably present an overwhelming challenge to organized mitotic segregation. Although amitosis in Oxytricha is microtubule-dependent [43,44], these microtubules appear to control membrane division rather than chromosome segregation. Macronuclear nanochromosomes lack centromeres to which mitotic motors, or kinetochores, would normally attach [16,45]. As a result, the segregation of DNA between macronuclei is unpredictable and often uneven [16,46]. Amitotic division of macronuclei seems to have arisen early in the ciliates [47], although previous phylogenies have predicted three independent origins of amitosis in ciliates, with one origin in the common lineage of the genera Oxytricha and Euplotes [48].

It is possible that the high chromosome copy-numbers observed in Oxytricha, and to an equal or lesser extent in other ciliates, are related to the imprecise segregation of chromosomes during amitotic division [49]. If a single chromosome is duplicated and each of the two copies segregates randomly to either daughter cell, then the probability that one of the daughter cells loses that gene is 0.5. A greater number of chromosomes will statistically ensure an approximately even segregation of the chromosomes between daughter cells. This feature of amitosis in Oxytricha may be similar to the division of primitive cells, which would have also benefited from carrying chromosomes in high copy-numbers to safeguard against uneven segregation.
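The effect of copy number on gene loss follows from a short calculation. The sketch below assumes that each copy segregates to either daughter cell independently with probability 1/2, which is a simplification of amitotic division rather than a measured property. Under that assumption, the probability that at least one daughter receives no copy of a chromosome present in N copies is 2 × (1/2)^N. The function name is hypothetical.

```python
from fractions import Fraction


def loss_probability(copies: int) -> Fraction:
    """Probability that at least one of two daughter cells receives no copy.

    Assumes each copy goes to either daughter independently with p = 1/2.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # Either daughter may be the empty one; for copies >= 1 the two cases
    # cannot occur together, so their probabilities simply add.
    return Fraction(2, 2 ** copies)


# Two copies reproduce the value of 0.5 given in the text.
for copies in (2, 10, 20):
    p = loss_probability(copies)
    print(copies, p, float(p))
```

With two copies the result is 1/2, as stated above. The probability halves with every additional copy, so at copy numbers in the hundreds or thousands complete loss of a gene at a single division becomes negligible under this model.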

Oxytricha and early genome stability

A single common ancestor of all life is the most statistically satisfying explanation for common traits observed in modern organisms [50]. This explanation, however, does not distinguish between a single organism and a community of organisms with highly pervasive lateral gene transfer [14]. Even if we assume the former scenario, the complexity of a single LUCA organism may have been generated in part by lateral gene transfer within a heterogeneous population of organisms [26]. If early genomes did indeed resemble Oxytricha macronuclear genomes, then the nature of the RNA-mediated gene transfer observed in Oxytricha [19,23] may also help describe the sort of communal inheritance that preceded the predominantly vertical inheritance of modern organisms.

The nanochromosome structure of the macronuclear genome and its regeneration through RNA-template-directed DNA unscrambling provide a form of lateral gene transfer that differs from mechanisms described in any other organisms. Unlike conventional conjugation in bacteria or sexual reproduction in eukaryotes, an RNA-driven epigenetic mode of inheritance does not require the introduction of new genes, but instead new ‘alleles’ can spread via conversion of existing ones (through RNA-guided mechanisms). Allele frequencies can be increased or decreased by the introduction of foreign nucleic acids, and these acquired traits are passed on to subsequent generations.

These phenomena are similar to horizontal gene transfer, in that somatic DNA or RNA variants provide an external source of genetic variation. But the nanochromosome structure of the macronuclear genome and its capacity to receive new alleles during the process of DNA rearrangement make the Oxytricha macronucleus uniquely permissive to somatically acquired genetic change. Nevertheless, an epigenetic system such as that of Oxytricha is also robust to such perturbations because the high copy-number of original alleles will initially act as a buffer against sequence change, restricting the spread of deleterious somatic alterations. Perhaps early genomes with structures similar to the Oxytricha macronucleus would also be permissive to genetic acquisitions, but stable against their deleterious effects.

Oxytricha and early organismal identity

The genetic openness that existed during the transition to modern life was probably also prone to invasions by selfish replicators that may have easily infiltrated and taken advantage of emerging organismal replicating systems [51]. This effect is generally modeled through self-propagating metabolism-like networks, or hypercycles. These replicating entities may be parasitic if they either receive replication support from the host system without conferring a reciprocal benefit, or shortcut the host system in some deleterious way. Vesicles can barricade replicating systems against selfish entities if they provide a mechanism of blocking the entry of external replicators [52]. Selfish replicators can also be eventually incorporated into the metabolism of the host system, balancing their deleterious effects with beneficial ones [53].

Although the dynamics of nuclear dimorphism in Oxytricha do not resemble a hypercycle, the scrambling of the micronucleus and its rearrangement to form the macronuclear genome illustrate the properties of stable systems that host selfish replicators. The unique genomic traits of Oxytricha seem to be both caused by and assisted by an invasion of DNA transposons (typically regarded as selfish genetic agents). The micronucleus hosts thousands of transposons, which probably contributed to the scrambling of its genome, either through actual transposition or via ectopic recombination between transposons of the same family. Unlike domesticated transposases in other eukaryotes, micronuclear transposons display evidence of purifying selection acting on their encoded proteins [23,54] and may still be active outside the control of the host cell. The presence of active transposons in the micronucleus may have provided the selective pressure for acquisition of a template-directed genome unscrambling system as part of macronuclear development [55] as a mechanism for promoting the long-term stability of the genome and robustness to perturbations.

Recent discoveries reveal that micronuclear transposons play a surprisingly direct role in both macronuclear development and genome rearrangement [23]. Micronucleus-limited transposase genes are expressed during macronuclear development, but silent during vegetative growth. The experimental silencing of these transposases by RNAi results in aberrant unscrambling patterns in the macronuclear genome, suggesting that transposons play an active role in genome rearrangement. It is possible that the nanochromosome templates are composed of RNA to protect the developing macronucleus from the integration of active transposons. In this regard, Oxytricha seems to have avoided the deleterious effects of internal transposon activity through template-directed genome rearrangement that, itself, employs the transposon proteins. Thus, the properties of nuclear dimorphism and template-directed macronuclear development in Oxytricha demonstrate the principles of spatial separation and metabolic incorporation that are thought to make early replicating systems resistant to selfish replicators.

Concluding remarks

Here, we have discussed the nuclear dimorphism and genome structures of Oxytricha to demonstrate several plausible dynamics of early genetic systems during the transition to modern genomes. Oxytricha is not by any means a ‘living fossil’, given that its phylum, Ciliophora, is both eukaryotic and not particularly deep branching. However, by analogy we have used Oxytricha to introduce several new hypotheses about early genomes. We invoke the process of template-directed genome rearrangement in Oxytricha to model an evolutionary landscape in which protein polymerases could evolve gradually from ligases. We have also observed that the dynamics of Oxytricha amitotic macronuclear division suggest that unmanaged cell division in early life could be viable if hereditary molecules were present in high copy-numbers. Finally, we employed observations of lateral gene transfer and active transposon mediation in Oxytricha to improve our understanding of the consequences of genome instability for early life. Although the particular genomic traits that we discuss are unique to Oxytricha and closely related genera, we encourage the further exploration of extant organisms, particularly those with atypical genetic systems [56], to help elucidate features of early cellular life.

Acknowledgments

We thank members of the Landweber laboratory for critical discussions of this work. This work was supported by a National Aeronautics and Space Administration Postdoctoral Program fellowship to A.D.G. and by National Institutes of Health grant GM59708 and National Science Foundation grant 0923810 to L.F.L.

References

1 Gilbert, W. (1986) The RNA world. Nature 319, 618

2 Gesteland, R. and Atkins, J.F., eds (1993) The RNA World, Cold Spring Harbor Laboratory Press

3 Landweber, L. et al. (1998) Ribozyme engineering and early evolution. Bioscience 48, 94–103

4 Fox, G. (2010) Origin and evolution of the ribosome. Cold Spring Harb. Perspect. Biol. 2, a003483

5 White, H. (1976) Coenzymes as fossils of an earlier metabolic state. J. Mol. Evol. 7, 101–104

6 Goldman, A. et al. (2012) Evolution of the protein repertoire. In Encyclopedia of Molecular Cell Biology and Molecular Medicine (Meyers, R.A., ed.), Wiley-VCH

7 Freeland, S. et al. (1999) Do proteins predate DNA? Science 286, 690–692

8 Goldman, A. et al. (2010) The evolution and functional repertoire of translation proteins following the origin of life. Biol. Direct 5, 15

9 Torrents, E. et al. (2002) Ribonucleotide reductases: divergent evolution of an ancient enzyme. J. Mol. Evol. 55, 138–152

10 Forterre, P. (2002) The origin of DNA genomes and DNA replication proteins. Curr. Opin. Microbiol. 5, 525–532

11 Filee, J. et al. (2002) Evolution of DNA polymerase families: evidences for multiple gene exchange between cellular and viral proteins. J. Mol. Evol. 54, 763–773



12 Koonin, E. (2003) Comparative genomics, minimal gene-sets and the last universal common ancestor. Nat. Rev. Microbiol. 1, 127–136

13 Forterre, P. (2006) Three RNA cells for ribosomal lineages and three DNA viruses to replicate their genomes: a hypothesis for the origin of cellular domain. Proc. Natl. Acad. Sci. U.S.A. 103, 3669–3674

14 Woese, C. (1998) The universal ancestor. Proc. Natl. Acad. Sci. U.S.A. 95, 6854–6859

15 Zoller, S. et al. (2012) Characterization and taxonomic validity of the ciliate Oxytricha trifallax (Class Spirotrichea) based on multiple gene sequences: limitations in identifying genera solely by morphology. Protist DOI: 10.1016/j.protis.2011.12.006

16 Prescott, D. (1994) The DNA of ciliated protozoa. Microbiol. Rev. 58, 233–267

17 Prescott, D. (2000) Genome gymnastics: unique modes of DNA evolution and processing in ciliates. Nat. Rev. Genet. 1, 191–198

18 Nowacki, M. et al. (2011) RNA-mediated epigenetic programming of genome rearrangements. Annu. Rev. Genomics Hum. Genet. 12, 367–389

19 Nowacki, M. et al. (2008) RNA-mediated epigenetic programming of a genome-rearrangement pathway. Nature 451, 153–158

20 Storici, F. et al. (2007) RNA-templated DNA repair. Nature 447, 338–341

21 Nowacki, M. et al. (2010) RNA-mediated epigenetic regulation of DNA copy number. Proc. Natl. Acad. Sci. U.S.A. 107, 22140–22144

22 Nowacki, M. and Landweber, L.F. (2009) Epigenetic inheritance in ciliates. Curr. Opin. Microbiol. 12, 638–643

23 Nowacki, M. et al. (2009) A functional role for transposases in a large eukaryotic genome. Science 324, 935–938

24 Green, R. and Szostak, J.W. (1992) Selection of a ribozyme that functions as a superior template in a self-copying reaction. Science 258, 1910–1915

25 Harris, J. et al. (2003) The genetic core of the universal ancestor. Genome Res. 13, 407–412

26 Becerra, A. et al. (2007) The very early stages of biological evolution and the nature of the last common ancestor of the three major cell domains. Annu. Rev. Ecol. Evol. Syst. 38, 361–379

27 Poole, A. and Logan, D.T. (2005) Modern mRNA proofreading and repair: clues that the last universal common ancestor possessed an RNA genome? Mol. Biol. Evol. 22, 1444–1455

28 Doudna, J. et al. (1991) A multisubunit ribozyme that is a catalyst of and template for complementary strand RNA synthesis. Science 251, 1605–1608

29 Johnston, W. et al. (2001) RNA-catalyzed RNA polymerization: accurate and general RNA-templated primer extension. Science 292, 1319–1325

30 Wochner, A. et al. (2011) Ribozyme-catalyzed transcription of an active ribozyme. Science 332, 209–212

31 Bartel, D. and Szostak, J.W. (1993) Isolation of new ribozymes from a large pool of random sequences. Science 261, 1411–1418

32 Landweber, L. and Pokrovskaya, I.D. (1999) Emergence of a dual catalytic RNA with metal specific cleavage and ligase activities: the spandrels of RNA evolution. Proc. Natl. Acad. Sci. U.S.A. 96, 173–178

33 Huang, W. and Ferris, J.P. (2006) One-step, regioselective synthesis of up to 50-mers of RNA oligomers by montmorillonite catalysis. J. Am. Chem. Soc. 128, 8914–8919

34 Aldersley, M. et al. (2009) RNA synthesis by mineral catalysis. Orig. Life Evol. Biosph. 39, 200

35 Kotler, L. et al. (1993) DNA sequencing: modular primers assembled from a library of hexamers or pentamers. Proc. Natl. Acad. Sci. U.S.A. 90, 4241–4245


36 Sharp, D. et al. (2000) Microtubule motors in mitosis. Nature 407, 41–47

37 Maiato, H. et al. (2004) The dynamic kinetochore-microtubule interface. J. Cell Sci. 117, 5461–5477

38 Fogel, M. and Waldor, M.K. (2006) A dynamic, mitotic-like mechanism for bacterial chromosome segregation. Genes Dev. 20, 3269–3282

39 Ptacin, J. et al. (2010) A spindle-like apparatus guides bacterial chromosome segregation. Nat. Cell Biol. 12, 791–798

40 Draper, G. and Gober, J.W. (2002) Bacterial chromosome segregation. Annu. Rev. Microbiol. 56, 567–597

41 Lundgren, M. and Bernander, R. (2007) Genome-wide transcription map of an archaeal cell cycle. Proc. Natl. Acad. Sci. U.S.A. 104, 2939–2944

42 Cortez, D. et al. (2010) Evidence for a Xer/dif system for chromosome resolution in Archaea. PLoS Genet. 6, e1001166

43 Tucker, B. et al. (1980) Microtubules and control of macronuclear ‘amitosis’ in Paramecium. J. Cell Sci. 44, 135–151

44 Kushida, Y. et al. (2010) Amitosis requires gamma-tubulin-mediated microtubule assembly in Tetrahymena thermophila. Cytoskeleton 68, 89–96

45 Jung, S. et al. (2011) Exploiting Oxytricha trifallax nanochromosomes to screen for non-coding RNA genes. Nucleic Acids Res. 39, 7529–7547

46 Witt, P. (1977) Unequal distribution of DNA in the macronuclear division of the ciliate Euplotes eurystomus. Chromosoma 60, 59–67

47 Katz, L. (2001) Evolution of nuclear dualism in ciliates: a reanalysis in light of recent molecular data. Int. J. System. Evol. Microbiol. 51, 1587–1592

48 Orias, E. (1991) Evolution of amitosis of the ciliate macronucleus: gain of the capacity to divide. J. Protozool. 38, 217–221

49 Duerra, H. et al. (2004) Modeling senescence in hypotrichous ciliates. Protist 155, 45–52

50 Theobald, D. (2010) A formal test of the theory of universal common ancestry. Nature 465, 219–222

51 Smith, S. (2003) Nucleoprotein assemblies. Encycl. Nanosci. Nanotech. X, 1–10

52 Eigen, M. et al. (1981) The origin of genetic information. Sci. Am. 244, 88–92

53 Konnyu, B. et al. (2008) Prebiotic replicase evolution in a surface-bound metabolic system: parasites as a source of adaptive evolution. BMC Evol. Biol. 8, 267

54 Doak, T. et al. (1994) A proposed superfamily of transposase genes: transposon-like elements in ciliated protozoa and a common ‘D35E’ motif. Proc. Natl. Acad. Sci. U.S.A. 91, 942–946

55 Klobutcher, L. and Herrick, G. (1997) Developmental genome reorganization in ciliated protozoa: the transposon link. Prog. Nucleic Acid Res. Mol. Biol. 56, 1–62

56 Reyes-Prieto, F. et al. (2012) Coenzymes, viruses and the RNA world. Biochimie DOI: 10.1016/j.biochi.2012.01.004

57 Cech, T. (2000) The ribosome is a ribozyme. Science 289, 878–879

58 Hsiao, C. et al. (2009) Peeling the onion: ribosomes are ancient molecular fossils. Mol. Biol. Evol. 26, 2415–2425

59 Ciccarelli, F.D. et al. (2006) Toward automatic reconstruction of a highly resolved tree of life. Science 311, 1283–1287

60 Letunic, I. and Bork, P. (2011) Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy. Nucleic Acids Res. 39, W475–W478

61 Lundin, D. et al. (2009) RNRdb, a curated database of the universal enzyme family ribonucleotide reductase, reveals a high level of misannotation in sequences deposited to Genbank. BMC Genomics 10, 589


Replication timing and its emergence from stochastic processes
John Bechhoefer1 and Nicholas Rhind2

1 Department of Physics, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
2 Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA

Review

The temporal organization of DNA replication has puzzled cell biologists since before the mechanism of replication was understood. The realization that replication timing correlates with important features, such as transcription, chromatin structure and genome evolution, and is misregulated in cancer and aging has only deepened the fascination. Many ideas about replication timing have been proposed, but most have been short on mechanistic detail. However, recent work has begun to elucidate basic principles of replication timing. In particular, mathematical modeling of replication kinetics in several systems has shown that the reproducible replication timing patterns seen in population studies can be explained by stochastic origin firing at the single-cell level. This work suggests that replication timing need not be controlled by a hierarchical mechanism that imposes replication timing from a central regulator, but instead results from simple rules that affect individual origins.

Replication origins: correlated or independent?
The duplication of the genome of a cell by DNA replication is an essential step in the cell cycle. In bacteria, the overall situation is straightforward, in that DNA replication initiates at a single, well-defined location in the genome (e.g. the oriC site in Escherichia coli) and terminates at a second, well-defined region (ter in E. coli) [1]. Eukaryotic organisms, with 10–1000 times more DNA and with 10–100 times slower replication forks, depend on the firing of multiple origins of replication along the DNA. These origins are defined by a two-step process [2]. Licensing, the first step, occurs in G1 phase, when the origin recognition complex (ORC) binds to chromatin and, with the aid of Cdc6 and Cdt1, loads onto the DNA head-to-head pairs of the barrel-shaped heterohexameric MCM complex, the catalytic core of the replicative helicase [3,4]. Each pair of MCM complexes is a potential origin of DNA replication. Initiation (or origin firing), the second step, occurs in S phase, when a pair of MCMs is activated via a complex process involving numerous proteins, including recruitment of Sld2, Sld3, the GINS complex and Cdc45, as well as the phosphorylation of various components by the CDK and DDK replication kinases [5]. The regulation of the spatial binding of the ORC and the temporal activation of MCMs largely determines the kinetics of replication during S phase, which is referred to as the replication program.

Corresponding author: Bechhoefer, J. ([email protected]).
Keywords: DNA replication timing; stochastic models; replication initiation; ORC; MCM

0168-9525/$ – see front matter © 2012 Elsevier Ltd. All rights reserved.

The question of how replication programs are regulated is an active, and sometimes controversial, field. Although the specific mechanisms that control timing are still obscure, recent work has revealed basic principles that appear to apply to eukaryotic replication in general. In particular, mathematical modeling of genome-wide replication timing data shows that replication timing can be explained by stochastic mechanisms. The significance of this conclusion is that it explains the regulation of replication timing in terms of simple rules that affect the individual probabilities of origin firing. In such models, replication timing is controlled by changing the firing rate of individual origins, instead of by directly regulating the time at which origins fire. Although this distinction may seem semantic, it is important because it recasts black-box mechanisms of global replication timing in terms of biochemically plausible effects on individual origins.

Over the past decade, two views about replication timing mechanisms have been developed. In the first, origin firing is a stochastic event that is (largely) independent of the replication state of neighboring origins. In particular, it has been postulated that there is an initiation function I(x,t) that describes the rate of initiation, per time and per length of unreplicated DNA, of a site x along the genome at time t after the beginning of S phase [6,7] (Figure 1; Box 1). This type of origin firing can manifest in at least three different ways, depending on the experimental model considered. In species such as budding yeast, in which replication initiates at well-defined loci, the function I(x,t) forms a discrete spike at the replication origin [8] (Figure 1a). At the other end of the spectrum, amphibian embryos lack origin specificity, and DNA replication can initiate anywhere along the genome [6]. In an intermediate case, mammalian somatic cells can display clusters of origins or broad initiation zones that are not homogeneously distributed throughout the genome [9–11] (Figure 1b). Each of these three cases is discussed in detail below. We refer to the hypothesis of a locally determined initiation rate as the independent origin hypothesis because it is distinguished by the feature that origins fire independently from the firing of neighboring origins. The attraction of the independent origin hypothesis is its simplicity: one does not need to postulate biological mechanisms that would cause correlated initiations. The potential weakness

http://dx.doi.org/10.1016/j.tig.2012.03.011 Trends in Genetics, August 2012, Vol. 28, No. 8



[Figure 1 image: panels (a), (b) and (c), with yeast and metazoan examples. Axes: genome position and time; plotted quantities: replication fraction f (0 to 1) and initiation rate I (number per kb per time).]

Figure 1. Replication fractions and initiation rates. (a,b) The relation between

replication fractions f and initiation rates I, as illustrated for budding yeast. (a)

Spatially resolved data, averaged over an asynchronous cell population. (b) Time

course data, averaged over the genome. (c) Illustration of typical replication timing

data for budding yeast (left) and a metazoan organism (right). Top-left image

shows the replication fraction f(x,t), as it might be inferred from a microarray

timing experiment with several time points of data from synchronized cell

populations. Black represents low-replication levels and white represents high-

replication levels. Averaging the replication fraction over the genome gives the

curve f(t), depicted to the left of the f(x,t) image, which goes from 0 to 1. Averaging

the replication fraction over time, as in an experiment on asynchronous cell

populations, gives the curve above the f(x,t) image. The bottom-right group shows

the inferred I(x,t) image, as well as the averaged curves I(t) and I(x). Note that, in

budding yeast, replication origins are well localized, as indicated by the spikes in

the function I(x). [When viewed or printed at low resolution, not all spikes in I(x,t)

may be visible.] The right-hand groups illustrate similar concepts for a typical

metazoan organism. The main difference is that origins are not well localized, so

that the function I(x) has broad features, representing zones where initiations are

more or less likely to occur.


of the independent origin hypothesis is that, if too simple, it may fail to describe experiments accurately or that implausible coincidences of parameters may be required to fit the data.

In a second scenario, the initiation of an origin, although still stochastic, is linked to the state of the genome in its vicinity. For example, observations of origin clustering [12–14] have led several authors to hypothesize that the presence of a replication fork can increase the firing rate of nearby origins, for instance, the ‘next-in-line’ model [15] and the ‘domino-cascade’ model [16,17]. We refer to this second scenario in general as the correlated origin hypothesis.

Previously, there was considerable debate as to whether replication was stochastic and whether origins are independent. At present, it is generally accepted that all models of replication are stochastic at the level of molecular interactions. It is important to note that stochastic models do not require that origins all fire with the same probability, nor is stochastic firing incompatible with late-firing origins [18]. However, there is evidence in some cases for correlation in origin initiation activity. As a result, the current picture is an intermediate one that mixes both stochastic elements and mechanisms for correlations in origin initiation [15,19]. Still, differences remain concerning what is essential and what is incidental in the above picture and what kind of underlying mechanisms are likely to be important in controlling the replication program. In this review, we argue that, for the simpler cases such as unicellular yeast and for the embryonic cells of some multicellular animals, recent experiments and modeling efforts have shown that much of the available replication data may be understood in terms of the simpler independent origin hypothesis and that correlations probably play a minor role in the replication program. Replication in the somatic cells of metazoan organisms is more complex, and we outline recent efforts in this area.

Replication in yeast
The past few years have marked a turning point in the understanding of replication in yeast. First came a series of high-resolution combing and microarray experiments (Box 2). For example, high-resolution timing data of synchronized populations of wild type and clb5Δ Saccharomyces cerevisiae show clear average timing patterns [20]. These data, as mentioned in Box 2, amount to measurements of f(x,t), with spatial information resolved to a few kilobases and temporal information resolved to 5 min. At around the same time, DNA combing studies in budding and fission yeast showed that initiations at the single-molecule scale are stochastic, with different sets of origins chosen in each cell cycle [21,22]. Indeed, in budding yeast, it is now clear that there are as many as 700 potential origin sites, of which only approximately 200 are used in any given cycle.

In parallel work, the rate of origin firing in budding and fission yeast was shown to be regulated by competition for limiting activators, such as the Cdc45 initiation factor and the DDK initiation kinase [23–26]. Competition for limiting activators provides an explanation for why origin firing is less efficient than might be possible. The stochastic interaction between origins and diffusible activators also provides a mechanism for stochastic firing of origins.

The stochasticity of individual origins turns out to be an important effect. In contrast to earlier models, in which the firing of specific origins was envisaged to be limited to narrow windows of S phase, it is now clear that the width of the firing-time distribution for an individual origin can be a substantial fraction of S phase. Indeed, models that fail to incorporate the width of the timing distribution fail to reproduce many of the experimental details adequately [27]. By contrast, stochastic models that take into account the width of the firing-time distribution can successfully fit the microarray data [8,28,29]. Several notable insights and results come from these analyses: first, it is possible to generate models with independent initiation scenarios [initiation rate I(x,t) and constant fork velocity v] that lead to good fits of the data. This result shows that the independent origin hypothesis suffices to explain microarray data


Box 1. f and I: mathematical functions that describe

replication kinetics

DNA replication kinetics can be described using two related but

distinct mathematical functions: the replication fraction f and the

initiation rate I. The first, f, is a complete description of replication

kinetics and can be directly determined from experimental data (Box

2). The second, I, only describes the kinetics of origin initiation and

cannot be directly measured; it must be inferred from f. However, if

fork rates are assumed to be nearly constant, as is frequently done

in models of replication kinetics, then I is sufficient to completely

determine f. Both f and I can be defined for every spatial point (x) in

the genome and every time point (t) in S phase, to give f(x,t) and

I(x,t) (Figure 1c, main text).

It is often useful to consider the time-averaged functions, f(x) and

I(x) (Figure 1a, main text). f(x) can be thought of as the average

replication time of each point in the genome and is generally

measured on asynchronous populations of cells. It is closely related

to the median replication time trep at a site that is inferred from time

course data on synchronized cell populations. The peaks in f(x)

represent the origins, and taller peaks indicate origins that fire, on

average, earlier in S phase. I(x) represents the average initiation rate

of each point in the genome. In yeast, where origins are well

defined, I(x) = 0 for most of the genome and forms spikes over the

origins, with taller spikes reflecting a higher average probability of

origin firing (Figure 1a,c). In metazoans, origins appear to be more

diffuse, and thus so is I(x) (Figure 1c). It is important to realize that

the height of the peaks in I(x) (e.g. the average firing probability of

an origin) cannot be directly inferred from the height of the peaks in

f(x), because f(x) convolves both passive replication and active firing

of each origin; I(x) can only be extracted by mathematical modeling

of f(x).

It can also be useful to consider the spatially averaged functions,

f(t) and I(t) (Figure 1b, main text). The replication fraction f(t) is

generally sigmoidal, as cells go from unreplicated in G1 to

replicated in G2. The exact shape of the sigmoid depends on the

details of the replication program, such as the distribution of

origins and the shape of I(t). As discussed in the main text, I(t) has

been proposed to generally increase for most of S phase and then

decline in late S phase.
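The statement in this box that I determines f once fork rates are constant can be made concrete for the yeast-like case of discrete origins. The following is a minimal sketch under stated assumptions, not the fitting procedure of the cited models: each origin fires at a constant rate while unreplicated, and forks move at a constant speed v. A site x is then still unreplicated at time t only if no origin fired early enough for its fork to reach x, so the unreplicated fraction is a product of exponentials over origins. The function name, origin positions, rates and fork speed are all hypothetical.

```python
import math

def replication_fraction(x, t, origins, v):
    """Fraction of cells in which site x (kb) is replicated at time t (min).

    origins: list of (position_kb, firing_rate_per_min); illustrative values.
    v: fork velocity in kb/min, assumed constant.
    """
    exponent = 0.0
    for pos, rate in origins:
        # Time window in which this origin could fire and still reach x by t.
        window = t - abs(x - pos) / v
        if window > 0:
            exponent += rate * window
    return 1.0 - math.exp(-exponent)

# Hypothetical genome with one efficient and one inefficient origin.
origins = [(100.0, 0.10), (300.0, 0.02)]
v = 2.0  # kb/min, illustrative

for x in (100.0, 200.0, 300.0):
    print(x, round(replication_fraction(x, 30.0, origins, v), 3))
```

In this toy profile the peak of f at the efficient origin is taller than the peak at the inefficient one, and the site midway between them is still unreplicated at 30 min because no fork can have arrived yet.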


on replication timing in yeast. Second, the intrinsic parameters characterizing each origin have values that are independent of their neighbors, again suggesting that the initiation of each origin is an independent stochastic event [8]. Studies in fission yeast have also led to the conclusion that local initiation models suffice to explain the available experimental data [30,31]. However, several biologically different scenarios can lead to similar overall timing patterns [32], and more complicated mechanisms, such as trans-acting regulators of origin activity and chromosome structure, can affect origin timing [33,34]. Clearly, further iterations of modeling and experiment will be needed to come to a final picture.

Replication in embryos
Embryonic cells in metazoans represent an interesting intermediate case of complexity. On the one hand, they have the full amount of DNA of somatic cells. On the other hand, they undergo a rapid, simplified cell cycle that is largely transcriptionally silent, which removes one major source of complication in the replication of somatic cells. In vitro studies of Xenopus cell-free extracts have been especially detailed and fruitful [13,35–37] and have led to associated modeling efforts [6,7,19,38,39]. The replication program in Xenopus embryos is relatively simple and much faster than in somatic cells. In particular, there are no fixed origin sites, presumably because the lack of transcription and more uniform chromatin structure allows the ORC to load MCM anywhere in the genome [40,41]. Although variation in initiation rates and, hence, replication timing does occur at the megabase scale [37], modeling efforts to date have focused on understanding the temporal variation of the initiation rate, I(t), averaged over the genome. The main conclusion is that the initiation rate increases over most of S phase, before decreasing to zero near the end of it. This variation of initiation rates over S phase is significant because it leads to a relatively narrow distribution of lengths for S phase, which, because of the stochasticity of origin placement and initiation time, varies with each cell cycle [42]. In embryos, it is particularly important that there be little variation in genome duplication time, as the cell cycle lacks checkpoints that can delay the start of mitosis if replication is not complete. In Xenopus embryos, for example, the typical S phase duration is 20 min and that of mitosis is 5 min, all within a 25-min cell cycle [43]. Thus, variations of S phase of more than 5 min can be lethal for a cell. Such variations are proposed to be suppressed by the increasing nature of initiation rate I(t). It has even been postulated that the increasing form of I(t) is a universal characteristic of eukaryotic replication [44]. Preliminary assessment of replication data from S. cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster and Homo sapiens supports this scenario, although better data and more extensive analysis are required. The initial increase of initiation rate I(t) has been attributed to competition for a limiting factor required for replication fork function [35,38] or origin firing (e.g. the DDK replication kinase [23]), whereas the decrease of I(t) at the end of S phase has been variously attributed to a fork-dependent control mechanism [38] or to increasing diffusion search times for the limiting factor to find its target [39].
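How the shape of I(t) feeds into the genome-averaged replication fraction can be illustrated numerically. The sketch below assumes spatially uniform initiation (as in embryos without fixed origins) and a constant fork velocity v, in which case the unreplicated fraction takes the nucleation-and-growth form exp(-2v ∫ I(t')(t - t') dt'), integrated from 0 to t. The two rate functions, their parameter values and the units are hypothetical and chosen only for illustration; they are not fitted to Xenopus data.

```python
import math

def replicated_fraction(t, rate_fn, v, steps=2000):
    """Genome-averaged f(t) for a spatially uniform initiation rate.

    rate_fn(t'): initiations per kb per min at time t' (illustrative).
    v: fork velocity in kb/min, assumed constant.
    """
    if t <= 0:
        return 0.0
    dt = t / steps
    integral = 0.0
    for k in range(steps):
        tp = (k + 0.5) * dt  # midpoint rule
        integral += rate_fn(tp) * (t - tp) * dt
    return 1.0 - math.exp(-2.0 * v * integral)

def constant_rate(tp):
    return 0.01  # flat I(t), hypothetical value

def increasing_rate(tp):
    return 0.002 * tp  # linearly increasing I(t), hypothetical slope

v = 0.5  # kb/min, illustrative

for t in (5.0, 10.0, 20.0):
    print(t,
          round(replicated_fraction(t, constant_rate, v), 3),
          round(replicated_fraction(t, increasing_rate, v), 3))
```

In this toy comparison the increasing rate starts more slowly but rises more steeply late in S phase, which is the qualitative behavior the text associates with a narrower spread of completion times.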

Replication in metazoan somatic cells
The replication of DNA in metazoan germline and somatic cells is more complicated than in embryonic cells. Replication in somatic cells can take up to 100 times longer than in embryonic cells [45], and this increase in replication time is not spread equally across the genome. Instead, different regions of the genome replicate at characteristic times during the elongated S phase, and the replication timing of a locus correlates with several other important chromosomal characteristics. The best-established correlation is between late replication and constitutive heterochromatin, the repetitive, transcriptionally inactive regions of the genome that remain condensed throughout the cell cycle [46]. Conversely, gene-rich, transcriptionally active regions of the genome tend to replicate earlier in S phase [47]. The correlation between the transcriptional activity of individual genes and their replication timing is not strong [48]. However, when averaged over large groups of neighboring genes, transcriptional activity correlates well with replication timing [49,50]. An even more remarkable correlation is seen between chromosome interaction maps and replication timing [51,52]. The contiguous regions of the genome that replicate with similar timing are referred to as replication domains. The correlations between the average transcriptional activity, chromatin interactions and

Box 2. Experimental techniques for analyzing DNA

replication timing

The recent gains in our understanding of replication timing are built

on experimental advances that have greatly increased the quality

and quantity of data available. Defined patterns of DNA replication

were first observed in fiber autoradiography studies of tritiated

thymidine incorporation in bacterial and mammalian cells [12,79].

By in vivo pulse-labeling cells with tritiated thymidine and then

stretching the labeled DNA on a photosensitive film, it was possible

to map replication patterns (which regions have replicated and

which have not) at a given time. A significant technical improvement

was the substitution of fluorescently labeled thymine analogs, such

as BrdU, that could be observed using an optical microscope [80,81].

Molecular combing, which stretches DNA more controllably,

improved the latter technique by allowing one to more reliably

associate positions on an image of a stretched fiber with genomic

positions and by simplifying the identification of individual fibers

taken anonymously from the genome [82,83] or with the genome

location identified [54,84]. In parallel with fiber-based techniques,

live-cell imaging has also yielded much valuable information.

Although the size of origins and even their separations are well

below the resolution of conventional light microscopy, clever

techniques can yield spatial and temporal information. For example,

specific sites can be labeled with fusion proteins whose intensity

doubles after replication, an event that can readily be observed [85].

In the future, ‘live’ single-molecule studies based on flow and

optical or magnetic tweezers [86], nano-engineered capillaries

[87,88] and other molecular-scale structures may lead to even

greater insights, especially into local mechanisms at the fork and

initiation sites.

A second set of techniques provides information about the

fraction of cells in a population that has replicated at a particular

location x and time t. This fraction of replicated cells can be

described by the function f(x,t), if replication kinetics throughout S

phase are measured, or simply as f(x), if measurements are

performed on asynchronous cell populations (Figure 1, main text).

Such measurements originally used microarrays [89,90], with one

approach based on local changes in copy number during replication.

In a population of unreplicated cells, a baseline intensity is

measured at each locus [f(x) = 0]. After all cells have replicated, the

measured intensity at each locus should be double [f(x) = 1]. During

replication, intermediate levels of replication are detected as

intermediate intensity levels [0<f(x)<1]. For example, if half of the

cells in the population have replicated at a location x, then f(x) = 0.5.

More recently, direct sequencing to determine local DNA copy

number has given similar information with fewer artifacts [91,92].

Initial studies used multiple time points in cultures of synchronized

cells to directly measure f(x,t) [89,93], and this approach is still the

state of the art in yeast [20,64]. However, comparable results can be

derived by sorting asynchronous cells of any type into G1 and S

populations [90].
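The copy-number logic described in this box reduces to a simple conversion: a signal equal to the unreplicated baseline corresponds to f(x) = 0 and a doubled signal to f(x) = 1. The sketch below applies that mapping locus by locus. The function name and the example intensities are hypothetical, and the clipping of out-of-range values is an added assumption for handling measurement noise, not a step described in the text.

```python
def replication_fraction_from_intensity(intensity, baseline):
    """Convert per-locus signal into replication fraction f(x).

    baseline: signal from unreplicated cells at each locus (f = 0).
    A fully replicated population gives twice the baseline (f = 1).
    """
    fractions = []
    for signal, base in zip(intensity, baseline):
        f = signal / base - 1.0
        # Assumption: clip noise that falls outside the physical range [0, 1].
        fractions.append(min(1.0, max(0.0, f)))
    return fractions

baseline = [100.0, 100.0, 80.0, 120.0]   # hypothetical unreplicated signal
s_phase = [100.0, 150.0, 160.0, 126.0]   # hypothetical S-phase signal

print(replication_fraction_from_intensity(s_phase, baseline))
```

The second locus, with a signal 1.5 times its baseline, maps to f(x) = 0.5, matching the example in the box of half the cells having replicated that location.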


the replication timing of replication domains have led to qualitative models in which the chromosome accessibility of a domain affects its replication timing [53].

Although replication domains replicate with reproducible timing, origin firing within domains is heterogeneous because of stochastic origin firing [10,54,55]. As in yeast, origin firing in metazoans appears to be regulated by limiting activators. Mammalian Cdc45 is substoichiometric, relative to ORC and MCM, and increasing Cdc45 levels increases the rate of origin firing [56]. Moreover, modulating the levels of the CDK replication kinase affects the efficiency of origin firing [57–59]. An additional reason for the heterogeneity of origin firing in metazoans is that metazoan origins are not well-defined loci; at least in some cases, MCM seems to be loaded heterogeneously throughout a region [60–62], which can be thought of as a cluster of many inefficient origins or as a diffuse initiation zone.

Mechanisms for timing
Although replication timing appears to be uniform and well coordinated at the population level, this average behavior hides heterogeneous replication kinetics in individual cells. This apparent conflict between heterogeneity at the single-cell level and organization at the population level is resolved by observing that the average of the heterogeneous single-cell data recapitulates the results from ensemble studies [22]. This observation has led to models in which the average replication time of a locus is a function of the firing probability of individual origins, regardless of whether those probabilities are independent or coordinated (Box 3). Such models predict a correlation between the probability and timing of origin firing, a correlation seen in budding yeast [8,28]. Furthermore, recent budding yeast studies have shown that, in most cases in which the length of S phase is significantly increased, the relative timing program is maintained [63,64]; that is, the overall ordering of replication timing of different regions is preserved, even as the scale of timing is altered. Such a result would be expected if S phase length changes because the initiation rates have been altered globally (Naama Barkai, personal communication). As discussed above, initiation rates are thought to be regulated by competition among origins for limiting activation factors. One recently proposed model makes the case both theoretically and experimentally that the limiting factor is a protein associated with active replication forks [65]. The Cdc45 protein, which is required to activate the MCM helicase complex, is one such candidate [56]. Alternatively, factors such as DDK, which phosphorylates and activates MCM, have been seen to be rate limiting in fission yeast [23].

The competition for limiting activators explains why origins fire stochastically but not why some origins fire with higher probability than others. One obvious explanation for differing probabilities of origin firing is the effect of chromatin structure on the accessibility of origins to initiation factors [53]. In the context of competition between origins for limiting activators, it is natural to imagine that chromatin structure affects that competition, allowing euchromatic origins greater access to activators and so higher firing probabilities. This possibility fits well with the strong correlation observed between heterochromatin and late replication [46]. Another possibility that we have recently proposed is based on the observation that multiple MCMs are loaded at each origin [8,60]. In this model, each MCM loaded has a low probability of firing; however, because multiple MCMs are loaded at each origin, origins that have more MCMs loaded will have a higher aggregate firing probability. Thus, the probability of origin firing is set in part by the number of MCMs loaded at a given origin site. The probability of origin firing can then be subsequently altered by chromatin context. For example, a recent study has shown that Rif1, which affects telomere chromatin structure, also binds to chromosome arms and alters origin initiation rates at these sites, perhaps by altering the loading of the Cdc45 that is required for MCM helicase activation [33].
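The multiple-MCM idea can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that each loaded MCM fires independently with the same small probability in a given time window; the origin then fires if at least one of its MCMs does. The function name, the per-MCM probability and the MCM counts are hypothetical.

```python
def origin_firing_probability(n_mcm, p_single):
    """Probability that at least one of n_mcm loaded MCM complexes fires
    in a given time window, assuming each fires independently with
    probability p_single."""
    return 1.0 - (1.0 - p_single) ** n_mcm

p_single = 0.05  # illustrative per-MCM firing probability per window

for n in (1, 5, 20):
    print(n, round(origin_firing_probability(n, p_single), 3))
```

Under these assumptions the aggregate probability rises with the number of loaded MCMs but saturates below 1, so origins with more MCMs loaded tend to fire earlier without ever firing at a fixed time.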


Box 3. Theoretical techniques for analyzing DNA replication timing

Although determining the firing time of an origin would seem straightforward, particularly for the relatively simple yeast genome, the heterogeneous nature of origin firing and the passive replication of origins by forks from neighboring origins mean that the distribution of origin firing times cannot be directly inferred from its average replication time [94]. Therefore, rigorous analysis of replication timing patterns has relied on more sophisticated analytical tools. One of the most straightforward and widespread methods is computer simulation [6,27,28,30,38]. An advantage of simulation is that, with modest computer resources (especially if simulations keep track of only positions of forks and origins rather than use a lattice for each point on the genome [95]), one can recreate in silico not only the ideal experimental scenario envisaged, but also any relevant experimental details. For example, it is straightforward to include the effects of asynchrony in the cell population, finite microscope resolution, labeling artifacts, and the like [96]. Once the artifacts and the replication scenario are chosen correctly, the simulation can reproduce, within statistical error, the data from any given scenario.

The main disadvantage of simulations is that to analyze experimental data, one must first determine both the appropriate type of replication scenario to simulate and ways to incorporate experimental details and then determine the appropriate parameters to use. In situations in which origin firing is not uniformly distributed, each origin will be characterized by several parameters, and so the simulation may depend on hundreds or even thousands of parameters, depending on the type of organism. Curve-fitting techniques, which amount to a search in the space of parameters, require simulating a large number of scenarios. Analytical models, which can be used to directly calculate replication profiles instead of needing to simulate replication step by step, are one way to get around such obstacles. Analytical models may be evaluated faster than simulations. The difficulties are that one must be able to determine an appropriate model and be able to solve it. Thus, beginning with [6], a variety of analytical models have been proposed [8,39,42,94,97]. Because models based on independent origins are simpler than ones that allow correlated initiations, most of the above work has assumed such a scenario. Nonetheless, some analysis of correlated initiations has been done, as discussed in the main text.
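The simulation approach described in Box 3 can be illustrated with a minimal sketch that tracks only origin positions and firing times rather than a lattice. It is not taken from any of the cited studies. It assumes independent origins, a constant firing rate per origin (so firing times are exponentially distributed), and a constant fork speed; all names and parameter values are hypothetical.

```python
import random


def replication_times(origins, rates, fork_speed, positions, rng):
    """Replication time of each position in one simulated cell.

    Each origin i draws a firing time t_i from an exponential distribution
    (constant firing rate, an assumption of this sketch). With forks moving
    at constant speed v, position x is replicated at min_i(t_i + |x - x_i| / v).
    An origin reached by a fork before its own t_i never attains the minimum,
    so passive replication is handled implicitly.
    """
    fire_times = [rng.expovariate(rate) for rate in rates]
    return [
        min(t + abs(x - o) / fork_speed for o, t in zip(origins, fire_times))
        for x in positions
    ]


def mean_replication_profile(origins, rates, fork_speed, positions,
                             n_cells=2000, seed=0):
    """Average the single-cell profiles over a population of simulated cells."""
    rng = random.Random(seed)
    totals = [0.0] * len(positions)
    for _ in range(n_cells):
        cell = replication_times(origins, rates, fork_speed, positions, rng)
        for k, t in enumerate(cell):
            totals[k] += t
    return [total / n_cells for total in totals]


# Hypothetical example: an efficient origin at 100 kb, an inefficient one at 300 kb.
ORIGINS_KB = [100.0, 300.0]
RATES_PER_MIN = [0.5, 0.05]
FORK_SPEED_KB_PER_MIN = 2.0
POSITIONS_KB = [0.0, 100.0, 200.0, 300.0, 400.0]

if __name__ == "__main__":
    profile = mean_replication_profile(
        ORIGINS_KB, RATES_PER_MIN, FORK_SPEED_KB_PER_MIN, POSITIONS_KB
    )
    for x, t in zip(POSITIONS_KB, profile):
        print(f"{x:6.0f} kb  mean replication time {t:6.1f} min")
```

Each simulated cell gives a different profile, but the population average is reproducible, and the locus at the efficient origin replicates earlier on average than the locus at the inefficient one. Experimental details such as asynchrony or finite resolution would be added as further steps applied to the simulated profiles.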


A scenario comprising stochastically firing origins with different firing probabilities naturally leads to a reproducible replication-timing program [66]. Origins with high firing probabilities will be more likely to fire in early S phase and so will have early average replication times. In general, low-probability origins would be unlikely to fire efficiently even in late S phase. However, if the firing rate, I(t), increases during S phase, as described above, even low-probability origins, if not passively replicated, will have a high probability of firing late in S phase, leading to efficient replication of late-replicating regions [18]. Here, we distinguish between I(t), which describes the timing program, and the underlying biological mechanisms, which try to explain why I(t) has an observed form. This description of origin timing applies not only to the individual origins of simpler genomes, such as budding yeast, but also to the complicated replication domains of metazoan genomes. In the latter case, euchromatic replication domains of high-probability origins reproducibly replicate earlier than do domains of lower-probability origins, but heterochromatic domains, which harbor the lowest-probability origins, nonetheless replicate efficiently in late S phase. Thus, the order in which various domains of metazoan genomes replicate may be a secondary consequence of the effect of their chromatin structure on the firing probabilities of their origins. This possibility is consistent with the strong correlation between chromatin interactions and replication timing [52].
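The role of a rising I(t) can be illustrated numerically. In the sketch below, an origin with relative efficiency w (a hypothetical parameter) has instantaneous firing rate w·I(t), so the probability that it has fired by time t, ignoring passive replication, is 1 − exp(−w∫I dt). The functional form of the rising rate, the 40-min S phase, and all numerical values are assumptions chosen for illustration rather than measured quantities.

```python
import math


def constant_rate(t):
    """Hypothetical firing rate that stays flat throughout S phase (per min)."""
    return 0.1


def rising_rate(t):
    """Hypothetical firing rate that increases during S phase (per min)."""
    return 0.1 * (1.0 + (t / 10.0) ** 3)


def firing_probability(weight, rate_fn, t_end, steps=4000):
    """P(origin has fired by t_end) = 1 - exp(-weight * integral of rate_fn)."""
    dt = t_end / steps
    # Trapezoid rule for the integral of rate_fn from 0 to t_end.
    total = 0.5 * (rate_fn(0.0) + rate_fn(t_end))
    for k in range(1, steps):
        total += rate_fn(k * dt)
    return 1.0 - math.exp(-weight * total * dt)


def median_firing_time(weight, rate_fn, t_max, steps=4000):
    """First grid time at which the firing probability reaches 0.5, else None."""
    dt = t_max / steps
    hazard = 0.0
    prev = rate_fn(0.0)
    for k in range(1, steps + 1):
        cur = rate_fn(k * dt)
        hazard += weight * 0.5 * (prev + cur) * dt
        prev = cur
        if hazard >= math.log(2.0):
            return k * dt
    return None


if __name__ == "__main__":
    s_phase_min = 40.0  # hypothetical S-phase length
    for label, w in (("high-probability", 1.0), ("low-probability", 0.05)):
        for rate_fn in (constant_rate, rising_rate):
            p = firing_probability(w, rate_fn, s_phase_min)
            med = median_firing_time(w, rate_fn, s_phase_min)
            print(label, rate_fn.__name__, round(p, 3), med)
```

Under these assumptions, the low-probability origin rarely fires within S phase when the rate is flat, whereas with the rising rate it fires with high probability, but late, while the high-probability origin still fires early.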

Correlated origin initiations

Although much of observed replication timing can be explained in terms of a picture of independent initiations, there is also evidence for correlations in initiation. For example, DNA fiber studies observe clusters of nearby origins that initiated at approximately the same time [12,13]. One plausible mechanism for origin clustering is that the polymerases and other proteins responsible for replication are localized within the nucleus in small foci known as replication factories [67,68]. As a consequence, if the DNA is tethered to a location in the cell nucleus while replicating, it may loop around and find another set of replication machinery in the same factory. Such looping could increase the likelihood of origin firing of origins located approximately 10 kb from an active fork and decrease origin firing for closer origins [19].

Another line of argument suggesting the possibility of correlated initiation lies in an observation of small biases in the DNA base sequence near certain regions. It has been shown that if a region of the genome is repeatedly replicated by a polymerase on the leading strand, mutations will eventually lead to strand compositional asymmetries (an excess of G over C and T over A) [69]. Indeed, a large proportion of known origins for H. sapiens have been found by looking for signatures of compositional skew [70]. Early replicating regions are then marked by an abrupt jump in the local skew. Because adjacent early replicating regions are separated by approximately 1 Mb and because the average distance between origins is approximately 100 kb, there must be multiple initiations between each early region. To explain the observation that the compositional skew varies linearly between compositional discontinuities associated with origins, it was postulated that a wave of correlated initiations occurs, which leads to a ‘domino’ [16,17,71] or ‘next-in-line’ model [15]. It is not clear whether a looping mechanism [19] can explain such effects, whether some more complicated form of coupling between initiation and fork progression is required, or whether the difference in chromatin structure between early- and late-replicating regions can account for these observations. The last possibility would avoid the need to invoke coordinated origin firing. In support of this idea, a recent single-molecule replication kinetics analysis of the mouse Igh locus is consistent with a stochastic model that lacks any origin coordination [11] (Paolo Norio, personal communication).

In addition to temporal ordering of origin initiation, some models include spatial correlations in the positioning of origins. Recently, it was proposed that the clustering of initiated origins observed in Xenopus embryos and, to a lesser extent, in yeast may speed up the overall completion of S phase [72]. A shorter S phase is particularly helpful in Xenopus embryos, as it prevents the mitotic catastrophe discussed above. Clustering several inefficient origins together can lead to a group that is collectively efficient in that one or the other of the origins is likely to fire early. Although the periodic distribution of such groups of origins would be an efficient way to replicate the genome, mechanisms that could achieve this global order are not clear at present.

Concluding remarks

The hypothesis that replication is largely controlled by the local rate of initiation has received wide support from recent experiments and analyses. Models based on local initiation rates I(x,t) have successfully described the replication process in budding and fission yeast, in Xenopus embryos and in the Igh locus of mouse pro-B cells [6,8,11,28,30,38] (Paolo Norio, personal communication). A limiting factor in this work is that each of the above analyses involved a long-term collaboration between experimental biologists and modeling laboratories (the latter from a variety of fields, including physics, engineering and computer science). To broaden the use of quantitative analyses of replication and to analyze the growing number of data sets, it is important that the software and analysis procedures be usable by non-specialists. The recent derivation of ‘inversion’ formulas (A. Baker, PhD thesis, ENS de Lyon, France, 2011) that give I(x,t) directly from data on the local average replication fraction f(x,t), obtainable from microarray or deep sequencing studies on synchronized cell populations, is a first step in that direction.

A second research direction is a more precise understanding of the relation between the replication program, as described above, and the effects of DNA damage, with its concomitant activation of DNA repair mechanisms. For example, one consequence of damage that stalls replication forks is the activation of additional origins, which now have more time to initiate [73,74], an effect that is straightforward to simulate [75] and model analytically [76]. The modeling of fork stalls predicts that there is a critical density of stalled forks (approximately one per replicon), above which there is a global delay in S phase and below which the effects are minor and localized. Interestingly, this threshold density matches the observed stall densities in fragile zones and in cells with activated oncogenes [76]. However, DNA damage can also induce checkpoints that inhibit subsequent origin firing [77], complicating the overall effect of DNA damage on replication timing. A related topic is the interrelation between mutation rates and events in S phase. Although formal models to handle such situations are beginning to be developed [69], more work is needed to understand observations, such as the link between mutation rate and S phase timing [78].

Although the independent origin hypothesis is attractive in its simplicity and so far remarkably successful in its application, there is evidence for correlated initiations in somatic metazoan cells. Some of the correlation is explainable as a straightforward consequence of the physical constraints of clustering polymerases. In such a view, the primary method of controlling timing in S phase remains the local modulation of overall initiation rates, and the correlations in the initiation of neighboring origins are produced by the geometrical effects of loops induced by replication factories. Whether such mechanisms suffice or whether a more complicated control mechanism is at play is at present unclear. Time will tell.

Acknowledgments

JB has been supported by grants from NSERC (Canada) and the Human Frontiers Science Program. NR has been supported by NIH grant GM098815 and an American Cancer Society Research Scholar Grant.

References

1 Baker, T.A. and Wickner, S.H. (1992) Genetics and enzymology of DNA replication in Escherichia coli. Annu. Rev. Genet. 26, 447–477
2 Masai, H. et al. (2010) Eukaryotic chromosome DNA replication: where, when, and how? Annu. Rev. Biochem. 79, 89–130
3 Remus, D. et al. (2009) Concerted loading of Mcm2-7 double hexamers around DNA during DNA replication origin licensing. Cell 139, 719–730
4 Evrin, C. et al. (2009) A double-hexameric MCM2-7 complex is loaded onto origin DNA during licensing of eukaryotic DNA replication. Proc. Natl. Acad. Sci. U.S.A. 106, 20240–20245
5 Labib, K. (2010) How do Cdc7 and cyclin-dependent kinases trigger the initiation of chromosome replication in eukaryotic cells? Genes Dev. 24, 1208–1219
6 Herrick, J. et al. (2002) Kinetic model of DNA replication in eukaryotic organisms. J. Mol. Biol. 320, 741–750
7 Jun, S. and Bechhoefer, J. (2005) Nucleation and growth in one dimension. II. Application to DNA replication kinetics. Phys. Rev. E 71, 011909
8 Yang, S.C. et al. (2010) Modeling genome-wide replication kinetics reveals a mechanism for regulation of replication timing. Mol. Syst. Biol. 6, 404
9 Hamlin, J.L. et al. (2008) A revisionist replicon model for higher eukaryotic genomes. J. Cell. Biochem. 105, 321–329
10 Norio, P. et al. (2005) Progressive activation of DNA replication initiation in large domains of the immunoglobulin heavy chain locus during B cell development. Mol. Cell 20, 575–587
11 Gauthier, M.G. et al. (2012) Modeling inhomogeneous DNA replication kinetics. PLoS ONE 7, e32053
12 Huberman, J.A. and Riggs, A.D. (1968) On the mechanism of DNA replication in mammalian chromosomes. J. Mol. Biol. 32, 327–341
13 Blow, J.J. et al. (2001) Replication origins in Xenopus egg extract are 5–15 kilobases apart and are activated in clusters that fire at different times. J. Cell Biol. 152, 15–25
14 Pasero, P. et al. (2002) Single-molecule analysis reveals clustering and epigenetic regulation of replication origins at the yeast rDNA locus. Genes Dev. 16, 2479–2484
15 Shaw, A. et al. (2010) S-phase progression in mammalian cells: modelling the influence of nuclear organization. Chromosome Res. 18, 163–178
16 Audit, B. et al. (2009) Open chromatin encoded in DNA sequence is the signature of ‘master’ replication origins in human cells. Nucleic Acids Res. 37, 6064–6075
17 Guilbaud, G. et al. (2011) Evidence for sequential and increasing activation of replication origins along replication timing gradients in the human genome. PLoS Comput. Biol. 7, e1002322
18 Rhind, N. et al. (2010) Reconciling stochastic origin firing with defined replication timing. Chromosome Res. 18, 35–43
19 Jun, S. et al. (2004) Persistence length of chromatin determines origin spacing in Xenopus early-embryo DNA replication: quantitative comparisons between theory and experiment. Cell Cycle 3, 223–229
20 McCune, H.J. et al. (2008) The temporal program of chromosome replication: genomewide replication in clb5Δ Saccharomyces cerevisiae. Genetics 180, 1833–1847
21 Patel, P.K. et al. (2006) DNA replication origins fire stochastically in fission yeast. Mol. Biol. Cell 17, 308–316
22 Czajkowsky, D.M. et al. (2008) DNA combing reveals intrinsic temporal disorder in the replication of yeast chromosome VI. J. Mol. Biol. 375, 12–19
23 Patel, P.K. et al. (2008) The Hsk1(Cdc7) replication kinase regulates origin efficiency. Mol. Biol. Cell 19, 5550–5558
24 Mantiero, D. et al. (2011) Limiting replication initiation factors execute the temporal programme of origin firing in budding yeast. EMBO J. 30, 4805–4814
25 Wu, P.Y. and Nurse, P. (2009) Establishing the program of origin firing during S phase in fission yeast. Cell 136, 852–864
26 Tanaka, S. et al. (2011) Origin association of sld3, sld7, and cdc45 proteins is a key step for determination of origin-firing timing. Curr. Biol. 21, 2055–2063


27 Spiesser, T.W. et al. (2009) A model for the spatiotemporal organization of DNA replication in Saccharomyces cerevisiae. Mol. Genet. Genomics 282, 25–35
28 de Moura, A.P. et al. (2010) Mathematical modelling of whole chromosome replication. Nucleic Acids Res. 38, 5623–5633
29 Luo, H. et al. (2010) Genome-wide estimation of firing efficiencies of origins of DNA replication from time-course copy number variation data. BMC Bioinform. 11, 247
30 Lygeros, J. et al. (2008) Stochastic hybrid modeling of DNA replication across a complete genome. Proc. Natl. Acad. Sci. U.S.A. 105, 12295–12300
31 Koutroumpas, K. and Lygeros, J. (2011) Modeling and analysis of DNA replication. Automatica 47, 1156–1164
32 Raghuraman, M.K. and Brewer, B.J. (2010) Molecular analysis of the replication program in unicellular model organisms. Chromosome Res. 18, 19–34
33 Hayano, M. et al. (2011) Mrc1 marks early-firing origins and coordinates timing and efficiency of initiation in fission yeast. Mol. Cell. Biol. 31, 2380–2391
34 Knott, S.R. et al. (2012) Forkhead transcription factors establish origin timing and long-range clustering in S. cerevisiae. Cell 148, 99–111
35 Herrick, J. et al. (2000) Replication fork density increases during DNA synthesis in X. laevis egg extracts. J. Mol. Biol. 300, 1133–1142
36 Lucas, I. et al. (2000) Mechanisms ensuring rapid and complete DNA replication despite random initiation in Xenopus early embryos. J. Mol. Biol. 296, 769–786
37 Labit, H. et al. (2008) DNA replication timing is deterministic at the level of chromosomal domains but stochastic at the level of replicons in Xenopus egg extracts. Nucleic Acids Res. 36, 5623–5634
38 Goldar, A. et al. (2008) A dynamic stochastic model for DNA replication initiation in early embryos. PLoS ONE 3, e2919
39 Gauthier, M.G. and Bechhoefer, J. (2009) Control of DNA replication by anomalous reaction–diffusion kinetics. Phys. Rev. Lett. 102, 158104
40 Harland, R.M. and Laskey, R.A. (1980) Regulated replication of DNA microinjected into eggs of Xenopus laevis. Cell 21, 761–771
41 Hyrien, O. and Mechali, M. (1993) Chromosomal replication initiates and terminates at random sequences but at regular intervals in the ribosomal DNA of Xenopus early embryos. EMBO J. 12, 4511–4520
42 Yang, S.C. and Bechhoefer, J. (2008) How Xenopus laevis embryos replicate reliably: investigating the random-completion problem. Phys. Rev. E 78, 041917
43 Graham, C.F. (1966) The regulation of DNA synthesis and mitosis in multinucleate frog eggs. J. Cell Sci. 1, 363–374
44 Goldar, A. et al. (2009) Universal temporal profile of replication origin activation in eukaryotes. PLoS ONE 4, e5899
45 Blumenthal, A.B. et al. (1974) The units of DNA replication in Drosophila melanogaster chromosomes. Cold Spring Harb. Symp. Quant. Biol. 38, 205–223
46 Lima-de-Faria, A. and Jaworska, H. (1968) Late DNA synthesis in heterochromatin. Nature 217, 138–142
47 Gilbert, N. et al. (2004) Chromatin architecture of the human genome: gene-rich domains are enriched in open chromatin fibers. Cell 118, 555–566
48 Schwaiger, M. and Schubeler, D. (2006) A question of timing: emerging links between transcription and replication. Curr. Opin. Genet. Dev. 16, 177–183
49 MacAlpine, D.M. et al. (2004) Coordination of replication and transcription along a Drosophila chromosome. Genes Dev. 18, 3094–3105
50 Hiratani, I. et al. (2009) Replication timing and transcriptional control: beyond cause and effect: part II. Curr. Opin. Genet. Dev. 19, 142–149
51 Lieberman-Aiden, E. et al. (2009) Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293
52 Ryba, T. et al. (2010) Evolutionarily conserved replication timing profiles predict long-range chromatin interactions and distinguish closely related cell types. Genome Res. 20, 761–770
53 Hayashi, M.T. and Masukata, H. (2011) Regulation of DNA replication by chromatin structures: accessibility and recruitment. Chromosoma 120, 39–46
54 Lebofsky, R. et al. (2006) DNA replication origin interference increases the spacing between initiation events in human cells. Mol. Biol. Cell 17, 5337–5345

55 Cayrou, C. et al. (2011) Genome-scale analysis of metazoan replication origins reveals their organization in specific but flexible sites defined by conserved features. Genome Res. 21, 1438–1449
56 Wong, P.G. et al. (2011) Cdc45 limits replicon usage from a low density of preRCs in mammalian cells. PLoS ONE 6, e17533
57 Krasinska, L. et al. (2008) Cdk1 and Cdk2 activity levels determine the efficiency of replication origin firing in Xenopus. EMBO J. 27, 758–769
58 Katsuno, Y. et al. (2009) Cyclin A-Cdk1 regulates the origin firing program in mammalian cells. Proc. Natl. Acad. Sci. U.S.A. 106, 3184–3189
59 Thomson, A.M. et al. (2010) Replication factory activation can be decoupled from the replication timing program by modulating Cdk levels. J. Cell Biol. 188, 209–221
60 Edwards, M.C. et al. (2002) MCM2-7 complexes bind chromatin in a distributed pattern surrounding the origin recognition complex in Xenopus egg extracts. J. Biol. Chem. 277, 33049–33057
61 Dijkwel, P.A. et al. (2002) Initiation sites are distributed at frequent intervals in the Chinese hamster dihydrofolate reductase origin of replication but are used with very different efficiencies. Mol. Cell. Biol. 22, 3053–3065
62 Harvey, K.J. and Newport, J. (2003) CpG methylation of DNA restricts prereplication complex assembly in Xenopus egg extracts. Mol. Cell. Biol. 23, 6769–6779
63 Koren, A. et al. (2010) MRC1-dependent scaling of the budding yeast DNA replication timing program. Genome Res. 20, 781–790
64 Alvino, G.M. et al. (2007) Replication in hydroxyurea: it’s a matter of time. Mol. Cell. Biol. 27, 6396–6406
65 Ma, E. et al. (2012) Do replication forks control late origin firing in Saccharomyces cerevisiae? Nucleic Acids Res. 40, 2010–2019
66 Rhind, N. (2006) DNA replication timing: random thoughts about origin firing. Nat. Cell Biol. 8, 1313–1316
67 Hozak, P. and Cook, P.R. (1994) Replication factories. Trends Cell Biol. 4, 48–52
68 Baddeley, D. et al. (2010) Measurement of replication structures at the nanometer scale using super-resolution light microscopy. Nucleic Acids Res. 38, e8
69 Chen, C.L. et al. (2011) Replication-associated mutational asymmetry in the human genome. Mol. Biol. Evol. 28, 2327–2337
70 Touchon, M. et al. (2005) Replication-associated strand asymmetries in mammalian genomes: toward detection of replication origins. Proc. Natl. Acad. Sci. U.S.A. 102, 9836–9841
71 Chagin, V.O. et al. (2010) Organization of DNA replication. Cold Spring Harb. Perspect. Biol. 2, a000737
72 Karschau, J. et al. (2012) Optimal placement of origins for DNA replication. Phys. Rev. Lett. 108, 058101
73 Ge, X.Q. et al. (2007) Dormant origins licensed by excess Mcm2-7 are required for human cells to survive replicative stress. Genes Dev. 21, 3331–3341
74 Blow, J.J. et al. (2011) How dormant origins promote complete genome replication. Trends Biochem. Sci. 36, 405–414
75 Blow, J.J. and Ge, X.Q. (2009) A model for DNA replication showing how dormant origins safeguard against replication fork failure. EMBO Rep. 10, 406–412
76 Gauthier, M.G. et al. (2010) Defects and DNA replication. Phys. Rev. Lett. 104, 218104
77 Sancar, A. et al. (2004) Molecular mechanisms of mammalian DNA repair and the DNA damage checkpoints. Annu. Rev. Biochem. 73, 39–85
78 Herrick, J. (2011) Genetic variation and DNA replication timing, or why is there late replicating DNA? Evolution 65, 3031–3047
79 Cairns, J. (1963) The bacterial chromosome and its manner of replication as seen by autoradiography. J. Mol. Biol. 6, 208–213
80 Gratzner, H.G. (1982) Monoclonal antibody to 5-bromo- and 5-iododeoxyuridine: a new reagent for detection of DNA replication. Science 218, 474–475
81 Jackson, D.A. and Pombo, A. (1998) Replicon clusters are stable units of chromosome structure: evidence that nuclear organization contributes to the efficient activation and propagation of S phase in human cells. J. Cell Biol. 140, 1285–1295
82 Bensimon, A. et al. (1994) Alignment and sensitive detection of DNA by a moving interface. Science 265, 2096–2098
83 Michalet, X. et al. (1997) Dynamic molecular combing: stretching the whole human genome for high-resolution studies. Science 277, 1518–1523


84 Norio, P. and Schildkraut, C.L. (2001) Visualization of DNA replication on individual Epstein–Barr virus episomes. Science 294, 2361–2364
85 Kitamura, E. et al. (2006) Live-cell imaging reveals replication of individual replicons in eukaryotic replication factories. Cell 125, 1297–1308
86 van Oijen, A.M. and Loparo, J.J. (2010) Single-molecule studies of the replisome. Annu. Rev. Biophys. 39, 429–448
87 Riehn, R. et al. (2005) Restriction mapping in nanofluidic devices. Proc. Natl. Acad. Sci. U.S.A. 102, 10012–10016
88 Sidorova, J.M. et al. (2009) Microfluidic-assisted analysis of replicating DNA molecules. Nat. Protoc. 4, 849–861
89 Raghuraman, M.K. et al. (2001) Replication dynamics of the yeast genome. Science 294, 115–121
90 Woodfine, K. et al. (2004) Replication timing of the human genome. Hum. Mol. Genet. 13, 191–202
91 Desprat, R. et al. (2009) Predictable dynamic program of timing of DNA replication in human cells. Genome Res. 19, 2288–2299
92 Chen, C.L. et al. (2010) Impact of replication timing on non-CpG and CpG substitution rates in mammalian genomes. Genome Res. 20, 447–457
93 Yabuki, N. et al. (2002) Mapping of early firing origins on a replication profile of budding yeast. Genes Cells 7, 781–789
94 Retkute, R. et al. (2011) Dynamics of DNA replication in yeast. Phys. Rev. Lett. 107, 068103
95 Jun, S. et al. (2005) Nucleation and growth in one dimension. I. The generalized Kolmogorov–Johnson–Mehl–Avrami model. Phys. Rev. E 71, 011908
96 Yang, S.C. et al. (2009) Computational methods to study kinetics of DNA replication. Methods Mol. Biol. 521, 555–573
97 Brummer, A. et al. (2010) Mathematical modelling of DNA replication reveals a trade-off between coherence of origin activation and robustness against rereplication. PLoS Comput. Biol. 6, e1000783

Human limb abnormalities caused by disruption of hedgehog signaling

Eve Anderson, Silvia Peluso, Laura A. Lettice and Robert E. Hill

MRC Human Genetics Unit at the MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, UK

Review

Glossary

Acheiropodia: an autosomal recessive disorder that results in severe truncations of the arms and legs, such that the distal extremities are absent.

Acrocapitofemoral dysplasia: a rare recessive condition characterized mainly by short limbs, dwarfism and cone-shaped epiphyses at the joints, mainly in the hands and hips.

Apical ectodermal ridge (AER): a specialized ectodermal structure that forms along the distal edge of the limb bud and acts as a major signaling center through the FGFs.

Brachydactyly: a condition that affects the length of the digits, making the fingers and toes appear shorter.

Craniosynostosis Philadelphia type: craniosynostosis is a condition in which one or more of the bony primordia of the infant skull prematurely ossifies, thus changing the growth pattern of the skull. Philadelphia type has associated syndactyly of the hands and feet.

Preaxial and postaxial polydactyly: polydactyly means additional digits, and pre- and postaxial refer to the side of the hand or foot on which the extra digit appears. Preaxial is the thumb and big toe side, whereas postaxial is the opposite side.

Syndactyly: a condition in which two or more digits are fused together.

Syndromic: a syndromic condition is characterized by having several recognizable clinical features that occur together and are associated for diagnosis. A nonsyndromic condition has a single clinical feature.

Triphalangeal thumb: whereas each finger has three phalanges (the small bones of the digits), the thumb only has two. In this condition, the thumb has an extra phalanx and often has the appearance of a finger.

Zone of polarizing activity (ZPA): an area of mesenchymal cells located along the posterior margin of the limb bud that produces SHH. SHH patterns the early limb bud along the A–P axis, specifying digit identity and the number of digits that will form.

Human hands and feet contain bones of a particular size and shape arranged in a precise pattern. The secreted factor sonic hedgehog (SHH) acts through the conserved hedgehog (Hh) signaling pathway to regulate the digital pattern in the limbs of tetrapods (i.e. land-based vertebrates). Genetic analysis is now uncovering a remarkable set of pathogenetic mutations that alter the Hh pathway, thus compromising both digit number and identity. Several of these are regulatory mutations that have the surprising attribute of misdirecting expression of Hh ligands to ectopic sites in the developing limb buds. In addition, other mutations affect a fundamental structural property of the embryonic cell that is essential to Hh signaling. In this review, we focus on the role that the Hh pathway plays in limb development, and how the many human genetic defects in this pathway are providing clues to the mechanisms that regulate limb development.

Human limb abnormalities that affect digit number

Structural abnormalities of the hands and feet are frequent birth defects, several of which have known genetic causes. These defects may affect just the limbs or may be part of a complex syndrome affecting several organs. Mammalian limb-bud development is based on a highly conserved pentadactyl pattern for the digits in the hands and feet [1], and deviation from five digits can be informative for clinicians and developmental biologists. Too many digits, or polydactyly (Glossary), is the most frequently observed congenital hand malformation, with a prevalence of approximately two per 1000 live births [2]. Depending on the anatomical location of the extra digits, polydactyly is classified as preaxial (on the side of the thumb and big toe) or postaxial (the opposite side). The genetic contribution to polydactyly was recently surveyed [3] and a remarkable number of individual clinical classifications (80) that include polydactyly have been assigned to 99 different genes.

Mechanism that polarizes the limb

During development, digit number and identity are regulated by a mechanism that initially polarizes the limb bud and then specifies digit identity and regulates growth. The complementary expression of the transcription factors GLI3, a zinc finger-containing DNA-binding protein, in the anterior half and HAND2, a member of the basic helix–loop–helix family of DNA-binding proteins, in the posterior half of the limb [4] are the first molecular indications that the early limb is polarized (Figure 1). This then predisposes the posterior margin of the limb bud to express the Shh gene, which is the crucial step in regulating spatial variation along the anteroposterior (A–P) axis of the early limb bud. The Shh gene is expressed at the posterior margin of the limb in a region that was defined in transplantation experiments during the 1960s as the zone of polarizing activity (ZPA) [5]. These experiments showed that chick embryonic limb tissue transplanted from the posterior to the anterior limb-bud margin secreted a factor, now known to be SHH, that induced the generation of extra digits.

Corresponding author: Hill, R.E. ([email protected]).
Keywords: limb development; sonic hedgehog; limb abnormalities; polydactyly; cilia.

0168-9525/$ – see front matter © 2012 Elsevier Ltd. All rights reserved.

SHH acts via the Hh signaling pathway, which is remarkably conserved from flies to mammals [6]. Much of what is known about the pathway initially came from analysis in Drosophila, which has only a single Hh gene; in mice, three homologs exist [desert hedgehog (Dhh), Indian hedgehog (Ihh) and Shh] (Box 1 and Figure 2).

ZPA regulatory sequence (ZRS): an approximately 800-bp cis-regulatory sequence that is necessary and sufficient for the limb-specific expression of the Shh gene.

http://dx.doi.org/10.1016/j.tig.2012.03.012 Trends in Genetics, August 2012, Vol. 28, No. 8



[Figure 1, panels (a)–(c). Labels: A (anterior), P (posterior), AER (FGFs), GLI3, GLI3R, GLI3A, HAND2, 5′ HOXD, SHH, ETV4/ETV5, ETS1/GABPα.]

Figure 1. Expression of genes that polarize the limb and regulate sonic hedgehog (Shh) expression in the zone of polarizing activity (ZPA). The earliest limb bud (a) is polarized by the expression of GLI3 in the anterior (A) and by HAND2 (which downregulates GLI3) in the posterior (P). The expression of the 5′ Hoxd and Shh genes follows Hand2 expression, and Shh is upregulated by HAND2 in the ZPA. Once SHH is produced (b), it maintains the expression of Hand2 and the 5′ Hoxd genes in a regulatory loop. The gradient of GLI3A is shown below. (c) Distal production of ETV4/ETV5 and ETS1/GABPα in overlapping patterns. ETV4/ETV5 ensures that ectopic expression does not occur in the wild-type limb, whereas ETS1/GABPα determines the position of the Shh expression boundary. Abbreviations: AER, apical ectodermal ridge; FGFs, fibroblast growth factors 4, 8, 9 and 17.


SHH signaling regulates the proteolytic processing of members of the GLI (after glioma) family of proteins, one of which, GLI3, is of particular interest early in limb development (Figure 1). GLI3 is expressed across the limb bud; however, in the posterior of the limb bud, where SHH concentrations are high, GLI3 is present in the full-length activator form, GLI3A; by contrast, in the anterior, where SHH is low or undetectable, GLI3 is proteolytically processed into the repressor form, GLI3R [7]. The relative concentration of GLI3A:GLI3R across the developing limb bud specifies the differences between the fingers. The most distinctive digit, the thumb, develops from a region of the limb bud that has the highest concentration of GLI3R and no detectable SHH activity. In addition, SHH and GLI3 function together to constrain the number of digits produced, thus ensuring pentadactyly [8,9]. In mice, the absence of both SHH and GLI3 (Shh–/–;Gli3–/–) gives rise to a polydactylous paw with between six and 11 unspecified digits. This indicates that limb buds have an intrinsic capacity to produce digit primordia and that this process is unregulated in the absence of both SHH and GLI3.

Box 1. Conservation of the Hh pathway

The Hh gene and much of the Hh signaling pathway are conserved from flies to mice [6] (Figure 2, main text). In Drosophila, one Hh gene has been identified, whereas in mice three homologs exist: Dhh, Ihh and Shh. In signaling cells, Hh is synthesized, cleaved and lipid modified before being secreted [73]. In responding cells, Hh binds to the Patched (Ptc) coreceptor, alleviating Ptc inhibition of the seven-pass transmembrane protein Smoothened (Smo) and activating the downstream pathway [74].

In Drosophila, the transcriptional effector of Hh signaling is called Cubitus interruptus (Ci) and exists in two forms: a full-length activator protein, and a truncated repressor protein generated by proteolytic processing [75]. The processing of Ci is blocked by Hh, which also serves to increase activity of the activator form [76]. In mammals, members of the GLI protein family are homologs of Ci. There are three members of the Gli family in vertebrates. Gli1 acts as a transcriptional activator, whereas Gli2 and Gli3 exist in two forms: a full-length activator form and a truncated transcriptional repressor [6].

Phosphorylation of Ci or Gli allows binding of the ubiquitin ligase Slimb (Drosophila) or β-TrCP (mammals) and subsequent polyubiquitination and proteasome-mediated processing to their repressor forms [77]. Meanwhile, the activity of both CiA and GliA can be inhibited by Suppressor of fused [SUFU (mouse) or Su(Fu) (flies)] [78].

The proteins Fused (Fu) and Costal 2 (Cos2) play an important role in Hh signaling in Drosophila [79,80]. Knockdown of Fu in mouse cells does not disrupt Gli signaling. The Cos2 homologs in mammals, Kif7 and Kif27, as well as Cos2 from Drosophila itself, have been shown to regulate GLI in mammalian cells, suggesting a conserved regulatory interaction [81].

Several models have been produced to explain SHH activity [10]; a recent model [11,12] suggests that SHH integrates two different activities to regulate early limb-bud development. SHH initially acts as a morphogen to specify digit identity at the earliest stages of limb development. Subsequently, it exhibits mitogenic activity that ensures the production of a sufficient number of cells to promote the normal complement of digits. Together, these two activities of SHH are responsible for specifying the identity of each digit and, as the limb bud expands, the position within the limb bud in which each forms. This is observed as a progressive formation of the digits, such that there is a stereotypical order in which each digit appears. For example, in the mouse, digit 4 appears first in the limb bud followed in order and rapid succession by digits 2, 5 and then 3 (digit 1 appears to be the last to form). If cellular expansion in the limb bud is reduced by attenuating SHH activity, the digits are lost in the reverse order, with digit 3 being the first to disappear.
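The ordering described in this paragraph can be restated as a short sketch. The Python snippet below is only an illustration of the stated sequence, not a model taken from the cited studies. The function name `digits_lost` and the idea of expressing SHH attenuation as a count of missing digits are assumptions made for the example. Digit 1 is left out of the loss sequence because the text names digit 3, not digit 1, as the first digit to disappear.

```python
from typing import List

# Order in which mouse digits appear in the limb bud, as stated in the text.
FORMATION_ORDER = [4, 2, 5, 3, 1]

# Assumption for this sketch: digit 1 is excluded from the SHH-dependent
# loss sequence, because the text names digit 3 as the first digit lost.
SHH_DEPENDENT_ORDER = [d for d in FORMATION_ORDER if d != 1]


def digits_lost(n_lost: int) -> List[int]:
    """Return the digits missing when n_lost SHH-dependent digits fail to form.

    Digits are lost in the reverse of their order of appearance.
    """
    if not 0 <= n_lost <= len(SHH_DEPENDENT_ORDER):
        raise ValueError("n_lost must be between 0 and 4")
    loss_order = list(reversed(SHH_DEPENDENT_ORDER))  # digits 3, 5, 2, 4
    return loss_order[:n_lost]


if __name__ == "__main__":
    for n in range(len(SHH_DEPENDENT_ORDER) + 1):
        print(n, digits_lost(n))
```

In this toy representation, a mild reduction in SHH activity removes only digit 3, and progressively stronger reductions remove digits 5, 2 and 4 in turn.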

Limb polarity and digit specification
Attempts to understand the genetic basis of preaxial polydactyly led to the identification of the cis-regulatory element responsible for controlling expression of Shh in the posterior part of the limb [13]. This 750–800-bp enhancer sequence is both necessary and sufficient for regulating the spatial and temporal activity of Shh, which in turn defines the ZPA; therefore, it was called the ZPA regulatory sequence (ZRS) (Figure 3). The ZRS is highly conserved in all vertebrates with opposing appendages, including fish. In addition, the ZRS is located inside an intron of the limb region 1 homolog (Lmbr1) gene, which has no known role in limb development, and operates over a long distance (800 kb–1 Mb) to activate the Shh promoter in mice and humans.

[Figure 2 graphic: (a) Drosophila pathway without HH signaling and with HH signaling; (b) vertebrate pathway without SHH signaling and with SHH signaling. Labeled components include HH, PTC, SMO, COS2, SUFU, FU, kinases, SLIMB, proteasome, CI-R and CI-A in (a), and SHH, PTCH1, SMO, KIF7, SUFU, GRK2, β-ARRESTIN, KIF3A, kinases, βTRCP, proteasome, GLI3, GLI3-R and GLI3-A in (b).]

Figure 2. Conservation of the Hh signaling pathway. (a) Schematic representation of key components of the Drosophila HH signaling pathway in the absence (top) or presence (bottom) of HH. In the absence of ligand, Patched (PTC) inhibits Smoothened (SMO), which is held in intracellular vesicles (yellow ovals). A complex of proteins, including cubitus interruptus (CI), Costal2 (COS2) and several kinases, is established. Phosphorylation of CI establishes recognition signals for SLIMB leading to partial degradation of CI by the proteasome and formation of the repressor form (CI-R). CI-R then translocates to the nucleus, where it represses transcription of HH targets. Binding of secreted HH to PTC blocks PTC activity and releases SMO from inhibition. SMO moves to the plasma membrane, where phosphorylation allows interaction with COS2. Subsequent phosphorylation of COS2 by FU leads to release of unphosphorylated, full-length CI, which can translocate to the nucleus where it promotes transcriptional activation. (b) Schematic representation of key conserved components of the vertebrate HH signaling pathway. The cilium (which is absent in Drosophila) is represented by the central axoneme and the centrosome and basal bodies (gray). In the absence of SHH ligand, PTCH1 inhibits SMO, which is held in intracellular vesicles. GLI3 is kept in the primary cilium in a complex with KIF7 and SUFU. Phosphorylation of GLI3 by kinases allows its recognition by β-TRCP and leads to partial degradation by the proteasome, resulting in the formation of the repressor molecule. Activation by SHH relieves the inhibition of SMO by PTCH1. SMO becomes phosphorylated by GRK2, binds to β-ARRESTIN and KIF3A, and is trafficked to the cilium. This relieves the inhibitory effect of SUFU and allows the full-length GLI3 to translocate to the nucleus and activate target genes. Homologous genes in Drosophila and vertebrates are colored similarly.

It is still an open question how the ZRS directs Shh expression to the ZPA. It is known that the expression of Shh depends on the initial establishment of A–P polarity, and targeted mutations of the Hand2 gene show a role for this gene in the early determinative process that functions upstream of Shh expression [14]. However, a low basal level of Shh expression is established in the absence of HAND2 [14]. Therefore, initiation of Shh expression may rely on additional signals, and one of these may emanate from a specialized ectoderm that resides at the proximal border of the apical ectodermal ridge (AER; Figure 1), operating through the T box-containing transcription factor TBX2 [15]. As the limb bud emerges, the Hoxd gene cluster is activated and its expression becomes confined to the posterior mesenchyme. Genetic analysis has shown that the 5′HOXD factors are essential for activation of Shh expression in the ZPA [16] and, in agreement with this, the 5′HOXD proteins (specifically HOXD10 and 13) may bind to the ZRS [17]. In addition, the regulatory function of HAND2 is mediated by direct binding near or at the ZRS, and HAND2 may bind as an activating protein complex with HOXD13 (and other 5′HOXD proteins) [14]. Given that the ZRS is located a long distance from its target gene, transcription factor binding at the ZRS must be converted into expression of Shh. In accord, chromosome architecture changes specifically in the expressing cells within the limb bud and two events are observed to occur. First, a chromosomal looping mechanism brings the ZRS close to the Shh promoter where they interact. Second, the Shh locus moves out of its chromosomal territory; further genetic analysis suggests that this is the event that relates directly to Shh activation [18].

Once Shh expression is initiated, members of the ETS transcription factor family act to establish the boundary of expression (Figure 1). Five ETS binding sites have been identified in the ZRS (Figure 3), at which two ETS factors, ETS1 and GABPα, were shown to bind [19]. Occupancy by these factors at multiple sites within the ZRS is required to set the appropriate boundary of expression in the posterior mesenchyme. Two other ETS factors, the closely related ETV4 and ETV5, act redundantly to oppose ETS1/GABPα activation. Limb buds that are deficient in both ETV4 and ETV5 ectopically express Shh in a domain in mesenchyme at the anterior margin of the limb [20,21], indicating that their normal role is to restrict Shh expression to the posterior region of the limb. Although ETV4/5 bind to two sites within the ZRS, the binding at one of these sites is sufficient to regulate Shh negatively in the anterior domain [19]. In addition to this regulatory role, ETV4 and ETV5 were shown to modulate the activity of two other transcription factors, TWIST1 and HAND2, which regulate Shh expression. A fine balance is proposed between TWIST1, an inhibitor of Shh expression in the anterior of the limb, and the positive regulator HAND2. An ETV4/5–TWIST1 complex is important in promoting the TWIST1 inhibitory activity in the ectopic domain, perhaps by inhibiting dimerization of TWIST1–HAND2 [22], which acts as an activator.

[Figure 3 graphic: (a) wild-type Shh locus (Lmbr1, Rnf32, Shh; ZRS about 800 kb from Shh) with the ZRS enlarged to show point mutations at positions 105, 252, 258, 295, 297, 305, 329, 334, 396, 402, 404, 406, 407, 463, 475, 477, 555, 621, 739, 743 and 769, the Werner mesomelic syndrome mutations, the ETS and ETV sites, and a ZRS duplication; (b) human chromosome 7 (q22.1, q36.3) showing the chromosomal breakpoint, the wild-type and inverted Shh locus, and the limb enhancer.]

Figure 3. Mutations and chromosomal lesions in the Shh locus responsible for limb abnormalities. (a) Shh gene and the upstream regulatory region (including enhancers shown as pink boxes). The ZRS cis-activator is shown as a gray box inside the Lmbr1 gene, enlarged above to show the number and position of the point mutations that cause preaxial polydactyly. Some of the other Shh enhancers are shown in pink. The positions [30] of the human (black), mouse (red), cat (green) and chicken (gray) mutations are shown above the enlarged ZRS. The positions of the ETS (ETS1/GABPα) (green ovals) and ETV (ETV4/ETV5) (blue ovals) binding sites identified by biochemical and chromatin immunoprecipitation methods [17] are shown below the ZRS. The positions of the Werner mesomelic syndrome mutations [29] are highlighted in blue. Below the wild-type Shh locus is a representative intrachromosomal duplication that results in triphalangeal thumb-polysyndactyly (TPTPS). These duplications can be of various sizes, such that the duplicated ZRSs can reside at various distances from each other. (b) Approximate position of the breakpoints of the intrachromosomal inversion on human chromosome 7. Below the chromosome is a representation of the position of the Shh gene before and after the inversion, showing that Shh is now regulated by another limb enhancer, a process called ‘enhancer adoption’ [41].

Finally, fibroblast growth factor (FGF) signaling is central to Shh expression, both as a positive and a negative regulator (Figure 1). The FGFs 4, 8, 9 and 17 are expressed in the AER [23] (Figure 1) and mediate limb bud outgrowth and maintenance of Shh expression in the ZPA. In addition, FGFs regulate the production of ETV4 and ETV5 and so are responsible for repression of Shh expression ectopically at the anterior margin [20,21,23].

Limb deformities due to aberrant Shh expression
Regulatory mutations in the ZRS cause misexpression of Shh and are associated with limb malformations [24]. The limb defects that result from a mutant ZRS fall into different clinical classifications that show a robust genotype–phenotype correlation but comprise an overlapping spectrum of digit abnormalities. These are preaxial polydactyly type II (PPD2, MIM# 174500), which includes isolated triphalangeal thumb, triphalangeal thumb-polysyndactyly syndrome (TPTPS), syndactyly type IV (SD4, MIM# 186200) and Werner mesomelic syndrome (WMS) [13,25–32] (Figure 4). It has been suggested that this group of limb defects should be collectively referred to as ‘ZRS-associated syndromes’ [29].

[Figure 4 graphic: (a) hand schematics with numbered digits, labeled normal hand, isolated triphalangeal thumb, preaxial polydactyly type 2, postaxial polydactyly type A and triphalangeal thumb-polysyndactyly; (b) and (c) patient images.]

Figure 4. Representative phenotypes for each of the limb abnormalities caused by misexpression of the Shh gene. (a) Types of digit abnormality of the hands caused by misexpression of the Shh gene. Bones are represented along the top, and each digit is numbered, the triphalangeal thumb is labeled ‘T’ and digits that cannot be accurately identified are labeled with an asterisk. Below are pictures of hands of patients with the various disorders [26,28,87,88]. Werner mesomelic syndrome [29] in (b) is associated with short-limb dwarfism. The X-ray shows the tibial hypoplasia in the right leg (the white arrow indicates the end of the femur). A patient with acheiropodia [44] is shown in (c) exhibiting the severe limb truncations that characterize this abnormality.

PPD2 is characterized by a triphalangeal thumb (Figure 4), sometimes leading to the appearance of a five-fingered hand, and in some cases may be accompanied by additional digits. Fifteen single-point mutations in the human ZRS have been identified that are associated with this limb abnormality (Figure 3). Extra toes have also been frequently observed in other species, including mice [13,33,34], cats [35] and chickens [36–38], and these abnormalities are associated with seven more point mutations in the ZRS. A mutation in polydactylous dogs was found in a conserved domain upstream of the ZRS, called the pre-ZRS; however, it is not clear how this domain regulates Shh expression [39]. Much of the understanding of the molecular mechanism underlying PPD2 comes from studies in mice. A single nucleotide change in the sequence of the ZRS is sufficient to generate ectopic production of Shh such that it is anomalously expressed at the opposite, anterior margin of the limb bud [30,35,40]. Ectopic Shh expression presumably produces an additional ZPA and, consequently, affects the GLI3R:GLI3A ratio, leading to respecification of the developing anterior digits. The phenotypic outcome is seen in some cases as the transformation of the thumb to a fifth finger, often accompanied by the production of additional digits.

Mechanisms that give rise to anomalous expression of Shh are being investigated. The two ETS factors that regulate the SHH expression boundary play a central role in generating polydactyly in two different families [19]. In these families, ZRS point mutations were shown to give rise to new, additional ETS1/GABPα binding sites, leading to the upregulation of the ZRS in the posterior margin of the limb bud, setting a wider boundary of expression and causing ectopic expression at the anterior margin. Because both Ets1 and Gabpa genes are expressed at the anterior margin (in mice) and the ZRS is primed for expression in this ectopic region [18], the additional binding sites are sufficient to override the inhibition of Shh expression and cause ectopic expression. Another point mutation that changes transcription factor binding to the ZRS was reported for a polydactylous mouse designated DZ [34]. In this case, the point mutation introduced a higher affinity binding site in the ZRS recognized by the nuclear factor HnRNP U, which was postulated to mediate the interaction between the cis-regulator and the 5′ end of the Shh gene.

WMS (Figure 4) is an autosomal dominant disorder with preaxial polydactyly of the hands and feet that also shows the additional, distinctive characteristic of associated dwarfism [13,29]. This condition appears to be at the severe end of the phenotypic spectrum of ZRS mutations. The short stature is the result of tibial hypoplasia (i.e. very small or absent tibia). The molecular basis for this disorder is also a point mutation, but at a specific position, nucleotide 404 (either a G>A or G>C change) (Figure 3), of the ZRS. Again, this mutation is likely to have an effect on transcription factor binding that is causative of the phenotype. Analysis of ZRS activity carrying the G>A mutation by mouse transgenesis suggests that expression in the ectopic domain occurs at a high level and extends broadly along the anterior limb-bud margin [35]. This level of ectopic SHH production may disrupt specification of the tibia and affect chondrogenesis.

Recently, the genetic basis of a severe form of polysyndactyly (extra digits with fusions of digits, particularly of the hands) was reported. Haas type (syndactyly type IV) polysyndactyly and TPTPS [27–29] (Figure 4) show a consistent association with intrachromosomal duplications involving the genomic region that contains the ZRS, leading to a tandem duplication of the ZRS (or triplication in one patient) (Figure 3). The molecular mechanism that gives rise to this limb phenotype is not known; however, it is reasonable to speculate that ectopic expression of SHH in the anterior margin of the limb bud is responsible for the polydactyly. The role that SHH expression plays in the syndactyly phenotype in patients with either Haas-type polysyndactyly or TPTPS is less clear; however, an isolated case of a patient with a distinct form of syndactyly was recently reported that may shed some light on this process. This patient had fusions of all fingers and toes along the entire length of each digit, which was shown to involve misregulation of the SHH gene in the limb but did not involve the ZRS [41]. Chromosomal analysis revealed that this patient had an intrachromosomal inversion (Figure 3) with one breakpoint upstream of the SHH gene such that it ended up under the influence of a different enhancer at the other end of the breakpoint, freeing it from the influence of the ZRS and other regulators. This event was termed ‘enhancer adoption’. In mouse transgenics, this new enhancer was shown to drive expression broadly in the limb, extending to later developmental stages and persisting in the interdigital mesenchyme. Further transgenic studies showed that, by placing this enhancer upstream of the mouse Shh gene, expression was directed to the interdigital cells at a later than normal stage in development. Syndactyly is probably the result of the rescue of the interdigital tissue from cell death due to this abnormal expression of Shh. This also suggests that the ZRS duplications that cause TPTPS similarly affect the temporal expression of SHH in the interdigital regions of the limb.

Another chromosomal inversion with a breakpoint between Shh and the ZRS was previously reported in mice for the Dsh (short digits) mutation [42]. Shh is ectopically activated in the cartilage of early digit primordia of the Dsh heterozygous embryo, representing another example of spatial and temporal misregulation due to a chromosomal rearrangement. However, in this case, it was postulated that the misexpression of Shh was the result of the removal of a repressor that enabled additional expression to occur in the early developing digits.

One other potential regulatory mutation of the SHH gene was uncovered in the genetic analysis of a condition called acheiropodia (MIM# 200500), a rare, recessive condition in which the hands and feet are lacking. This phenotype is similar to that seen in mice lacking ZRS activity [43]. However, the genetic lesion reported for these families [44] is a 4–6 kb deletion upstream (approximately 30 kb) of the ZRS. These genetic data suggest that a second, limb-specific regulatory component exists within the deleted DNA, the role of which may be to modify ZRS activity.

Taken together, these examples illustrate several different mechanisms that can alter the regulation of Shh, with a significant impact on the developing embryo. These range from simple nucleotide changes in the ZRS that cause ectopic expression to duplications that, more surprisingly, appear to affect both spatial and temporal expression. These mutational mechanisms only appear to affect Shh limb expression, as there is no evidence that the expression is misdirected outside the limb bud. Finally, acheiropodia deletions appear to result in a lack of Shh limb expression due to removal of an element that modifies ZRS activity.



Ihh misregulation also causes limb abnormalities
Another Hh signaling factor, Ihh, is expressed in the cartilage of the developing long bones in the limb. Here, Ihh is expressed within the growth plate, where it is responsible for regulation of chondrocyte proliferation and differentiation [45]. Ihh is not expressed at early limb-bud stages when Shh is expressed in the posterior mesenchyme, suggesting that Ihh has a role distinct from Shh. Despite this, Ihh and Shh operate along similar signaling pathways, including regulation of the conserved target GLI [46].

In humans, loss-of-function mutations in the IHH gene result in the autosomal recessive condition acrocapitofemoral dysplasia (MIM# 607778) [47], while gain-of-function mutations of IHH result in brachydactyly type A1 (MIM# 112500) [48,49]. Evidence suggests that the Ihh gene has a regulatory landscape similar to that of Shh and that Ihh is also under long-range regulatory control. This has been highlighted through analysis of three families with syndactyly type 1 (including some family members with polydactyly) and craniosynostosis Philadelphia type (MIM# 601222). This condition was found to map to a single locus at 2q35. Further analysis revealed that all three families contained distinct microduplications, but all shared the same 9-kb region located within the intron of a gene 40 kb upstream of IHH. This shared region contains a putative distant regulator of IHH and represents a similar situation to the duplication of the ZRS in the cases of TPTPS [50].

Disruption of the long-range regulation of Ihh is also considered to be the cause of the polydactyly phenotype seen in the Doublefoot (Dbf) mouse mutant. Dbf is an autosomal dominant mutation that results in extreme polydactyly of all four limbs, containing six to nine digits on each paw that are triphalangeal and arise preaxially [51,52]. Ihh is expressed ectopically within the mutant limb bud across the A–P axis, disrupting normal SHH activity and overriding Shh expression usually driven by the ZRS. A 600-kb deletion starting approximately 50 kb upstream of Ihh underlies the Dbf phenotype. This region is expected to contain a cis-acting regulatory element, which could be a repressor of Ihh expression that is removed by the deletion or, alternatively, a cryptic enhancer that may normally be located beyond the deleted region and moves into an activating position [53].

Gli3 mutants affect Hh signaling
The zinc finger-containing transcription factor GLI3 is the ultimate target for Shh signaling in the early limb bud [54]. Heterozygous mutations in the GLI3 gene cause Greig cephalopolysyndactyly syndrome (GCPS, MIM# 175700) and Pallister–Hall syndrome (PHS, MIM# 146510), both of which include polydactyly in the spectrum of disorders [55,56]. In addition, in rare cases, GLI3 mutations cause nonsyndromic polydactyly (MIM# 174700). The PHS and GCPS phenotypes are clinically distinct and, as with the Shh regulatory mutants, there is a robust genotype–phenotype correlation [57]. PHS is characterized by central or insertional polydactyly, whereas GCPS exhibits pre- or postaxial polydactyly (most commonly preaxial of the feet and postaxial of the hands) with variable syndactyly [58]. Truncating mutations in the middle third of the Gli3 gene cause PHS, whereas large deletions or truncation mutations in the amino or carboxy terminal third of the gene cause GCPS. PHS mutations are predicted to be dominant mutations in which the truncated protein ends near the proteolytic cleavage site to constitutively produce a repressor protein with a similar activity to GLI3R. This would skew the balance of the activator and the repressor forms of GLI3, resulting in an anteriorizing effect on the limb bud. GCPS mutations are predicted to be null mutations, and the phenotype results from a haploinsufficiency, suggesting that absolute amounts of GLI3R and GLI3A are required for development. Mouse mutations that represent Gli3 loss of function (Gli3xt and Gli3pdn) and a mutation (Gli3Δ699) that causes a PHS-like truncation of the protein near the proteolytic cleavage site support the notion that GCPS and PHS are clinically distinct [59,60]. Mouse studies suggest that GLI3 has a Shh-independent activity in early limb-bud stages [60], acting to restrict HAND2 expression (Figure 1); however, it is not clear what role this independent activity plays in heterozygous human GLI3 mutations.
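The positional rule for truncating GLI3 mutations (middle third associated with PHS, amino- or carboxy-terminal third with GCPS) can be written as a simple lookup. The following Python sketch is a deliberate simplification and not a clinical tool: the function name `predict_gli3_syndrome`, its parameters and the exact handling of the boundaries between thirds are assumptions made for illustration, and large deletions, which the text also associates with GCPS, are not modeled.

```python
def predict_gli3_syndrome(truncation_position: int, coding_length: int) -> str:
    """Apply the thirds rule described in the text to a truncating mutation.

    truncation_position: 1-based position of the truncation in the coding
        sequence (hypothetical input for this sketch).
    coding_length: total length of the coding sequence in the same units.
    """
    if coding_length <= 0:
        raise ValueError("coding_length must be positive")
    if not 1 <= truncation_position <= coding_length:
        raise ValueError("truncation_position must lie within the gene")

    # Integer arithmetic avoids floating-point issues at the boundaries.
    # Assumption: the middle third is the interval (1/3, 2/3] of the length.
    in_middle_third = (
        coding_length < 3 * truncation_position <= 2 * coding_length
    )
    return "PHS" if in_middle_third else "GCPS"


if __name__ == "__main__":
    # The length of 3000 is an arbitrary illustrative value, not the real gene.
    for pos in (200, 1500, 2900):
        print(pos, predict_gli3_syndrome(pos, 3000))
```

The sketch captures only the genotype side of the correlation; the mechanistic explanation given in the text (a constitutive GLI3R-like repressor in PHS versus haploinsufficiency in GCPS) is what makes the positional rule plausible.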

Cilia, the Hh pathway and limb patterning
Transduction of the Hh signal to the GLI protein is a multistep process (Box 2 and Figure 2) and, over the past decade, it has become clear that there is a connection between the complex steps of the Hh pathway and a unique structural component of the cell, the cilium (reviewed in [61–63]). Although cilia have several signaling roles, it has been suggested [64] that, in early development, primary cilia in vertebrates are dedicated to Hh signal transduction. The phenotypes caused by loss of cilia-associated proteins are syndromic, and not all patients show limb abnormalities, which suggests that cilia play an active role in mediating Hh signaling and do not simply serve as a compartment in which pathway components are concentrated.

The primary cilium is a small organelle that projects from the surface of the cell. It comprises a central structure of microtubules, called the axoneme, that functions to maintain the cilium and extends the structure by transport of particles along its length. This intraflagellar transport (IFT) mechanism transports molecules from the base to the tip of the cilium. Evidence suggests that components of the IFT machinery are involved in the Hh signaling pathway. The GLI proteins (Gli2 and Gli3) are localized at the cilia tip and trafficked along the axoneme in response to Hh signaling [65]. Thus, mutations in those proteins involved in the trafficking process often have phenotypes reminiscent of Hh signaling defects (Figure 2).

Several congenital human disorders, called ciliopathies, result from recessive mutations in genes that have a role in the cilia or the basal body [66]. Ciliopathies are a heterogeneous group of diseases presenting with a broad spectrum of clinical phenotypes, including pre- and postaxial polydactyly. For example, Joubert syndrome (MIM# 213300), Meckel–Gruber syndrome (MIM# 249000), and Bardet–Biedl syndrome (BBS, MIM# 209900) are all associated with polydactyly. Joubert syndrome can result from mutations in at least ten genes and is characterized by a specific brain malformation with additional pathologies. A recent report highlights a mutation in the KIF7 gene [67], an ortholog of the Drosophila kinesin-encoding gene Costal2 (Cos2), which is involved in Hh signaling (Figure 2). Reduction of KIF7 leads to a decrease in the number of cells displaying primary cilia and misregulation of GLI. Alteration in the GLI3R:GLI3A ratio (as seen in GCPS) may be responsible for the polydactyly. In mice, KIF7 was shown to be a core regulator of SHH signaling and a putative ciliary motor protein [68–70]. Interestingly, Cos2 differs in that it has lost its kinesin motor function; this is in accord with the observation that Drosophila do not use cilia for developmental signaling.

Box 2. Vertebrate Hh signaling in the cilium

The main difference between mammalian and Drosophila Hh signaling is the central role played by cilia in mammals but not in flies [6] (Figure 2, main text). Drosophila lacking cilia develop almost normally, indicating that cilia are not required for Drosophila Hh signaling [82]. In vertebrates, several steps from recognition of SHH to the processing of GLI1–3 (here referred to as GLI) in the limb involve the cilia and IFT [83]. The cilium is maintained and extended by transport of particles along the axoneme (reviewed in [60–62]). The transport of molecules toward the cilia tip, via IFT, is called ‘anterograde trafficking’ (kinesin motor driven) and transport down the axoneme toward the base of the cilia is referred to as ‘retrograde trafficking’ (dynein driven) [62,63].

Signal transduction takes place in the cilia, where PTCH1 is located in the absence of the ligand and represses the function of smoothened (SMO), which resides in the repressed state in cytoplasmic vesicles [84]. Upon activation by SHH, PTCH1 is internalized and SMO is phosphorylated by a G protein-coupled receptor kinase (GRK2). This phosphorylation promotes SMO binding to β-arrestin and Kif3a, a requirement for the trafficking of SMO into the cilium, where it activates GLI.

Full-length GLIs are present in the cilia in a complex with the anterograde IFT kinesin motor KIF7 [68]. SUFU promotes the truncation of GLI into the repressor form (GLIR) and the retrograde IFT-dynein motor enables GLIR to reach the nucleus. Activation of SMO relieves the inhibition that SUFU exerts and promotes the activator form of GLI (GLIA) [85,86]. This process is promoted by KIF7, which may also block the function of SUFU. GLIA reaches the nucleus and activates the transcription of Hh target genes, which include PTCH.

In the absence of SHH signaling, the processing of GLIs requires regulated proteolysis by the large multiprotein proteasome complex. The GLIs are sequentially phosphorylated by kinases, producing a phosphopeptide domain that is recognized by β-TrCP, which recruits an SCF E3 ubiquitin ligase complex. Ubiquitination targets Gli3 to the proteasome and initiates a limited degradation process, allowing GLIR to be transported to the nucleus, where it inhibits transcription [6].

Mutations that broadly affect cilia structure and function probably disrupt GLI3 processing, leading to polydactyly. BBS is a multisystem disorder that results from mutations in any one of 16 different genes and limb defects usually appear as postaxial polydactyly. BBS is primarily a disease of the basal body [71], a microtubule-based, modified centriole located at the base of the axoneme that serves as a nucleation site for the growth of the axoneme microtubules. Thus, cilia assembly (a complex process requiring hundreds of proteins), SHH signaling and GLI3 processing are tightly amalgamated. It seems probable that polydactyly in ciliopathies arises for various reasons. Clearly, some of the disease-causing mutations block important steps in the transduction of the SHH signal. However, other defects may be more general and act to disrupt cilia architecture, thus inhibiting the signaling process [72]. Both routes would disrupt GLI3 processing, affecting the GLI3R:GLI3A ratio and creating digit abnormalities, some phenocopying GCPS or PHS.

Concluding remarks
Several mutational mechanisms alter Hh signaling at different points in the pathway, often impacting on the developing limb bud. Regulatory mutations affecting Shh expression play a central role in generating preaxial polydactyly. In some cases, regulatory mutations affecting expression of the closely related molecule, IHH, override normal developmental processes to affect adversely the developing limb. It appears that these and an increasingly large number of other mutations ultimately disrupt proteolytic processing of GLI3, the prime target for Hh signaling. These other mutations include those that directly affect the structure of GLI3 and those that affect the cilia, a complex cellular structure that has a significant investment in Hh signaling. The large number of different clinical manifestations that include the polydactyly phenotype (at least 80 have been described) [3] is a hallmark of the hundreds of genes, especially those that affect ciliogenesis, involved in Hh signaling, presenting a considerable overall target for pathogenetic mutations. It is clear that further genetic analysis of limb patterning will be informative in generating insights into not only developmental biology, but also the basic biology of the cell.

References

1 Abbasi, A.A. (2011) Evolution of vertebrate appendicular structures: insight from genetic and palaeontological data. Dev. Dyn. 240, 1005–1016

2 Sun, G. et al. (2011) Twelve-year prevalence of common neonatal congenital malformations in Zhejiang Province, China. World J. Pediatr. 7, 331–336

3 Biesecker, L.G. (2011) Polydactyly: how many disorders and how many genes? 2010 update. Dev. Dyn. 240, 931–942

4 te Welscher, P. et al. (2002) Mutual genetic antagonism involving GLI3 and dHAND prepatterns the vertebrate limb bud mesenchyme prior to SHH signaling. Genes Dev. 16, 421–426

5 Saunders, J.W. and Gasseling, M.T. (1968) Ectodermal–mesenchymal interactions in the origin of limb symmetry. In Epithelial–Mesenchymal Interactions (Fleischmeyer, R. and Billingham, R.E., eds), pp. 78–97, Williams & Wilkins

6 Wilson, C.W. and Chuang, P.T. (2010) Mechanism and evolution of cytosolic Hedgehog signal transduction. Development 137, 2079–2094

7 Wang, B. et al. (2000) Hedgehog-regulated processing of GLI3 produces an anterior/posterior repressor gradient in the developing vertebrate limb. Cell 100, 423–434

8 Litingtung, Y. et al. (2002) Shh and Gli3 are dispensable for limb skeleton formation but regulate digit number and identity. Nature 418, 979–983

9 te Welscher, P. et al. (2002) Progression of vertebrate limb development through SHH-mediated counteraction of GLI3. Science 298, 827–830

10 Towers, M. and Tickle, C. (2009) Growing models of vertebrate limb development. Development 136, 179–190

11 Zhu, J. et al. (2008) Uncoupling Sonic hedgehog control of pattern and expansion of the developing limb bud. Dev. Cell 14, 624–632

12 Towers, M. et al. (2008) Integration of growth and specification in chick wing digit-patterning. Nature 452, 882–886

13 Lettice, L.A. et al. (2003) A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum. Mol. Genet. 12, 1725–1735

14 Galli, A. et al. (2010) Distinct roles of Hand2 in initiating polarity and posterior Shh expression during the onset of mouse limb bud development. PLoS Genet. 6, e1000901



15 Nissim, S. et al. (2007) Characterization of a novel ectodermal signaling center regulating Tbx2 and Shh in the vertebrate limb. Dev. Biol. 304, 9–21

16 Tarchini, B. and Duboule, D. (2006) Control of Hoxd genes’ colinearity during early limb development. Dev. Cell 10, 93–103

17 Capellini, T.D. et al. (2006) Pbx1/Pbx2 requirement for distal limb patterning is mediated by the hierarchical control of Hox gene spatial distribution and Shh expression. Development 133, 2263–2273

18 Amano, T. et al. (2009) Chromosomal dynamics at the Shh locus: limb bud-specific differential regulation of competence and active transcription. Dev. Cell 16, 47–57

19 Lettice, L.A. et al. (2012) Opposing functions of the ETS factor family define Shh spatial expression in limb buds and underlie polydactyly. Dev. Cell 22, 459–467

20 Mao, J. et al. (2009) Fgf-dependent Etv4/5 activity is required for posterior restriction of Sonic Hedgehog and promoting outgrowth of the vertebrate limb. Dev. Cell 16, 600–606

21 Zhang, Z. et al. (2009) FGF-regulated Etv genes are essential for repressing Shh expression in mouse limb buds. Dev. Cell 16, 607–613

22 Zhang, Z. et al. (2010) Preaxial polydactyly: interactions among ETV, TWIST1 and HAND2 control anterior-posterior patterning of the limb. Development 137, 3417–3426

23 Fernandez-Teran, M. and Ros, M.A. (2008) The apical ectodermal ridge: morphological aspects and signaling pathways. Int. J. Dev. Biol. 52, 857–871

24 Hill, R.E. (2007) How to make a zone of polarizing activity: insights into limb development via the abnormality preaxial polydactyly. Dev. Growth Differ. 49, 439–448

25 Albuisson, J. et al. (2011) Identification of two novel mutations in Shh long-range regulator associated with familial pre-axial polydactyly. Clin. Genet. 79, 371–377

26 Gurnett, C.A. et al. (2007) Two novel point mutations in the long-range SHH enhancer in three families with triphalangeal thumb and preaxial polydactyly. Am. J. Med. Genet. A 143, 27–32

27 Klopocki, E. et al. (2008) A microduplication of the long range SHH limb regulator (ZRS) is associated with triphalangeal thumb-polysyndactyly syndrome. J. Med. Genet. 45, 370–375

28 Sun, M. et al. (2008) Triphalangeal thumb-polysyndactyly syndrome and syndactyly type IV are caused by genomic duplications involving the long range, limb-specific SHH enhancer. J. Med. Genet. 45, 589–595

29 Wieczorek, D. et al. (2010) A specific mutation in the distant sonic hedgehog (SHH) cis-regulator (ZRS) causes Werner mesomelic syndrome (WMS) while complete ZRS duplications underlie Haas type polysyndactyly and preaxial polydactyly (PPD) with or without triphalangeal thumb. Hum. Mutat. 31, 81–89

30 Furniss, D. et al. (2008) A variant in the sonic hedgehog regulatory sequence (ZRS) is associated with triphalangeal thumb and deregulates expression in the developing limb. Hum. Mol. Genet. 17, 2417–2423

31 Farooq, M. et al. (2010) Preaxial polydactyly/triphalangeal thumb is associated with changed transcription factor-binding affinity in a family with a novel point mutation in the long-range cis-regulatory element ZRS. Eur. J. Hum. Genet. 18, 733–736

32 Semerci, C.N. et al. (2009) Homozygous feature of isolated triphalangeal thumb-preaxial polydactyly linked to 7q36: no phenotypic difference between homozygotes and heterozygotes. Clin. Genet. 76, 85–90

33 Masuya, H. et al. (2007) A series of ENU-induced single-base substitutions in a long-range cis-element altering Sonic hedgehog expression in the developing mouse limb bud. Genomics 89, 207–214

34 Zhao, J. et al. (2009) HnRNP U mediates the long-range regulation of Shh expression during limb development. Hum. Mol. Genet. 18, 3090–3097

35 Lettice, L.A. et al. (2008) Point mutations in a distant sonic hedgehog cis-regulator generate a variable regulatory output responsible for preaxial polydactyly. Hum. Mol. Genet. 17, 978–985

36 Dunn, I.C. et al. (2011) The chicken polydactyly (Po) locus causes allelic imbalance and ectopic expression of Shh during limb development. Dev. Dyn. 240, 1163–1172

37 Maas, S.A. et al. (2011) Identification of spontaneous mutations within the long-range limb-specific Sonic hedgehog enhancer (ZRS) that alter Sonic hedgehog expression in the chicken limb mutants oligozeugodactyly and Silkie breed. Dev. Dyn. 240, 1212–1222


38 Dorshorst, B. et al. (2010) Genomic regions associated with dermal hyperpigmentation, polydactyly and other morphological traits in the Silkie chicken. J. Hered. 101, 339–350

39 Park, K. et al. (2008) Canine polydactyl mutations with heterogeneous origin in the conserved intronic sequence of LMBR1. Genetics 179, 2163–2172

40 Maas, S.A. and Fallon, J.F. (2005) Single base pair change in the long-range Sonic hedgehog limb-specific enhancer is a genetic basis for preaxial polydactyly. Dev. Dyn. 232, 345–348

41 Lettice, L.A. et al. (2011) Enhancer-adoption as a mechanism of human developmental disease. Hum. Mutat. 32, 1492–1499

42 Niedermaier, M. et al. (2005) An inversion involving the mouse Shh locus results in brachydactyly through dysregulation of Shh expression. J. Clin. Invest. 115, 900–909

43 Sagai, T. et al. (2005) Elimination of a long-range cis-regulatory module causes complete loss of limb-specific Shh expression and truncation of the mouse limb. Development 132, 797–803

44 Ianakiev, P. et al. (2001) Acheiropodia is caused by a genomic deletion in C7orf2, the human orthologue of the Lmbr1 gene. Am. J. Hum. Genet. 68, 38–45

45 Kronenberg, H.M. (2003) Developmental regulation of the growth plate. Nature 423, 332–336

46 Koziel, L. et al. (2005) GLI3 acts as a repressor downstream of Ihh in regulating two distinct steps of chondrocyte differentiation. Development 132, 5249–5260

47 Hellemans, J. et al. (2003) Homozygous mutations in IHH cause acrocapitofemoral dysplasia, an autosomal recessive disorder with cone-shaped epiphyses in hands and hips. Am. J. Hum. Genet. 72, 1040–1046

48 Guo, S. et al. (2010) Missense mutations in IHH impair Indian Hedgehog signaling in C3H10T1/2 cells: implications for brachydactyly type A1, and new targets for Hedgehog signaling. Cell. Mol. Biol. Lett. 15, 153–176

49 Gao, B. et al. (2001) Mutations in IHH, encoding Indian hedgehog, cause brachydactyly type A-1. Nat. Genet. 28, 386–388

50 Klopocki, E. et al. (2011) Copy-number variations involving the IHH locus are associated with syndactyly and craniosynostosis. Am. J. Hum. Genet. 88, 70–75

51 Yang, Y. et al. (1998) Evidence that preaxial polydactyly in the Doublefoot mutant is due to ectopic Indian Hedgehog signaling. Development 125, 3123–3132

52 Hayes, C. et al. (1998) Sonic hedgehog is not required for polarising activity in the Doublefoot mutant mouse limb bud. Development 125, 351–357

53 Babbs, C. et al. (2008) Polydactyly in the mouse mutant Doublefoot involves altered GLI3 processing and is caused by a large deletion in cis to Indian hedgehog. Mech. Dev. 125, 517–526

54 Hui, C.C. and Angers, S. (2011) GLI proteins in development and disease. Annu. Rev. Cell Dev. Biol. 27, 513–537

55 Shin, S.H. et al. (1999) GLI3 mutations in human disorders mimic Drosophila cubitus interruptus protein functions and localization. Proc. Natl Acad. Sci. U.S.A. 96, 2880–2884

56 Johnston, J.J. et al. (2010) Molecular analysis expands the spectrum of phenotypes associated with GLI3 mutations. Hum. Mutat. 31, 1142–1154

57 Naruse, I. et al. (2010) Birth defects caused by mutations in human GLI3 and mouse Gli3 genes. Congenit. Anom. 50, 1–7

58 Biesecker, L.G. (2008) The Greig cephalopolysyndactyly syndrome. Orphanet J. Rare Dis. 3, 10

59 Hill, P. et al. (2007) The molecular basis of Pallister Hall associated polydactyly. Hum. Mol. Genet. 16, 2089–2096

60 Hill, P. et al. (2009) A SHH-independent regulation of Gli3 is a significant determinant of anteroposterior patterning of the limb bud. Dev. Biol. 328, 506–516

61 Bettencourt-Dias, M. et al. (2011) Centrosomes and cilia in human disease. Trends Genet. 27, 307–315

62 Gerdes, J.M. et al. (2009) The vertebrate primary cilium in development, homeostasis, and disease. Cell 137, 32–45

63 Wong, S.Y. and Reiter, J.F. (2008) The primary cilium at the crossroads of mammalian hedgehog signaling. Curr. Top. Dev. Biol. 85, 225–260

64 Goetz, S.C. and Anderson, K.V. (2010) The primary cilium: a signalling centre during vertebrate development. Nat. Rev. Genet. 11, 331–344

65 Haycraft, C.J. et al. (2005) GLI2 and GLI3 localize to cilia and require the intraflagellar transport protein polaris for processing and function. PLoS Genet. 1, e53

66 Hildebrandt, F. et al. (2011) Ciliopathies. N. Engl. J. Med. 364, 1533–1543

67 Dafinger, C. et al. (2011) Mutations in KIF7 link Joubert syndrome with Sonic Hedgehog signaling and microtubule dynamics. J. Clin. Invest. 121, 2662–2667

68 Liem, K.F., Jr et al. (2009) Mouse Kif7/Costal2 is a cilia-associated protein that regulates Sonic hedgehog signaling. Proc. Natl Acad. Sci. U.S.A. 106, 13377–13382

69 Endoh-Yamagami, S. et al. (2009) The mammalian Cos2 homolog Kif7 plays an essential role in modulating Hh signal transduction during development. Curr. Biol. 19, 1320–1326

70 Cheung, H.O. et al. (2009) The kinesin protein KIF7 is a critical regulator of Gli transcription factors in mammalian hedgehog signaling. Sci. Signal. 2, ra29

71 Zaghloul, N.A. and Katsanis, N. (2009) Mechanistic insights into Bardet–Biedl syndrome, a model ciliopathy. J. Clin. Invest. 119, 428–437

72 Ocbina, P.J. et al. (2011) Complex interactions between genes controlling trafficking in primary cilia. Nat. Genet. 43, 547–553

73 Gallet, A. (2011) Hedgehog morphogen: from secretion to reception. Trends Cell Biol. 21, 238–246

74 Murone, M. et al. (1999) Sonic hedgehog signaling by the patched-smoothened receptor complex. Curr. Biol. 9, 76–84

75 Aza-Blanc, P. et al. (1997) Proteolysis that is inhibited by hedgehog targets Cubitus interruptus protein to the nucleus and converts it to a repressor. Cell 89, 1043–1053

76 Hooper, J.E. and Scott, M.P. (2005) Communicating with Hedgehogs. Nat. Rev. Mol. Cell Biol. 6, 306–317

77 Jiang, J. (2006) Regulation of Hh/Gli signaling by dual ubiquitin pathways. Cell Cycle 5, 2457–2463

78 Ruel, L. and Therond, P.P. (2009) Variations in Hedgehog signaling: divergence and perpetuation in SUFU regulation of Gli. Genes Dev. 23, 1843–1848

79 Hooper, J.E. (2003) Smoothened translates Hedgehog levels into distinct responses. Development 130, 3951–3963

80 Jia, J. et al. (2003) Smoothened transduces Hedgehog signal by physically interacting with Costal2/Fused complex through its C-terminal tail. Genes Dev. 17, 2709–2720

81 Marks, S.A. and Kalderon, D. (2011) Regulation of mammalian Gli proteins by Costal 2 and PKA in Drosophila reveals Hedgehog pathway conservation. Development 138, 2533–2542

82 Basto, R. et al. (2006) Flies without centrioles. Cell 125, 1375–1386

83 Oh, E.C. and Katsanis, N. (2012) Cilia in vertebrate development and disease. Development 139, 443–448

84 Lum, L. and Beachy, P.A. (2004) The Hedgehog response network: sensors, switches, and routers. Science 304, 1755–1759

85 Chen, M.H. et al. (2009) Cilium-independent regulation of Gli protein function by Sufu in Hedgehog signaling is evolutionarily conserved. Genes Dev. 23, 1910–1928

86 Humke, E.W. et al. (2010) The output of Hedgehog signaling is controlled by the dynamic association between Suppressor of Fused and the Gli proteins. Genes Dev. 24, 670–682

87 Heus, H.C. et al. (1999) A physical and transcriptional map of the preaxial polydactyly locus on chromosome 7q36. Genomics 57, 342–351

88 Radhakrishna, U. et al. (1999) The phenotypic spectrum of GLI3 morphopathies includes autosomal dominant preaxial polydactyly type-IV and postaxial polydactyly type-A/B; no phenotype prediction from the position of GLI3 mutations. Am. J. Hum. Genet. 65, 645–655


Regulation of chromatin structure by long noncoding RNAs: focus on natural antisense transcripts

Marco Magistri1, Mohammad Ali Faghihi1, Georges St Laurent III2 and Claes Wahlestedt1

1 Department of Psychiatry and Behavioral Sciences, and Center for Therapeutic Innovation, University of Miami Miller School of Medicine, Miami, FL 33136, USA
2 St Laurent Institute, Cambridge, MA 02139, USA

Review

In the decade following the publication of the Human Genome, noncoding RNAs (ncRNAs) have reshaped our understanding of the broad landscape of genome regulation. During this period, natural antisense transcripts (NATs), which are transcribed from the opposite strand of either protein or non-protein coding genes, have vaulted to prominence. Recent findings have shown that NATs can exert their regulatory functions by acting as epigenetic regulators of gene expression and chromatin remodeling. Here, we review recent work on the mechanisms of epigenetic modifications by NATs and their emerging role as master regulators of chromatin states. Unlike other long ncRNAs, antisense RNAs usually regulate their counterpart sense mRNA in cis by bridging epigenetic effectors and regulatory complexes at specific genomic loci. Understanding the broad range of effects of NATs will shed light on the complex mechanisms that regulate chromatin remodeling and gene expression in development and disease.

Corresponding author: Wahlestedt, C. ([email protected]).
Keywords: antisense RNA; epigenetics; transcriptome; chromatin; ncRNAs; NATs.

0168-9525/$ – see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tig.2012.03.013 Trends in Genetics, August 2012, Vol. 28, No. 8

Chromatin and ncRNAs: coupling structure and dynamic information

Histone octamer proteins and their tightly associated 146 bp of DNA form the nucleosome, the structural and functional core of eukaryotic chromatin. Specific combinations of DNA and histone post-translational modification patterns lead to diverse changes in chromatin states and distinct functional genomic outputs [1,2]. DNA methylation is perhaps the best-characterized chemical modification of DNA that impacts chromatin structure and function. In mammalian cells, DNA methylation occurs on cytosine residues in CpG dinucleotides and correlates with transcriptional repression. Promoter regions have a high density of CpG dinucleotides, whose methylation state dictates the transcriptional activity of the gene. Chromatin structure and function are also regulated by post-translational modifications of histone proteins. Histone-modifying enzymes are protein complexes that dynamically recognize (read), add (write), remove (erase) or replace various chromatin modifications. Examples of writers include EZH2, the catalytic subunit of the polycomb repressive complex 2 (PRC2) which is responsible for the trimethylation of histone H3 at lysine 27 (H3K27me3), and G9a, the histone methyltransferase (HMT) that catalyzes the di- or trimethylation of histone H3 at lysine 9 (H3K9me2/3) [2,3]. ‘Erasers’, such as the demethylase LSD1, specifically remove particular histone marks [4]. ‘Readers’ function as interpreters and include effector proteins that recognize specific histone marks and transduce this information into a genomic response [5–7]. Writers, erasers and readers have to work in concert with their action tightly coordinated to produce an integrated regulatory effect. Recent discoveries of frequent interactions between ncRNAs and chromatin strongly suggest pivotal roles for ncRNAs in orchestrating the function of these protein complexes. How chromatin-modifying enzymes specifically recognize and bind to their target loci still remains mysterious. One tempting hypothesis is that local transcription of low-abundance ncRNAs might be the key event in the locus-specific recruitment of different reader, eraser and writer complexes.

Dynamic transcriptional regulation at the level of chromatin

The classic division of chromatin into two opposing states, gene-rich euchromatin versus the silenced, tightly packed heterochromatin, has been challenged by recent discoveries suggesting the existence of different chromatin states in various organisms, including humans [8–13]. The two-state chromatin model assumed that the chromatin structure was essentially an on/off switch whereby a gene was either active or repressed, without any intermediate states. By contrast, a dynamic chromatin state varies between these extremes and represents an integration of information derived from an intricate network of histone-modifying enzymes, chromatin-binding proteins, transcription factors and chromatin-associated RNA transcripts [14,15].

Globally, RNA, which is an integral structural component of chromatin, is required for the maintenance of compact chromatin fibers [16]. RNA has also been shown to be involved in the maintenance of higher-order chromatin structure at pericentric heterochromatin in mouse cells [17], highlighting the important contribution of RNA to the regulation of chromatin structure and function. Recently, a genome-wide next-generation RNA sequencing approach was used to identify the RNA content of chromatin in human fibroblasts [18]. Surprisingly, more than 70% of the sequencing reads aligned with intergenic and intronic regions of the human genome. Functional experiments on a small number of chromatin RNA transcripts imply an interaction with chromatin-modifying enzymes, which raises the possibility of a functional role of these transcripts in chromatin regulation [18].

Further support for the notion that RNA regulates chromatin comes from a small but growing number of antisense transcripts [19,20] and other long ncRNAs [21–24] that interact with epigenetic effectors to orchestrate chromatin remodeling and epigenetic changes during development and disease. Cell type-specific ncRNAs interact with ubiquitously expressed regulatory proteins to form RNA–protein complexes that can interact with histones, DNA, other RNAs, and other chromatin-modifying complexes, to dynamically coordinate changes in gene expression programs (reviewed in [25]). RNA motifs composed of primary sequence information coupled to highly diverse secondary structure elements underlie the complexity and dynamic nature of these interactions. The combination of structural and regulatory elements of the chromatin contributes to the acquisition of a specific chromatin state and is key to understanding the mechanisms governing the organization of the human genome and the regulation of gene expression.

Natural antisense transcripts (NATs)

A substantial fraction of the mammalian genome is transcribed in the form of non-protein-coding RNAs [26–29] that have important regulatory functions in development, differentiation [30–32] and human diseases [19,33–35]. Although there is no unequivocal classification of non-protein-coding transcripts found in the mammalian genome, ncRNAs can be roughly divided on the basis of size into short ncRNAs (less than 200 nt in length) and long ncRNAs (lncRNAs) [36,37]. Short ncRNAs include miRNAs, piRNA, endogenous siRNAs and snoRNAs, which have been extensively reviewed elsewhere [38–40] and therefore will not be discussed here. lncRNAs are a heterogeneous group of RNAs transcribed from intergenic [41] or intragenic regions [42], which vary in length from 200 nt to over 100 kb [37]. NATs are a class of lncRNA molecules [43] that are transcribed from the opposite DNA strand of other RNA transcripts with which they share sequence complementarity [26,44–46]. Antisense RNAs could potentially exert a regulatory function on their corresponding sense mRNA at different levels [47]. NAT regulatory mechanisms fall into four main categories: mechanisms related to transcription (including epigenetic interactions), RNA–DNA interactions, RNA–RNA interactions in the nucleus and RNA–RNA interactions in the cytoplasm [48]. Among these four mechanisms, RNA-mediated epigenetic modification has received an increasing amount of experimental support. Antisense transcripts can provide a scaffold for effector proteins to interact with DNA and chromatin in a locus specific way.

NATs: cis-acting epigenetic silencers

Unlike transcription factors, many histone-modifying enzymes lack specific DNA-binding domains [15]. Based on this important observation, it has been postulated that ncRNAs might interact with ubiquitously expressed histone-modifying enzymes providing the required level of binding specificity (Figure 1).

Figure 1. Epigenetic regulation induced by NATs. NATs regulate the epigenetic landscape of genomic loci from which they are transcribed (cis regulation). A specific secondary structure permits the NAT to interact with different chromatin-modifying enzymes (green, red and purple shapes), thereby coordinating their action and directing specific epigenetic modifications of the nearby chromatin (green and red flags). Locus specificity may be achieved through sequence-specific interactions between the NAT and the DNA.

In mammalian cells, dosage compensation offered the first characterized examples of antisense lncRNA-mediated chromatin remodeling and gene silencing [49]. One of the two mammalian female X chromosomes is inactivated via an RNA-based mechanism in which the antisense ncRNA Xist, expressed from the X chromosome, mediates the recruitment of polycomb repressive complex 2 (PRC2) that in turn catalyzes the heterochromatinization of the entire X chromosome [21,49].

A similar mechanism of RNA-based epigenetic regulation of gene expression was found to silence various imprinted mammalian alleles. Most imprinted mammalian genes associate in clusters [50], and the presence of NATs is a common feature of these loci [26,51,52]. For example, Air is an imprinted, paternally expressed lncRNA transcribed from the second intron of the mouse insulin-like growth factor 2 receptor (Igf2r) gene [53]. In mouse placenta, expression of Air induces the epigenetic silencing of both the paternal allele of Igf2r, from which Air is expressed, and neighboring upstream genes. Although the transcription unit of Air only overlaps with Igf2r, Air recognizes and binds to the promoter regions of its neighboring genes. The molecular mechanisms underlying these interactions have not been clarified and might rely on a specific secondary structure adopted by Air or on the involvement of mediator proteins. The Air ncRNA interaction with the promoter of upstream genes in the cluster results in the recruitment of the HMT G9a, which generates a repressive chromatin state [54]. The ability of Air to silence non-overlapping genes in cis is reminiscent of Xist-induced X-chromosome inactivation. In the case of Xist, epigenetic silencing spreads through the entire X chromosome, in contrast to the case of imprinted genes where epigenetic silencing spreads only to a significant portion of the locus. The extent of the spread of epigenetic silencing may be related to the presence of insulator elements in the DNA sequence and their association with the CCCTC-binding factor (CTCF) [55], a multifunctional protein that enables insulator function and facilitates higher-order chromatin interactions [56].

Another interesting example of imprinting regulation is the antisense ncRNA transcript Kcnq1ot1, which is transcribed from intron 10 of the imprinted gene Kcnq1 [57]. This paternally expressed NAT silences Kcnq1 in cis, as well as neighboring genes on the paternal chromosome, by controlling chromatin and DNA modifications at that locus [58]. Kcnq1ot1 mediates the allele-specific deposition of the repressive histone marks H3K27me3 and H3K9me3 by direct interaction with the PRC2 components Ezh2 and Suz12 and with the H3K9-specific HMT G9a [58,59]. Similar to the situation with Air, the epigenetic changes caused by Kcnq1ot1 occur outside the sequence boundary of this lncRNA, emanating bidirectionally from the Kcnq1 locus. Some of the imprinted genes in this cluster, although silenced, lack Kcnq1ot1 enrichment [60].

Based on these examples, cis-acting NATs may remain linked to their transcription loci but exert their regulatory function on the neighboring genes via the recruitment of different proteins and the organization of higher-order chromatin structures. The presence or absence of insulator elements may influence the extension of chromatin alterations in each locus [61]. In this hypothetical scenario, the antisense transcript acts as a scaffold for the recruitment of chromatin-modifying enzymes, initiating events that expand in both directions to the entire chromosome, as in the case of X-chromosome inactivation, or to the entire imprinted cluster. In this model, the recruitment of chromatin-modifying complexes is dependent on antisense RNA expression, whereas the expansion of these effects depends on the subsequent involvement of DNA insulator elements.

Taken together, these imprinting studies imply that a large portion of NATs could exert their regulatory role by binding to chromatin enzymes and recruiting them in cis to their targets. In favor of this hypothesis, RNA immunoprecipitation (RIP) experiments targeting Ezh2, coupled with directional RNA sequencing (RIP-seq), revealed that the PRC2 complex associates with almost 10 000 RNAs in mouse embryonic stem cells (mESCs) [62]. Almost 3000 of these RNAs are NATs, and around 1000 are bidirectional transcripts. Interestingly, some NATs linked to disease loci were found to immunoprecipitate with Ezh2, such as Hspa1a-AS, Bgn-AS, Foxn2-AS and Malat1-AS [62], suggesting that ncRNAs target the PRC2 complex to chromatin. Unfortunately, in this study RIP-sequencing data were not integrated with ChIP-sequencing data, and the authors did not investigate the possible overlap between the genomic localization of PRC2 and the immunoprecipitated RNA transcripts. Nevertheless, the presence of NATs associated with PRC2 suggests the importance of these RNA transcripts in mediating the recruitment of chromatin-modifying complexes.

Accumulating evidence implies that the interaction of NATs with EZH2 and other HMTs is more common than previously believed, contributing to the epigenetic regulation of many autosomal loci. In addition to the finding that lncRNAs interact with histone-modifying enzymes, they have also been shown to play a role in DNA methylation. ANRIL is a NAT that overlaps with the INK4b/ARF/INK4a locus [63]. This locus encodes two cyclin-dependent kinase inhibitors, p15INK4b and p16INK4a, and a regulator of the p53 pathway, ARF [64]. The ANRIL transcript also overlaps with several polymorphisms discovered in genome-wide association studies (GWAS) that correspond to increased risk for cardiovascular disease and diabetes [65]. An initial study showed that ANRIL expression inversely correlates with p15INK4b expression in acute lymphoblastic leukemia and acute myeloid leukemia. It was demonstrated that ANRIL mediates the silencing of the tumor suppressor gene p15INK4b via DNA methylation and heterochromatin formation in a Dicer-independent manner, thus excluding the involvement of endogenous small RNAs in the process [20]. Later, it was shown that ANRIL, EZH2 and the PRC1 component CBX7 are upregulated in several prostate cancer tissue specimens with an inverse correlation to the expression of p16INK4a [19]. Moreover, ANRIL physically associates with CBX7 and colocalizes with EZH2 and CBX7 to the promoter region of p16INK4a in prostate cancer cells. Thus, the NAT ANRIL participates in the silencing of two very important tumor-suppressor genes via two distinct mechanisms, and the alteration of these regulatory circuits has been found in different types of cancer.

Evidence of a functional interaction between NATs and PRC2 comes from a study on the cyclin-dependent kinase inhibitor p21, another important tumor-suppressor gene. Bidirectional transcription at the p21 locus generates an antisense transcript and p21 mRNA. The p21 NAT represses p21 mRNA in a process involving the deposition of the repressive histone mark H3K27me3 [66]. This mechanism is AGO1-independent, further excluding involvement of endogenous small RNA mediators in the process. Thus, depending on the cellular context, an imbalanced expression of NATs can result in the silencing or activation of partner protein-coding genes, providing an interesting potential mechanism to explain the aberrant upregulation or silencing of cancer-related genes.

Among the different body tissues, the brain expresses a high abundance of ncRNAs [67]. Discovered in the developing mouse forebrain, the NAT Evf2 is transcribed from the ultra-conserved Dlx5/6 region encoding the homeodomain transcription factors DLX5 and DLX6 [68]. Evf2 forms a complex with the DLX-2 homeodomain protein to function as a transcriptional coactivator that increases Dlx5/6 enhancer activity [68]. Recently, studies of an Evf2 loss-of-function mouse revealed more complex regulatory functions of this NAT in the development of GABAergic interneurons [69]. Through antisense interference, Evf2 negatively regulates the expression of Dlx6 mRNA. Moreover, Evf2 exerts a silencing effect on Dlx5 by recruiting DLX and the methyl CpG binding protein 2 (MECP2) to the enhancer region [69]. Mutant Evf2 mice have reduced numbers of GABAergic interneurons in the dentate gyrus of the early postnatal hippocampus and reduced synaptic inhibition in the adult hippocampus [69]. This study highlights the importance of NATs in regulating gene expression during neuronal maturation and raises the possibility of a more extended role of antisense transcripts in central nervous system development.

In recent studies, repeat expansion diseases have often been characterized by bidirectional transcription overlapping the repeat region [70]. Spinocerebellar ataxia type 7 (SCA7) is a neurological disorder associated with a polyglutamine repeat (CAG) expansion in the ataxin-7 gene (ATXN7) [71]. SCAANT1 is a 1.4 kb long NAT overlapping the ATXN7 gene that is actively transcribed upon CTCF binding to target sites flanking the CAG repeat region [72]. SCAANT1 expression is associated with an increased level of the repressive H3K27me3 mark and a decreased level of the activating histone H3 acetylation mark at the ATXN7 promoter. The pathological increase of CAG expansion is accompanied by reduced expression of SCAANT1 ncRNA and increased expression of ATXN7 mRNA, showing an inverse relationship between the NAT and its partner sense transcript [72]. This study reveals an interesting NAT-based mechanism that is potentially involved in SCA7 pathogenesis.

NATs can silence gene expression in cis, making them attractive therapeutic targets to achieve specific upregulation of gene expression. It has recently been shown that brain-derived neurotrophic factor (BDNF) is under the epigenetic control of an antisense transcript, BDNF-AS [73]. Depletion of BDNF-AS can alter chromatin marks at the BDNF locus and upregulate locus-specific gene expression. This study also described NAT-mediated endogenous gene suppression of glia-derived neurotrophic factor (GDNF) and ephrin B2 receptor (EPHB2), suggesting that antisense RNA-mediated transcriptional suppression is a frequent phenomenon [73]. Considering the frequency with which NATs are transcribed, these examples may represent only the tip of the iceberg, with the regulatory role of NATs in epigenetic modifications representing a more common event than previously imagined.

NATs: cis-acting epigenetic activators
The first observation that lncRNAs are involved in epigenetic gene activation stems from dosage compensation studies in Drosophila, where the imbalanced presence of X chromosomes in the sexes necessitates compensation by a twofold upregulation of all the genes on the single male X chromosome [74]. Two lncRNAs, roX1 and roX2, play a fundamental role in the correct targeting of the Dosage Compensation Complex to many different binding sites on the male X chromosome, which results in transcriptional upregulation. These and other examples provide accumulating evidence of a central role for NATs in the epigenetic activation of specific loci on a genome-wide basis, providing insight into the biological language of lncRNAs [75].

Following these initial findings in Drosophila, several other examples of ncRNAs in vertebrates have been reported. Among these, a ncRNA expression-profile study of mESC differentiation identified several ncRNAs associated with important mESC protein-coding genes [30]. Among these ncRNAs, two concordantly upregulated NATs colocalized with their sense mRNA partners during a specific step of mESC differentiation. The NATs, named Evx1as and Hoxb5/6as, are transcribed from the opposite DNA strand of Evx1 and Hoxb5/6, respectively [30]. Using RNA-ChIP, the authors found that these NATs immunoprecipitate with H3K4me3, demonstrating a physical interaction with a transcriptional activation mark [30]. Furthermore, RNA-IP experiments showed direct interaction of Evx1as and Hoxb5/6as with MLL1, the mammalian trithorax protein responsible for H3K4me3 in the promoter region of several developmental genes [30]. This finding raises the possibility that these NATs are involved in the epigenetic activation of their mRNA partners during differentiation.

In another example of epigenetic activation, the chromatin-associated ncRNA transcript termed Intergenic10, located in the region 3′ to FANK1 in the opposite orientation, overlaps with the protein-coding gene ADAM12 [18]. The expression of Intergenic10 correlates positively with the expression of the neighboring protein-coding genes. siRNA depletion of Intergenic10 resulted in the concordant downregulation of ADAM12 and FANK1 and a decrease in the levels of the active chromatin mark H3K4me2 in the promoter regions of the downregulated genes [18]. NATs may bind chromatin-modifying enzymes and recruit them in cis to establish a locus-specific, transcriptionally active chromatin state.

Taken together, these observations show that a chromatin-associated ncRNA can act as a chromatin remodeler in cis to regulate the expression of neighboring genes either positively or negatively.

LncRNAs: trans-acting chromatin remodelers
Controversy still exists regarding the functional significance of many long and short ncRNA transcripts that are pervasively transcribed in the human genome, and particularly those originating in the proximity of the transcriptional start sites (TSSs) of many active genes. However, cell-, tissue- and developmental-specific transcription of lncRNAs argues against the simplistic assumption that these arise from transcriptional noise. Moreover, removal of these ncRNAs often correlates with functional consequences. Aside from NATs, the human genome produces many other classes of lncRNAs. For example, the analysis of chromatin signatures revealed a family of over 1000 highly conserved lncRNAs, termed large intergenic non-coding RNAs (lincRNAs), that contain sense and antisense members with many potential regulatory functions [41]. RNA-IP experiments of the PRC2 complex component EZH2 followed by hybridization to a custom exon-tiling array for 900 human lincRNAs showed that almost 30% of expressed lincRNAs physically interact with PRC2 [76]. Immunoprecipitation of lncRNAs with EZH2 is highly suggestive of functional roles of these transcripts through the PRC2 pathway. The catalog of lincRNAs encoded in the human genome as well as the understanding of their roles in mediating the function of chromatin-modifying complexes is rapidly expanding.

Unlike most NATs, lincRNAs exert their regulatory roles in trans to alter chromatin shape and gene expression at distant loci. HOTAIR is a lincRNA encoded in antisense orientation in the HOX-C cluster on chromosome 12 that is necessary for the correct expression of the HOX-D cluster of genes on chromosome 2 [23]. HOTAIR associates with the PRC2 complex to silence and maintain a large domain of heterochromatin in the HOX-D gene cluster. Genomic regions flanking HOX-D contain high levels of H3K27me3 and low levels of H3K4me2/3 [77]. It was shown in several cellular systems that HOTAIR acts as a modular scaffold for the recruitment of both PRC2 and LSD1, the catalytic subunit of the repressor complex CoREST/REST, which in turn coordinate the methylation of H3K27 and demethylation of H3K4me2/3, respectively, in trans at many different target genomic regions [78]. Interestingly, altered HOTAIR expression in primary breast tumors is a powerful predictor of metastasis and poor prognosis [35]. Inhibition of HOTAIR expression in cancer cells reduces invasiveness and metastatic potential, consistent with its physiological function in dictating chromatin states of fibroblasts during development [35].

A loss-of-function study in mESCs produced a functional characterization of a large number of lincRNAs [32]. It was shown that lincRNAs maintain the pluripotent state and repress lineage programs in mESCs via trans-acting mechanisms of global gene expression regulation. mESC lincRNAs associate with 12 different chromatin complexes involved in different aspects of epigenetic regulation, such as writers (Tip60/P400, Prc2, Setd8, Eset, Suv39), readers (Prc1, Cbx1, Cbx3) and erasers (Jarid1b, Jarid1c, Hdac1) [32]. Seventy-four lincRNAs associate with at least one of these complexes and several lincRNAs associate with functionally related chromatin complexes [32]. Because lincRNAs physically associate with multiple chromatin-regulatory proteins, they may serve as scaffolds to bridge similar complexes into larger functional units.

Similar to NATs, lncRNAs can be involved in the epigenetic activation of specific loci. HOTTIP is a spliced, polyadenylated lncRNA transcribed in the opposite orientation from the 5′ end of the HOXA locus [79]. HOTTIP knockdown in fibroblasts and chick embryos resulted in decreased HOXA expression, affecting a region 40 kb downstream from the 5′ end of the HOXA locus. This repressive effect depends on the distance from the HOTTIP gene; genes in close proximity exhibit a greater decrease in expression levels [79]. These changes in gene expression correlated with a global loss of H3K4me3 and H3K4me2 across the affected region. RIP experiments demonstrated direct binding of HOTTIP to WDR5, a component of the core complex responsible for H3K4 methylation [79]. Ectopically expressed HOTTIP does not induce the expression of 5′ HOXA genes in fibroblast cells, implying a cis mechanism of action for HOTTIP. Artificial recruitment of HOTTIP RNA upstream of a silent GAL4 promoter can boost transcription in the presence of WDR5, confirming the cis effect of the HOTTIP transcript in the proximity of the target genes [79].

Mechanisms of lncRNA interactions with chromatin and chromatin-modifying enzymes
The ability of lncRNAs to function as scaffolds for the recruitment of different yet functionally related enzymes and to confer locus specificity to these enzymes raises two immediate questions: what mediates the interactions between ncRNAs and specific chromatin enzymes, and what is the language of molecular rules governing them? One of the first hints of a mechanism governing ncRNA–enzyme interactions came from studies of the X-chromosome inactivation phenomenon. It was shown that a novel ncRNA termed Repeat A (RepA) directly binds to EZH2 and functions in the recruitment of PRC2 to the X chromosome [21]. RepA is a 1.6 kb ncRNA transcribed within Xist and is composed of 7.5 tandem repeat sequences that fold into two conserved stem–loop structures crucial for EZH2 binding [21]. These initial findings were subsequently confirmed by an independent study showing that short RNAs 50–200 nt in length are transcribed from the 5′ end of polycomb target genes [80]. Interestingly, these short RNAs have stem–loop structures similar to RepA and are able to bind the PRC2 component SUZ12 [80]. Similarly, the antisense Kcnq1ot1 has a conserved RNA repeat that was shown to be necessary for the epigenetic silencing of imprinted genes [60]. These studies imply that lncRNAs assume specific secondary structures offering different docking sites for different enzymes.

In large part, how NATs bind to target genes to guide chromatin-modifying enzymes to specific loci remains unexplained (Figure 2). Two recently developed methods for profiling the genome-wide occupancy of lncRNAs have allowed high-throughput identification of RNA–DNA and RNA–protein interactions [81,82]. These new techniques are promising tools for exploring the mechanisms governing ncRNA–chromatin interactions, as shown by the informative analysis performed on a few known lncRNAs (roX2, TERC and HOTAIR) [82]. Interestingly, among the discovered DNA binding sites of both roX2 and TERC, specific consensus DNA sequences have been observed, thus suggesting that specific DNA motifs might be important for the recruitment of these and other lncRNAs to their target genomic loci. HOTAIR binding sites contain a GA-rich polypurine motif, reminiscent of mammalian Polycomb response elements. It is notable that although the HOTAIR binding sites overlap with PRC2 and H3K27me3 chromatin regions, they are restricted to small regions of a few hundred bp, raising the possibility that HOTAIR nucleates PRC2 binding and H3K27me3 spreading [82]. These data, together with the discovery that HOTAIR binding to its genomic targets does not require EZH2, demonstrate that ncRNAs are required for specific recognition of DNA sequences as well as recruitment of polycomb proteins, which in turn modify the neighboring chromatin. This study demonstrates that locus-specific interaction between ncRNAs and chromatin takes place independently from ncRNA–enzyme interaction and points to the existence of specific RNA-targeting motifs among ncRNA target sites. These motifs may represent binding sites for structural elements within the ncRNA, in the case of direct RNA–DNA interaction, or may function as the binding site for mediator proteins that may induce HOTAIR recruitment.

Figure 2. Molecular mechanisms of NATs and chromatin interactions. (Schematic with three panels, (a) to (c), each depicting a sense mRNA and an antisense RNA.) Two types of interactions are necessary for any ncRNA-induced chromatin modification to take place: between an antisense RNA molecule and a chromatin-modifying enzyme (CME), and between either a CME and DNA or antisense RNA and DNA. The second type of interaction is necessary to confer sequence specificity to the chromatin modifications. Each one of these interactions (RNA–protein, RNA–DNA or DNA–protein) can either take place through sequence motifs (digital Watson–Crick base pairing) or by RNA secondary structure. NATs function as intermediates that target CMEs to locus-specific regions of the genome. The molecular mechanisms governing the interaction between NATs and chromatin remain poorly characterized. Here, we propose three different possible scenarios by which this interaction occurs. (a) Specific binding of antisense RNA to a CME as well as to a DNA region by forming a unique secondary structure. (b) The sequence motif dictates the interaction between the antisense RNA molecule and the target DNA. In this model, antisense RNA binds specifically to CMEs and to a particular DNA region. (c) Nonspecific binding of antisense RNA to a DNA sequence. In this model, local antisense transcription leads to a specific chromatin modification. The specificity in this model comes from the promoter of antisense RNA and the fact that transcription will lead to particular modifications. NATs do not physically associate with the chromatin. In this case, locus-specificity is achieved by nascent NATs that are recognized by chromatin-modifying enzymes.

Concluding remarks
Although the examples of NAT and lncRNA mechanisms described above suggest a broad continuum of function for ncRNAs in epigenetic regulation, the exact roles and mechanisms of most of these molecules remain largely unknown. NATs have emerged as powerful transducers of biological information, primarily due to their ability to bridge the interaction between proteins and DNA [83]. The information content and structural features of these ncRNAs collectively establish a dynamic interface with other macromolecules [83], thus facilitating the formation and modulation of ribonucleoprotein complexes crucial for epigenetic signaling. These unique features permit NATs and other lncRNAs to function as scaffolds to regulate epigenetic mechanisms within the cell. The key to future studies of lncRNAs will be to integrate successfully the layers of knowledge gained from multiple genomic, transcriptomic, proteomic and epigenomic approaches to create a multidimensional understanding of NATs within the existing cellular framework [84].

Acknowledgments
The authors would like to thank Dr Chiara Pastori and Roya Pedram Fatemi for helpful discussions and critical reading of the manuscript. The research on long ncRNAs in C.W.’s laboratory is supported in large part by grants from the U.S. National Institutes of Health (5R01NS063974 and 5R01MH084880). M.M.’s postdoctoral studies are supported by a fellowship from the Swiss National Science Foundation (PBGEP3-136151).

References
1 Chi, P. et al. (2010) Covalent histone modifications – miswritten, misinterpreted and mis-erased in human cancers. Nat. Rev. Cancer 10, 457–469

2 Daniel, J.A. et al. (2005) Effector proteins for methylated histones: an expanding family. Cell Cycle 4, 919–926

3 Cao, R. and Zhang, Y. (2004) The functions of E(Z)/EZH2-mediated methylation of lysine 27 in histone H3. Curr. Opin. Genet. Dev. 14, 155–164

4 Shi, Y. et al. (2004) Histone demethylation mediated by the nuclear amine oxidase homolog LSD1. Cell 119, 941–953

5 Maurer-Stroh, S. et al. (2003) The Tudor domain ‘Royal Family’: Tudor, plant Agenet, Chromo, PWWP and MBT domains. Trends Biochem. Sci. 28, 69–74

6 Mellor, J. (2006) It takes a PHD to read the histone code. Cell 126, 22–24

7 Kouzarides, T. (2007) Chromatin modifications and their function. Cell 128, 693–705

8 van Steensel, B. (2011) Chromatin: constructing the big picture. EMBO J. 30, 1885–1895

9 Schubeler, D. (2010) Chromatin in multicolor. Cell 143, 183–184

10 Filion, G.J. et al. (2010) Systematic protein location mapping reveals five principal chromatin types in Drosophila cells. Cell 143, 212–224

11 Kharchenko, P.V. et al. (2011) Comprehensive analysis of the chromatin landscape in Drosophila melanogaster. Nature 471, 480–485

12 Ernst, J. and Kellis, M. (2010) Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat. Biotechnol. 28, 817–825

13 Gerstein, M.B. et al. (2010) Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project. Science 330, 1775–1787

14 Bonasio, R. et al. (2010) Molecular signals of epigenetic states. Science 330, 612–616

15 Bernstein, E. and Allis, C.D. (2005) RNA meets chromatin. Genes Dev. 19, 1635–1655

16 Rodriguez-Campos, A. and Azorin, F. (2007) RNA is an integral component of chromatin that contributes to its structural organization. PLoS ONE 2, e1182

17 Maison, C. et al. (2002) Higher-order structure in pericentric heterochromatin involves a distinct pattern of histone modification and an RNA component. Nat. Genet. 30, 329–334

18 Mondal, T. et al. (2010) Characterization of the RNA content of chromatin. Genome Res. 20, 899–907

19 Yap, K.L. et al. (2010) Molecular interplay of the noncoding RNA ANRIL and methylated histone H3 lysine 27 by polycomb CBX7 in transcriptional silencing of INK4a. Mol. Cell 38, 662–674

20 Yu, W. et al. (2008) Epigenetic silencing of tumour suppressor gene p15 by its antisense RNA. Nature 451, 202–206

21 Zhao, J. et al. (2008) Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science 322, 750–756

22 Martianov, I. et al. (2007) Repression of the human dihydrofolate reductase gene by a non-coding interfering transcript. Nature 445, 666–670

23 Rinn, J.L. et al. (2007) Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129, 1311–1323

24 Bierhoff, H. et al. (2010) Noncoding transcripts in sense and antisense orientation regulate the epigenetic state of ribosomal RNA genes. Cold Spring Harb. Symp. Quant. Biol. 75, 357–364

25 Wang, K.C. and Chang, H.Y. (2011) Molecular mechanisms of long noncoding RNAs. Mol. Cell 43, 904–914

26 Katayama, S. et al. (2005) Antisense transcription in the mammalian transcriptome. Science 309, 1564–1566

27 Carninci, P. et al. (2005) The transcriptional landscape of the mammalian genome. Science 309, 1559–1563

28 Birney, E. et al. (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799–816

29 Clark, M.B. et al. (2011) The reality of pervasive transcription. PLoS Biol. 9, e1000625; discussion e1001102

30 Dinger, M.E. et al. (2008) Long noncoding RNAs in mouse embryonic stem cell pluripotency and differentiation. Genome Res. 18, 1433–1445

31 Ahfeldt, T. et al. (2012) Programming human pluripotent stem cells into white and brown adipocytes. Nat. Cell Biol. 14, 209–219

32 Guttman, M. et al. (2011) lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature 477, 295–300

33 Ji, P. et al. (2003) MALAT-1, a novel noncoding RNA, and thymosin beta4 predict metastasis and survival in early-stage non-small cell lung cancer. Oncogene 22, 8031–8041

34 Faghihi, M.A. et al. (2008) Expression of a noncoding RNA is elevated in Alzheimer’s disease and drives rapid feed-forward regulation of beta-secretase. Nat. Med. 14, 723–730

35 Gupta, R.A. et al. (2010) Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature 464, 1071–1076

36 Wright, M.W. and Bruford, E.A. (2011) Naming ‘junk’: human non-protein coding RNA (ncRNA) gene nomenclature. Hum. Genomics 5, 90–98

37 Kapranov, P. et al. (2007) RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 316, 1484–1488

38 Ghildiyal, M. and Zamore, P.D. (2009) Small silencing RNAs: an expanding universe. Nat. Rev. Genet. 10, 94–108

39 Malone, C.D. and Hannon, G.J. (2009) Small RNAs as guardians of the genome. Cell 136, 656–668

40 Carthew, R.W. and Sontheimer, E.J. (2009) Origins and mechanisms of miRNAs and siRNAs. Cell 136, 642–655

41 Guttman, M. et al. (2009) Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223–227

42 Nakaya, H.I. et al. (2007) Genome mapping and expression analyses of human intronic noncoding RNAs reveal tissue-specific patterns and enrichment in genes related to regulation of transcription. Genome Biol. 8, R43

43 Sun, M. et al. (2006) Evidence for variation in abundance of antisense transcripts between multicellular animals but no relationship between antisense transcription and organismic complexity. Genome Res. 16, 922–933

44 Kiyosawa, H. et al. (2003) Antisense transcripts with FANTOM2 clone set and their implications for gene regulation. Genome Res. 13, 1324–1334

45 Chen, J. et al. (2005) Human antisense genes have unusually short introns: evidence for selection for rapid transcription. Trends Genet. 21, 203–207

46 Chen, J. et al. (2005) Genome-wide analysis of coordinate expression and evolution of human cis-encoded sense–antisense transcripts. Trends Genet. 21, 326–329

47 Lapidot, M. and Pilpel, Y. (2006) Genome-wide natural antisense transcription: coupling its regulation to its different regulatory mechanisms. EMBO Rep. 7, 1216–1222

48 Faghihi, M.A. and Wahlestedt, C. (2009) Regulatory roles of natural antisense transcripts. Nat. Rev. Mol. Cell Biol. 10, 637–643

49 Lee, J.T. et al. (1999) Tsix, a gene antisense to Xist at the X-inactivation centre. Nat. Genet. 21, 400–404

50 Verona, R.I. et al. (2003) Genomic imprinting: intricacies of epigenetic regulation in clusters. Annu. Rev. Cell Dev. Biol. 19, 237–259

51 Mohammad, F. et al. (2009) Epigenetics of imprinted long noncoding RNAs. Epigenetics 4, 277–286

52 Wan, L.B. and Bartolomei, M.S. (2008) Regulation of imprinting in clusters: noncoding RNAs versus insulators. Adv. Genet. 61, 207–223

53 Sleutels, F. et al. (2002) The non-coding Air RNA is required for silencing autosomal imprinted genes. Nature 415, 810–813

54 Nagano, T. et al. (2008) The Air noncoding RNA epigenetically silences transcription by targeting G9a to chromatin. Science 322, 1717–1720

55 Kim, T.H. et al. (2007) Analysis of the vertebrate insulator protein CTCF-binding sites in the human genome. Cell 128, 1231–1245

56 Gaszner, M. and Felsenfeld, G. (2006) Insulators: exploiting transcriptional and epigenetic mechanisms. Nat. Rev. Genet. 7, 703–713

57 Smilinich, N.J. et al. (1999) A maternally methylated CpG island in KvLQT1 is associated with an antisense paternal transcript and loss of imprinting in Beckwith–Wiedemann syndrome. Proc. Natl. Acad. Sci. U.S.A. 96, 8064–8069

58 Pandey, R.R. et al. (2008) Kcnq1ot1 antisense noncoding RNA mediates lineage-specific transcriptional silencing through chromatin-level regulation. Mol. Cell 32, 232–246

59 Terranova, R. et al. (2008) Polycomb group proteins Ezh2 and Rnf2 direct genomic contraction and imprinted repression in early mouse embryos. Dev. Cell 15, 668–679

60 Kanduri, C. (2011) Kcnq1ot1: a chromatin regulatory RNA. Semin. Cell Dev. Biol. 22, 343–350

61 Ghirlando, R. et al. (2012) Chromatin domains, insulators, and the regulation of gene expression. Biochim. Biophys. Acta (http://dx.doi.org/10.1016/j.bbagrm.2012.01.016)

62 Zhao, J. et al. (2010) Genome-wide identification of polycomb-associated RNAs by RIP-seq. Mol. Cell 40, 939–953

63 Pasmant, E. et al. (2007) Characterization of a germ-line deletion, including the entire INK4/ARF locus, in a melanoma-neural system tumor family: identification of ANRIL, an antisense noncoding RNA whose expression coclusters with ARF. Cancer Res. 67, 3963–3969

64 Popov, N. and Gil, J. (2010) Epigenetic regulation of the INK4b–ARF–INK4a locus: in sickness and in health. Epigenetics 5, 685–690

65 Pasmant, E. et al. (2011) ANRIL, a long, noncoding RNA, is an unexpected major hotspot in GWAS. FASEB J. 25, 444–448

66 Morris, K.V. et al. (2008) Bidirectional transcription directs both transcriptional gene activation and suppression in human cells. PLoS Genet. 4, e1000258

67 Qureshi, I.A. et al. (2010) Long non-coding RNAs in nervous system function and disease. Brain Res. 1338, 20–35



68 Feng, J. et al. (2006) The Evf-2 noncoding RNA is transcribed from the Dlx-5/6 ultraconserved region and functions as a Dlx-2 transcriptional coactivator. Genes Dev. 20, 1470–1484

69 Bond, A.M. et al. (2009) Balanced gene regulation by an embryonic brain ncRNA is critical for adult hippocampal GABA circuitry. Nat. Neurosci. 12, 1020–1027

70 Batra, R. et al. (2010) Partners in crime: bidirectional transcription in unstable microsatellite disease. Hum. Mol. Genet. 19, R77–R82

71 Martin, J-J. (2012) Spinocerebellar ataxia type 7. In Handbook of Clinical Neurology (Vol. 103; Ataxic Disorders) (Subramony, S.H. and Dürr, A., eds), pp. 475–491, Elsevier

72 Sopher, B.L. et al. (2011) CTCF regulates ataxin-7 expression through promotion of a convergently transcribed, antisense noncoding RNA. Neuron 70, 1071–1084

73 Modarresi, F. et al. (2012) Inhibition of natural antisense transcripts in vivo results in gene-specific transcriptional upregulation. Nat. Biotechnol. (http://dx.doi.org/10.1038/nbt.2158)

74 Straub, T. and Becker, P.B. (2011) Transcription modulation chromosome-wide: universal features and principles of dosage compensation in worms and flies. Curr. Opin. Genet. Dev. 21, 147–153

75 Ilik, I. and Akhtar, A. (2009) roX RNAs: non-coding regulators of the male X chromosome in flies. RNA Biol. 6, 113–121


76 Khalil, A.M. et al. (2009) Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl. Acad. Sci. U.S.A. 106, 11667–11672

77 Fanti, L. et al. (2008) The trithorax group and Pc group proteins are differentially involved in heterochromatin formation in Drosophila. Chromosoma 117, 25–39

78 Tsai, M.C. et al. (2010) Long noncoding RNA as modular scaffold of histone modification complexes. Science 329, 689–693

79 Wang, K.C. et al. (2011) A long noncoding RNA maintains active chromatin to coordinate homeotic gene expression. Nature 472, 120–124

80 Kanhere, A. et al. (2010) Short RNAs are transcribed from repressed polycomb target genes and interact with polycomb repressive complex-2. Mol. Cell 38, 675–688

81 Simon, M.D. et al. (2011) The genomic binding sites of a noncoding RNA. Proc. Natl. Acad. Sci. U.S.A. 108, 20497–20502

82 Chu, C. et al. (2011) Genomic maps of long noncoding RNA occupancy reveal principles of RNA-chromatin interactions. Mol. Cell 44, 667–678

83 St Laurent, G., 3rd and Wahlestedt, C. (2007) Noncoding RNAs: couplers of analog and digital information in nervous system function? Trends Neurosci. 30, 612–621

84 Hawkins, R.D. et al. (2010) Next-generation genomics: an integrative approach. Nat. Rev. Genet. 11, 476–486


Genetic basis of blood pressure and hypertension
Sandosh Padmanabhan1, Christopher Newton-Cheh2 and Anna F. Dominiczak1

1 Institute of Cardiovascular and Medical Sciences, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G12 8TA, UK
2 Harvard Medical School, Massachusetts General Hospital, Broad Institute of Harvard University and Massachusetts Institute of Technology, Boston, MA 02114, USA

Review

Blood pressure (BP) is a complex trait regulated by an intricate network of physiological pathways involving extracellular fluid volume homeostasis, cardiac contractility and vascular tone through renal, neural or endocrine systems. Untreated high BP, or hypertension (HTN), is associated with increased mortality, and thus a better understanding of the pathophysiological and genetic underpinnings of BP regulation will have a major impact on public health. However, identifying genes that contribute to BP and HTN has proved challenging. In this review we describe our current understanding of the genetic architecture of BP and HTN, which has accelerated over the past five years primarily owing to genome-wide association studies (GWAS) and the continuing progress in uncovering rare gene mutations, epigenetic markers and regulatory pathways involved in the physiology of BP. We also look ahead to future studies characterizing novel pathways that affect BP and HTN and discuss strategies for translating current findings to the clinic.

Corresponding author: Dominiczak, A.F. ([email protected]).
Keywords: hypertension; blood pressure; genetics; sodium; artery; kidney.

0168-9525/$ – see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tig.2012.04.001 Trends in Genetics, August 2012, Vol. 28, No. 8

The complexity of BP and HTN
BP is a quantitative trait that is distributed normally in the general population. In adults there is a continuous, incremental risk of cardiovascular disease, stroke and renal disease associated with high BP. HTN is defined based on a cut-off at the upper end of the distribution of BP ‘at which the benefits of action (i.e., therapeutic intervention) exceed those of inaction’ [1]. Based on this definition, there are over 1 billion people with HTN worldwide, and the World Health Organization suggests this will rise to 1.5 billion by 2020 [2]. The high prevalence of HTN and its consequent significant adverse economic impact on the individual and population highlight the importance of primary prevention of HTN. Thus, there is a pressing need for a greater understanding of the pathophysiological and genetic underpinnings of BP regulation and dysregulation. Studies have demonstrated that BP is a genetically determined trait, with estimates of heritability ranging from 31% to 68% [3,4]. The BP/HTN phenotype poses unique challenges for genetic dissection that have made progress slow (Box 1). BP levels are determined by cardiac output and peripheral vascular resistance, and these in turn are regulated by a complex network of interacting physiological pathways involving extracellular fluid volume homeostasis, cardiac contractility and vascular tone through renal, neural or endocrine systems. Perturbations in any of these physiological pathways can arise from environmental factors (for example salt intake), genetic factors or a combination of both, and result in high or low BP. Rare monogenic BP syndromes are characterized by a major gene defect affecting a single pathway commonly involving renal electrolyte balance (Box 2). The phenotypic heterogeneity is further complicated by intra-individual BP variability caused by a large number of factors including measurement technique, instrument error and patient factors such as anxiety and activity level [5]. All these genotypic and phenotypic complexities have resulted in both false-positive and false-negative studies in the past. The search for BP genes initially focused on genome-wide linkage studies that were successful in uncovering genes for monogenic forms of high and low blood pressure but turned out to be largely unsuccessful in explaining the polygenic BP phenotype. The recent successes of GWAS [6–12] are testament to greater rigor in phenotypic characterization and statistical design. The limitations and potential of GWAS in the dissection of hypertension are highlighted in a recent debate [13,14].

In this review we describe the explosion in our understanding of the genetic architecture of BP and HTN that has occurred over the past five years and the continuing progress in uncovering new gene mutations causing rare inherited forms of HTN and hypotension. We review the road ahead, highlighting the novel pathways identified, both common and rare, and discuss future strategies for uncovering novel mechanisms and achieving clinical translation. Table 1 summarizes the genomic and functional context of all the monogenic and GWAS BP/HTN loci discovered to date.

Common variants
GWAS use dense sets of single nucleotide polymorphisms (SNPs; usually ~500 000–1 000 000) and rely on linkage disequilibrium (LD) or correlation patterns of typed (or imputed) SNPs with functional variants. This means the identified SNPs are usually proxies of ungenotyped functional variants. Although these SNPs show unequivocal association with BP, the functional dissection of these signals is not straightforward. Associations detected by GWAS between BP/HTN and SNPs, with 50 kb flanking segments, are shown in Figure 1 (GWAS SNPs were selected from studies with sample sizes greater than 20 000



Box 1. Phenotypic complexity

Variation in extracellular fluid volume, the contractile state of the heart and vascular tone contribute to variation in BP level. Other determinants of BP include age, weight, ethnicity and diet. Systolic BP (SBP) increases linearly from age 30 to 84 years together with mean arterial pressure [a weighted average of SBP and diastolic pressure (DBP)], but DBP increases linearly up to age 50–60, after which it begins to decline with a steep increase in pulse pressure (the difference between SBP and DBP) [52,53]. The late decline of DBP after age 60 and the continuous rise in systolic BP reflect the increased large artery stiffness in older age. The odds of progression to HTN increase by 20–30% for every 5% gain in body weight.

At all ages, HTN is more common in African Americans than in whites; in all ethnic and racial groups it is more common in those with lower socioeconomic status. Interestingly, HTN is more prevalent in the African-American population in the USA than in either Afro-Caribbean or native black African populations [54–56]. In some societies, BP shows only a small age-related increase, which may be related in part to their agrarian lifestyle as well as the high potassium, low sodium diet of the hunter-gatherer, a more rural lifestyle and a lower consumption of food [57–60]. From an evolutionary perspective, essential HTN is a disease of civilization – with its abundance of processed foods and long lifespans – and could be an undesirable pleiotropic effect of a genotype that may have optimized fitness in an ancient environment [61]. The rates of HTN and sodium sensitivity are generally higher in individuals carrying the ancestral alleles of sodium-conserving genes, which show strong latitudinal clines with the ancestral sodium-conserving alleles more prevalent in African populations and less so in the northern regions [62–64].
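The two derived quantities defined in Box 1 can be made concrete with a small sketch. Pulse pressure follows directly from the definition given above. For mean arterial pressure the box states only that it is a weighted average of SBP and DBP; the one-third systolic weighting used below is a common clinical approximation and is an assumption here, not a value taken from this review. The function names are illustrative.

```python
def pulse_pressure(sbp, dbp):
    """Pulse pressure: the difference between systolic and diastolic BP (mmHg)."""
    return sbp - dbp


def mean_arterial_pressure(sbp, dbp, systolic_weight=1.0 / 3.0):
    """Weighted average of SBP and DBP (mmHg).

    systolic_weight is an assumed weighting (the usual 1/3 rule of thumb);
    the review itself does not specify the weights.
    """
    return systolic_weight * sbp + (1.0 - systolic_weight) * dbp


if __name__ == "__main__":
    # Illustrative readings only, not data from the review.
    for sbp, dbp in [(120, 80), (160, 100), (170, 75)]:
        print(sbp, dbp, pulse_pressure(sbp, dbp),
              round(mean_arterial_pressure(sbp, dbp), 1))
```

The last illustrative reading mimics the pattern described for older age: a high SBP with a falling DBP gives a wide pulse pressure even when mean arterial pressure changes little.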


and SNPs attaining a P value <5 × 10⁻⁸ for significant association). Many of the SNPs from GWAS that attained genome-wide significance also show similar strong associations for other traits (pleiotropy) (Figure 1a) – for example, rs13333226 shows independent association with HTN and chronic kidney disease [11,15,16]; rs3184504 shows

Box 2. Qualitative or quantitative phenotype

From a genetic perspective, whether BP is considered as a quantitative trait or a dichotomous disease phenotype has major implications for studies of genetic causation, and this was recognized very early. In the 1950s a technical controversy about the unimodal or bimodal distribution of blood pressure led to the famous Platt–Pickering debate [65], and this is a useful platform to understand the assumptions that have driven genetic research of BP/HTN so far (Figure I). Platt measured BP in normotensive and hypertensive probands and their relatives and found a bimodal distribution of BP values (Figure Ia). This led him to argue that HTN was a simple mendelian disease caused by a single dominant genetic mutation. By contrast, Pickering studied BP distributions from the second to eighth decades in first-degree relatives of normotensive and hypertensive probands and instead found a unimodal distribution of BP (Figure Ib). Pickering concluded that the continuous Gaussian distribution of BP values indicated that BP was inherited as a ‘graded character’ and is hence a polygenic non-mendelian trait [65,66]. Thus, according to Platt, hypertensive individuals were a distinct sub-population, whereas according to Pickering, hypertension was only the upper portion of a continuous distribution curve of BP. The debate dragged on through the 1960s with much resistance to accepting Pickering’s quantitative model; this only changed with mounting evidence from epidemiological studies indicating that high BP was a risk factor for cardiovascular disease and intervention trials of antihypertensive therapy all showing similar benefits of reducing BP. Further support comes from large-scale GWAS for BP that have mapped common variants at 29 loci with small effects [6,8–11,67]. However, Platt’s model cannot be discounted entirely because there are rare mendelian forms of HTN and hypotension that are caused by highly penetrant rare genetic variants with large effects [40]. Pickering’s concession at the end of the long debate aptly summarizes the current understanding of HTN genetics – ‘I never denied the possibility that there may be a group in what we now call “essential hypertension” characterized by single-gene inheritance.’

[Figure I panels: (a) Platt and (b) Pickering. Each plots population frequency against blood pressure with the hypertension range marked. Panel (a) distinguishes gene mutation absent from gene mutation present; panel (b) is labeled Gene (0-n) and Environment (0-n).]

Figure I. The Platt–Pickering debate about the quantitative or qualitative nature of HTN. (a) Platt argued that HTN occurred in a discrete subpopulation and was caused by a single, heritable genetic mutation. (b) Pickering suggested that there was a range of BP levels in the population, and that there was no clear dividing line between hypertension and normotension; instead, HTN represented the end of a continuum and was therefore polygenic in origin.

significant association with chronic kidney disease, celiac disease, type 1 diabetes, coronary artery disease, cholesterol, hemoglobin, retinal vascular caliber, plasma eosinophil count and rheumatoid arthritis [6,9,16–24]; rs1799945 is also associated with serum iron concentration and hemoglobin levels in addition to its association with blood

Table 1. Genetic loci associated with monogenic BP syndromes and identified through GWASᵃ

| CHR | Gene/nearest gene | Monogenic syndrome | GWASᵇ | Pathway | Notes |
|---|---|---|---|---|---|
| 1p36.13 | CLCNKB | Bartter syndrome, type 3; OMIM #607364 | | Renal electrolyte balance | Autosomal recessive. Impaired chloride reabsorption in the thick ascending loop of Henle leads to impaired sodium reabsorption. Low/normal BP |
| 1p36.13 | SDHB | Paragangliomas 4; OMIM #115310 | | Sympathetic system | Multiple catecholamine-secreting head and neck paragangliomas and retroperitoneal pheochromocytomas |
| 1p36.2 | MTHFR (NPPA, NPPB) | | CHARGE, GBPG, AGEN, ICBP, Gene-centric | Renal electrolyte balance | Methylene-tetrahydrofolate reductase; has been associated with changes in plasma homocysteine levels and pre-eclampsia. Atrial natriuretic and brain natriuretic peptides genes have been associated with hypertension |
| 1q23.3 | SDHC | Paragangliomas 3; OMIM #605373 | | Sympathetic system | Tumors or extra-adrenal paraganglia-associated pheochromocytoma |
| 1q42.2 | AGT | | Gene-centric | Renal electrolyte balance, vascular function | The cleaved products angiotensin I, angiotensin II and angiotensin III are known regulators of BP and sodium homeostasis |
| 2q36.2 | CUL3 | Pseudohypoaldosteronism type IIE; OMIM *603136 | | Renal electrolyte balance | Modulation of renal salt, K+ and H+ handling in response to physiological challenge |
| 3p25.3 | VHL | von Hippel–Lindau syndrome; OMIM #193300 | | Sympathetic system | Autosomal dominant. Associated with retinal, cerebellar, and spinal hemangioblastoma, renal cell carcinoma (RCC), pheochromocytoma, and pancreatic tumors |
| 3q22.1 | ULK4 | | CHARGE, GBPG, ICBP | | Serine-threonine kinase of unknown function |
| 3q26.2 | MECOM (MDS1) | | GBPG, ICBP | | Myelodysplasia syndrome protein 1 |
| 4q21.2 | FGF5 | | GBPG, AGEN, ICBP | | Fibroblast growth factor 5; stimulates cell growth and proliferation and is associated with angiogenesis |
| 4q31.2 | NR3C2 | Hypertension exacerbation in pregnancy; OMIM #605115 | | Renal electrolyte balance | Autosomal dominant. Missense mutation (S810L) in the mineralocorticoid receptor. Low-renin, low-aldosterone, hypokalemia |
| 4q31.2 | NR3C2 | Pseudohypoaldosteronism type I; OMIM #177735 | | Renal electrolyte balance | Autosomal dominant. Renal unresponsiveness to mineralocorticoids |
| 5p15.3 | SDHA | Paragangliomas 5; OMIM #614165 | | Sympathetic system | Tumors or extra-adrenal paraganglia-associated pheochromocytoma |
| 5p13.3 | NPR3 | | AGEN, ICBP, Gene-centric | Renal electrolyte balance | Natriuretic peptide clearance receptor |
| 5q31.2 | KLHL3 | Pseudohypoaldosteronism type IID; OMIM #614495 | | Renal electrolyte balance | Modulation of renal salt, K+ and H+ handling in response to physiological challenge |
| 6p22.2 | HFE | Hemochromatosis; OMIM #235200 | ICBP, Gene-centric | | Autosomal recessive. Iron metabolism |
| 7p22 | | Familial hyperaldosteronism type 2; OMIM #605635 | | Steroid/aldosterone synthesis | Autosomal dominant. Hyperaldosteronism due to adrenocortical hyperplasia not suppressed by dexamethasone |
| 7q36.1 | NOS3 | Pregnancy-induced hypertension; OMIM +163729 | HYPERGENES, Gene-centric | Endothelial function | Nitric oxide plays an important role in the maintenance of cardiovascular and renal homeostasis |
| 8q24.3 | CYP11B1, CYP11B2 | Familial hyperaldosteronism type 1; glucocorticoid-remediable aldosteronism (GRA); OMIM #103900 | | Steroid/aldosterone synthesis | Autosomal dominant. Chimeric gene. Plasma and urinary aldosterone responsive to ACTH; dexamethasone suppressible within 48 h |
| 8q24.3 | CYP11B2 | Corticosterone methyloxidase II deficiency; OMIM #61060 | | Steroid/aldosterone synthesis | Autosomal recessive. Enzymatic defect results in decreased aldosterone and salt-wasting |
| 8q24.3 | CYP11B1 | Steroid 11β-hydroxylase deficiency; OMIM #202010 | | Steroid/aldosterone synthesis | Enzyme dysfunction leads to increased levels of MR-activating hormones |
| 10p12.3 | CACNB2 | | CHARGE, ICBP | ?Vascular/cardiac function | Subunit of voltage-gated calcium channel expressed in heart |
| 10q11.2 | RET | Multiple endocrine neoplasia type IIA; OMIM #171400 | | Sympathetic system | Autosomal dominant. Associated with multiple endocrine neoplasms, including medullary thyroid carcinoma, pheochromocytoma, and parathyroid adenomas |
| 10q24.3 | CYP17A1 | 17α-hydroxylase and/or 17,20-lyase deficiency; OMIM *609300 | CHARGE, GBPG, AGEN-BP, ICBP | Steroid/aldosterone synthesis | Cytochrome P450 enzyme mediating the first step in mineralocorticoid and glucocorticoid synthesis. Enzyme dysfunction leads to increased levels of MR-activating hormones. Also involved in sex steroid synthesis |
| 11p15.1 | PLEKHA7 | | CHARGE, ICBP | | Pleckstrin-homology domain-containing family member expressed in zona adherens of epithelial cells |
| 11p15.2 | SOX6 | | Gene-centric | Transcription | Required for normal development of the central nervous system, chondrogenesis, and maintenance of cardiac and skeletal muscle cells |
| 11p15.5 | LSP1/TNNT3 | | Gene-centric | ?Endothelial function | Expressed in leukocytes and endothelial cells. Involved in signaling, regulating the cytoskeletal architecture and neutrophil migration |
| 11q12.2 | SDHAF2 | Paragangliomas 2; OMIM #601650 | | Sympathetic system | Tumors or extra-adrenal paraganglia-associated pheochromocytoma |
| 11q23.1 | SDHD | Paragangliomas 1; OMIM #16800 | | Sympathetic system | Tumors or extra-adrenal paraganglia-associated pheochromocytoma |
| 11q24.3 | KCNJ1 | Bartter syndrome, antenatal, type 2; OMIM #241200 | | Renal electrolyte balance | Reduced potassium recycling leads to impaired sodium reabsorption |
| 12p12.2 | | Hypertension with brachydactyly; Bilginturan syndrome; OMIM %112410 | | | Inversion, deletion, and reinsertion at 12p12.2 to p11.2. No specific biochemical findings |
| 12p12.3 | WNK1 | Pseudohypoaldosteronism type IIC; Gordon’s syndrome; OMIM #614492 | | Renal electrolyte balance | Autosomal dominant. Gain-of-function mutations in WNK1. Low plasma renin, normal or elevated K+ |
| 12q21.3 | ATP2B1 | | CHARGE, GBPG, AGEN, ICBP, Gene-centric | ?Vascular function | Encodes plasma membrane calcium- or calmodulin-dependent ATPase expressed in endothelium |
| 12q24.1 | SH2B3 | | CHARGE, GBPG, ICBP | ?Endothelial function | Also known as lymphocyte-specific adaptor protein (LNK), may regulate hematopoietic progenitors and inflammatory signaling pathways in endothelium |
| 12q24.2 | TBX5–TBX3 | | CHARGE, ICBP | | T-box genes involved in regulation of developmental processes |
| 15q21.1 | SLC12A1 | Bartter syndrome, antenatal, type 1; OMIM #601678 | | Renal electrolyte balance | Homozygous or compound heterozygous mutation in the sodium-potassium-chloride cotransporter-2 gene |
| 15q24.1 | CSK | | CHARGE, GBPG, AGEN-BP, ICBP | Vascular function | Cytoplasmic tyrosine kinase involved in angiotensin II-dependent vascular smooth muscle cell contraction |
| 16p12.2 | SCNN1B, SCNN1G | Liddle syndrome; OMIM #177200 | | Renal electrolyte balance | Autosomal dominant. Constitutive activation of epithelial sodium transporter, ENaC. Low plasma renin, low or normal K+; negligible urinary aldosterone |
| 16p12.3 | UMOD | | BP-Extremes | ?Renal electrolyte balance, ?Renal function | Uromodulin; Tamm–Horsfall protein. Specifically expressed in the thick ascending limb of the loop of Henle where 25% of sodium reabsorption in the kidney occurs |
| 16q13 | SLC12A3 | Gitelman syndrome; OMIM #263800 | | Renal electrolyte balance | Low BP. Loss-of-function mutation leads to lower sodium reabsorption |
| 16q22.1 | HSD11B2 | Apparent mineralocorticoid excess; OMIM #218030 | | Steroid/aldosterone synthesis | Autosomal recessive. Increased plasma ACTH and secretory rates of all corticosteroids |
| 17q21.3 | ZNF652 | | GBPG, AGEN-BP, ICBP | | Zinc-finger protein 652 |
| 17q21.3 | WNK4 | Pseudohypoaldosteronism type IIB; Gordon’s syndrome; OMIM #614491 | | Renal electrolyte balance | Autosomal dominant. Loss-of-function mutations in WNK4. Low plasma renin, normal or elevated K+ |
| 20q13 | GNAS–EDN3 | | GBPG, ICBP | Vascular function | GNAS encodes the α subunit of the G protein mediating β-receptor signal transduction. EDN3 encodes endothelin 3, the precursor for the ligand of the endothelin B receptor |

ᵃ The key genes at each locus are shown with their known or potential role in BP regulation. The grey shaded rows indicate genes implicated in monogenic syndromes of high/low BP. The Pathway column is color-coded according to the pathway involved.

ᵇ GWAS studies: AGEN [7], BP-Extremes [10], CHARGE [8], GBPG [9], Gene-centric [6], HYPERGENES [11], and ICBP [5].


pressure [6,25,26]; and proxies for rs1004467 show genome-wide significant associations with coronary artery disease, schizophrenia, intracranial aneurysm and parkinsonism [6,8–10,24,27–30]. This illustrates the challenges ahead when attempting to design studies to functionally dissect these signals. Figure 1a also shows genes that are associated with monogenic syndromes from Online Mendelian Inheritance in Man (OMIM) that occur in the GWAS-related DNA segments shown. The only genes known to be associated with monogenic forms of high blood pressure that have also been identified by GWAS are cytochrome P450, family 17, subfamily A, polypeptide 1 (CYP17A1) and nitric oxide synthase 3 (NOS3). Even once a SNP has been identified that is associated with HTN, it is difficult to identify the gene involved. For example, Figure 1b shows the genes within 50 kb on either side of a blood pressure GWAS SNP, and among these only a few genes (NPPA, NOS3 and UMOD) have been clearly linked to the GWAS SNP [11,12,31]. Furthermore, 50 kb is by no means the limit of the zone of influence of a SNP because the risk-genes implicated by the GWAS SNPs may lie within the region of linkage disequilibrium around the SNP, or even more distantly because SNPs can influence the regulation of remote genes. GWAS loci are often rich in copy-number variants, insertion/deletion variants (Figure 1b), microRNA targets and transcription factor binding sites (Figure 1c) that may influence the genotype–phenotype association and offer another avenue for molecular and functional experiments to elucidate the causal pathways or more simply to identify which risk-gene the GWAS SNP implicates.
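The idea that a typed SNP acts as a proxy for nearby variants through linkage disequilibrium can be illustrated with a minimal sketch. It computes the standard r² statistic between two biallelic SNPs from phased haplotypes and lists the SNPs that exceed a proxy threshold; the 0.8 cut-off matches the proxy definition used for Figure 1. The haplotypes, SNP names and function names are hypothetical toy values, not data from any of the cited studies.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic SNPs.

    hap_a and hap_b hold one 0/1 allele code per phased haplotype,
    in the same haplotype order.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and equal length")
    p_a = sum(hap_a) / n                                  # allele frequency at SNP A
    p_b = sum(hap_b) / n                                  # allele frequency at SNP B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n   # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")                               # a monomorphic SNP carries no LD information
    return d * d / denom


def find_proxies(index_hap, candidates, threshold=0.8):
    """Names of candidate SNPs whose r^2 with the index SNP exceeds the threshold."""
    return [name for name, hap in candidates.items()
            if ld_r2(index_hap, hap) > threshold]


if __name__ == "__main__":
    # Hypothetical toy haplotypes (8 chromosomes).
    index_snp = [1, 1, 1, 0, 0, 0, 0, 0]
    nearby = {
        "snp_x": [1, 1, 1, 0, 0, 0, 0, 0],   # perfectly correlated with the index SNP
        "snp_y": [1, 1, 0, 0, 0, 0, 0, 0],   # partially correlated
        "snp_z": [0, 1, 0, 1, 0, 1, 0, 1],   # nearly independent
    }
    for name, hap in nearby.items():
        print(name, round(ld_r2(index_snp, hap), 3))
    print(find_proxies(index_snp, nearby))
```

Any of the high-r² SNPs could equally be the functional variant behind an association signal, which is why the GWAS SNP alone does not identify the causal gene.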

Finally, the collective effect of all BP loci identified through GWAS explains only a small fraction (~2%) of BP heritability. Thus, like other common traits, BP shares the missing heritability conundrum [32], and efforts are now directed toward identifying additional common variants of small effect and rare variants of greater effect. Although GWAS use SNPs selected to provide genome-wide coverage, they provide limited coverage of genes with plausible biological relevance (‘candidate genes’), particularly in relation to lower-frequency genetic variants (such as those with minor allele frequencies of 1–5%). Large-scale gene-centric analysis of BP using a customized gene array enriched with common, low-frequency variants in ~2100 candidate cardiovascular genes reflecting a wide variety of biological pathways in over 80 000 individuals identified NPR3, HFE, NOS3, SOX6, LSP1/TNNT3, MTHFR, AGT and ATP2B1, with some overlap with large GWAS meta-analyses [7]. Among the single candidate genes studied, NPPA-NPPB [31] and SCNN1G [33] showed evidence of association with replication, but only the former showed strong concordant signals in GWAS.

Figure 1. Phenotypic, genetic and regulatory context of GWAS signals for blood pressure and hypertension. (a) Phenotypic landscape of GWAS signals in BP/HTN GWAS. The strongest SNPs for BP and HTN also show very little overlap with genes involved in monogenic BP syndromes. Only CYP17A1 and NOS3 are associated with monogenic BP syndromes and occur within 50 kb of BP GWAS SNPs. The strongest BP GWAS SNPs and their proxies are not associated exclusively with BP phenotypes but show pleiotropy with non-BP traits that can either point to plausible underlying pathways (for example UMOD and its association with HTN and kidney function) or novel common pathways or may be independent associations. The rings from outer to inner represent: (1) chromosomal segments with GWAS SNPs (including 50 kb flanking region); (2) GWAS SNPs; (3) black markers on chromosomal segments – SNP proxy locations for the index SNP in the region (r² > 0.8); (4) genes implicated in monogenic syndromes from OMIM present in the chromosomal regions; (5) non-BP phenotypes that showed genome-wide significance within these loci. (b) Genetic landscape of GWAS signals in BP/HTN GWAS. Only a few genes (NPPA, NOS3, UMOD) have been clearly linked to the strongest GWAS SNP, whereas many of the SNPs lie in gene-rich regions, highlighting the challenges ahead in fine-mapping and identifying the causative gene/variant. It is very likely that GWAS SNPs may influence the regulation of distant genes outside the 50 kb regions shown in this figure. Furthermore, the GWAS loci are also rich in copy-number variants and insertion/deletion variants that will need to be considered in the functional dissection of GWAS signals. The rings from outer to inner represent 1–5 as in (a); (6) shows structural variations present within the chromosomal regions. (c) Regulatory landscape of GWAS signals in BP/HTN GWAS. Bioinformatic analysis of GWAS BP SNP loci shows microRNA targets, conserved transcription factor binding sites, and epigenetic loci, that may influence the genotype–phenotype association and offer another avenue for the design of molecular and functional experiments to elucidate the causal pathways. The SNP positions are indicated by red bars on the chromosome and are the same SNPs as shown in (a) and (b). The rings from outer to inner represent: (1) microRNA targets and associated genes; (2) chromosomal segments with GWAS SNPs (including 50 kb flanking region); (3) transcription factor binding sites conserved in the human/mouse/rat alignment in the chromosomal regions using TFBS Conserved (tfbsConsSites) in UCSC Browser showing those transcription factors with score >800; (4) DNase hypersensitive areas assayed in a large collection of cell types; (5) predicted CpG islands. The height of the line indicates the length of the segment; (a–c) were generated using Circos [68] with the Feb 2009 (GRCh37/hg19) assembly data from UCSC Genome Browser (http://genome.ucsc.edu/) [69].


One striking result of the BP GWAS is that the genes from highly plausible pathways are not represented near the identified SNPs (Figure 1). The GRAIL text-mining algorithm (Gene Relationships Across Implicated Loci [34]) was used to search for connectivity between genes near the associated SNPs, based on existing literature (published before 2006 – before the explosion of GWAS publications). Figure 2 shows that of the 41 BP GWAS loci, 14 showed underlying genes with significant relatedness, as defined by the degree of similarity in the text describing them within article abstracts, implying these connected genes are involved in a common cellular process or pathway. These regions of GRAIL connectivity show the expected connection between NPPA/B and NPR3 but, in cases when the GWAS SNPs lie in gene-rich regions, also reveal connections that point to specific novel genes for follow-up studies. This is highlighted by rs805303, which lies in a very gene-rich locus, where a connection between NOTCH4 and Jagged 1 (JAG1, in a different locus tagged by rs1327235) points to the NOTCH signaling pathway, which has been shown to be important in the developing cardiovascular system and in congenital human cardiovascular diseases. Their connection with BP regulation is not intuitive, but this association should prioritize this pathway and these genes for functional dissection.

Figure 2. Representation of the connections between 41 BP GWAS SNPs and their corresponding genes using the GRAIL literature-based text-mining algorithm (Gene Relationships Across Implicated Loci [34]). This searches for connectivity between genes near the associated SNPs, based on existing literature (we selected published literature before 2006 – before the surge of GWAS publications). The thickness of the red lines indicates the strength of the literature-based connectivity between the genes. This type of analysis supports known interactions but also suggests new connections that are worth following up in future studies.

Despite the increasing pace of discovery of variants associated with BP and HTN, the limited predictive utility of these variants either singly or as part of a composite risk score is striking. For alleles with broadly similar frequencies, the number of BP-increasing alleles carried is approximately normally distributed in the population because each SNP is inherited independently; hence the number of individuals expected to carry all harmful risk alleles would be vanishingly small. As an example, using the BP extreme case–control cohort [11], the probability density functions of the number of BP-increasing alleles (from 35 genome-wide significant GWAS SNPs) in hypercontrols and the extreme hypertensive cases are shown in Figure 3, illustrating the significant overlap of cases and controls by genetic risk score despite the extreme phenotypic ascertainment. Using genetic risk scores constructed from up to 13 GWAS BP SNPs, a novel longitudinal study showed that individuals with the highest combination risk score had significantly higher diastolic BP at the age of nine years, and the effect was persistent from childhood through adult age [35]. Genetic risk scores that included many non-genome-wide significant SNPs explained more of the variance than scores based only on very significant SNPs in adults and children (1.2–1.7% in adults and 0.8–1.4% in children) [36].

[Figure 3 plot: frequency (0.00–0.12) against number of BP-increasing alleles (30–50) for two groups: the hypertensive population (age <63 years, BP >160/100) and ‘hypercontrols’ (age >50 years, BP <120/80, no prevalent CVD or incident CVD during 10 year follow-up).]

Figure 3. For the prediction of complex diseases, genotypes at multiple SNPs are often combined into scores (for example, scores are calculated according to the number of risk alleles carried). The frequency distribution of the number of BP-increasing alleles carried in the general population would be normally distributed because each allele is inherited independently. The frequency distributions of 35 BP-increasing alleles from GWAS SNPs in populations selected from the extremes of BP distribution (top 9% and the bottom 2%) [11] show a large overlap of scores, and the majority of the individuals from both phenotypic extremes lie in the middle of the distribution. This illustrates the fallacy of using risk scores from GWAS SNPs to identify individuals at high risk for hypertension. Abbreviation: CVD, cardiovascular diseases.
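The argument that an unweighted risk-allele count is bell-shaped, and that carrying every risk allele is vanishingly rare, can be checked with a short exact calculation. The sketch below builds the distribution of the total count for independent biallelic SNPs, assuming Hardy–Weinberg proportions (two independent allele draws per SNP). The 35 SNPs match the number used for Figure 3, but the uniform risk-allele frequency of 0.5 is a purely hypothetical choice, not an estimate from the cohort in [11].

```python
def allele_count_distribution(risk_allele_freqs):
    """Exact distribution of the total number of risk alleles carried.

    Assumes independent SNPs and two independent allele draws per SNP
    (Hardy-Weinberg proportions). Returns a list where entry k is the
    probability of carrying exactly k risk alleles.
    """
    dist = [1.0]                        # zero alleles considered so far
    for p in risk_allele_freqs:
        for _ in range(2):              # two chromosomes per individual
            new = [0.0] * (len(dist) + 1)
            for k, prob in enumerate(dist):
                new[k] += prob * (1.0 - p)   # non-risk allele on this chromosome
                new[k + 1] += prob * p       # risk allele on this chromosome
            dist = new
    return dist


if __name__ == "__main__":
    freqs = [0.5] * 35                  # hypothetical frequencies for 35 SNPs
    dist = allele_count_distribution(freqs)
    mean = sum(k * p for k, p in enumerate(dist))
    central = sum(dist[28:43])          # counts within 7 alleles of the mean
    print("mean count:", mean)
    print("P(28 <= count <= 42):", central)
    print("P(all 70 risk alleles):", dist[-1])
```

Because most of the probability mass sits near the mean for any population built this way, groups that differ only modestly in allele frequency will overlap heavily in score, which is the pattern described for Figure 3.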

Novel pathways uncovered by GWAS
Highly correlated SNPs (r² > 0.9) in the 5′ end of UMOD have been independently identified in large GWAS of blood pressure extremes and kidney function [11,16]. The UMOD gene [expressed primarily in the thick ascending limb (TAL) of the loop of Henle] encodes the Tamm–Horsfall protein [THP/uromodulin (UMOD)], an extracellular protein anchored by a glycosyl phosphatidylinositol (GPI) functional group at the luminal face of tubular epithelia and released into the urine by proteolytic cleavage. It is the most abundant tubule protein in the urine. In the HTN study, the minor G allele of rs13333226 at the 5′ end of the UMOD gene is associated with a lower risk of HTN [OR (95% CI): 0.87 (0.84; 0.91); P = 3.6 × 10⁻¹¹], 0.49 mmHg lower SBP (P = 2.6 × 10⁻⁵) and 0.30 mmHg lower DBP (P = 1.5 × 10⁻⁵), increased estimated glomerular filtration rate (eGFR) (3.6 ml/min/minor-allele, P = 0.012), reduced urinary UMOD excretion and lower fractional excretion of endogenous lithium. In addition, the genotype association between rs13333226 and urinary UMOD excretion was more pronounced with low salt intake and blunted with high salt intake, indicating a possible gene–environment interaction [11]. Adjustment for eGFR in the HTN GWAS did not alter the association between rs13333226 and HTN. Mutations in UMOD cause medullary cystic kidney disease type 2 (MCKD2), familial juvenile hyperuricemic nephropathy (FJHN) and glomerulocystic kidney disease (GCKD), but these only lead to HTN during the later stages of renal failure. A single mechanism that could explain all the observations involving the minor G allele of rs13333226 is a decreased sensitivity of the macula densa to luminal Cl⁻. The decreased sensitivity of the macula densa may be mediated either through the reduced UMOD excretion associated with the G allele or through other mechanisms. Under this model, decreased macula densa sensitivity activates tubuloglomerular feedback and increases GFR, explaining the increased proximal tubular Na⁺ reabsorption. The lifetime effect of the elevated GFR would explain the reduced BP and potentially the age-related effect of the variant. There may be other possible mechanisms; for example, ROMK function is activated by UMOD, and thus reduced ROMK activity might explain renal salt wasting in THP knockout mice and patients with Bartter syndrome [37].
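The odds ratio, confidence interval and P value quoted for rs13333226 are linked by a standard normal approximation on the log-odds scale, which the following sketch works through. It assumes a symmetric Wald interval on log(OR), an assumption about how the interval was constructed rather than something stated in the review. Because the published interval is rounded to two decimals, the recovered P value can only approximate the reported one.

```python
import math


def wald_stats_from_odds_ratio(or_point, ci_low, ci_high, z_crit=1.959964):
    """Recover log(OR), its standard error, the z statistic and a two-sided P value.

    Assumes the interval is a symmetric 95% Wald interval on the log-odds scale
    (z_crit is the corresponding normal quantile).
    """
    log_or = math.log(or_point)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_or / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal tail area
    return log_or, se, z, p_two_sided


if __name__ == "__main__":
    # Values quoted in the text for the minor G allele of rs13333226.
    log_or, se, z, p = wald_stats_from_odds_ratio(0.87, 0.84, 0.91)
    print("log(OR) =", round(log_or, 4))
    print("SE      =", round(se, 4))
    print("z       =", round(z, 2))
    print("P       =", p)
    print("below genome-wide threshold 5e-8:", p < 5e-8)
```

The same back-calculation is a quick sanity check when a paper reports an effect size and interval but the P value looks inconsistent with them.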

Rare variants
Genes involved in monogenic hypertension are summarized in Table 1. Pseudohypoaldosteronism type II (Gordon’s syndrome; familial hyperkalemia; OMIM #145260), an autosomal dominant form of HTN associated with hyperkalemia, non-anion gap metabolic acidosis and increased salt reabsorption by the kidney, is caused by either gain-of-function mutations in WNK1 or loss-of-function mutations in WNK4. Recently, exome sequencing has been used to identify mutations in kelch-like 3 (KLHL3) or cullin 3 (CUL3) in pseudohypoaldosteronism II (PHAII) patients from 41 unrelated families [38].

Conversely, mutations that reduce salt retention, such as those associated with Bartter (SLC12A1, KCNJ1, CLCNKB, BSND, CaSR, ClCK-A) and Gitelman (SLC12A3) syndromes, tend to lower BP and protect against the development of HTN [39,40]. Resequencing three candidate genes (SLC12A3, SLC12A1, KCNJ1) involved in Bartter or Gitelman syndromes in the Framingham Heart Study population identified 30 distinct potentially deleterious rare mutations present in 49 subjects. In the heterozygous state, these variants were associated with 5.7 mmHg lower BP at age 40 and 9.0 mmHg lower BP at age 60, and in aggregate reduced the risk of HTN by 60% at age 60 [39]. This is the first indication that rare variants can produce clinically significant BP reduction in the general population and supports the rare-variant–common-disease hypothesis [41]. Exome sequencing projects involving individuals at BP extremes are currently under way to replicate this finding and discover more rare variants with a large effect on blood pressure. The hypothesis for these studies is that there will be an abundance of low-frequency variants of large effect that will explain most of the missing heritability of BP. Although the clinical applications of these findings will be limited given the very low frequency of these variants in the population, these studies should uncover novel pathways and provide a deeper understanding of the genetic architecture of blood pressure.

Epigenetics
Not all features of gene regulation are encoded in genes or contained in the DNA sequence. MicroRNAs (miRs), histone modifications and DNA methylation have all been investigated with regard to their role in BP gene regulation. The potential role of miRs in vascular smooth-muscle biology and blood pressure is just beginning to be appreciated. Mice lacking miR-143 and miR-145 develop significant reductions in BP resulting from modulation of actin dynamics [42]. Intrarenal expression of miR-200a, miR-200b, miR-141, miR-429, miR-205 and miR-192 was found to be increased in hypertensive nephrosclerosis, and the degree of upregulation correlated with disease severity. There are significant correlations between miR species and proteinuria and GFR, suggesting a dose–response type of relationship between intrarenal miR expression and the severity of hypertensive nephrosclerosis [43]. Renin gene expression appears to be regulated by miR-181a and miR-663 [44]. The identification of these miRs may lead to the elucidation of pathways involved in HTN causation and novel therapeutics. Recently, an observational study showed that human cytomegalovirus (HCMV) seropositivity and titers are positively associated with essential hypertension independently of other HTN risk factors [45]. The HCMV-encoded miR hcmv-miR-UL112 was highly expressed in hypertensive patients, pointing to a potentially novel pathway involved in HTN. There is support from an animal study showing that infection of mice with mouse cytomegalovirus can alone elevate blood pressure [46]. Although this is an observational finding, it highlights the prospect of an abundance of pathways and risk factors that lead to the final common BP phenotype and may have implications for the discovery of new treatments.

Recently, renal sympathetic denervation has shown considerable promise in treating refractory HTN [47]. The sympathetic innervation of the kidney is implicated in the pathogenesis of HTN by increasing plasma renin activity that leads to sodium and water retention and reduces renal blood flow (RBF). The procedure involves radiofrequency ablation of the renal sympathetic nerves and has produced remarkable reductions in BP, but the underlying mechanism is unclear. Recently, histone modification has been shown to play an important role in the epigenetic modulation of WNK4 transcription in the development of salt-sensitive HTN. Isoproterenol-induced transcriptional suppression of WNK4 was shown to be mediated via inhibition of histone deacetylase-8 (HDAC8) activity at the WNK4 promoter [48], which in turn can stimulate the thiazide-sensitive Na⁺-Cl⁻ cotransporter (NCC/SLC12A3), implying that sympathetic nerve activity can increase BP partly by activating NCC. The evidence that isoproterenol induces transcriptional suppression of WNK4 and leads to activation of NCC offers an opportunity to combine genomics, epigenomics and NCC detection in urinary exosomes to test the salt–sympathetic-system–BP axis [48,49].

There are indications of epigenetic regulation involving interactions between a disruptor of telomeric-silencing alternative splice variant a (Dot1a) and the ALL-1 fused gene from chromosome 9 (Af9) to produce a nuclear repressor complex that targets histone H3 Lys-79 methylation in the promoter region of the sodium channel SCNN1A and suppresses its transcriptional activity [50]. Aldosterone can disrupt this nuclear complex, which results in histone H3 Lys-79 hypomethylation at specific subregions and derepression of the SCNN1A promoter. This adds a novel epigenetic dimension to the complex transcriptional and post-transcriptional regulation of the epithelial sodium channel by aldosterone.

Concluding remarks
Unraveling the genetic basis of BP regulation and HTN has been more difficult than might be suggested by their high heritability, but the progress in cataloging common variants using GWAS is comparable to other common traits. Ongoing studies include the GWAS meta-analysis of BP extremes and exome sequencing of BP extremes to identify more sequence variants that are associated with BP. Indeed, it has been estimated that further increasing the GWAS sample size will identify 116 common variants for BP that have similar effect sizes to those found already, but these will collectively explain only about 2.2% of the phenotypic variance [6].
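To put the figures quoted above in perspective, the following minimal sketch divides the projected total (about 2.2% of phenotypic variance) by the projected number of variants (116). The function name is hypothetical and the calculation assumes only that the variants contribute roughly equally, which the phrase "similar effect sizes" suggests but does not guarantee.

```python
def mean_variance_per_variant(total_variance_pct, n_variants):
    """Average share of phenotypic variance (in percent) per variant,
    under the simplifying assumption of roughly equal contributions."""
    if n_variants <= 0:
        raise ValueError("n_variants must be positive")
    return total_variance_pct / n_variants


# Figures quoted in the text: 116 variants, about 2.2% of variance in total.
average_pct = mean_variance_per_variant(2.2, 116)
print(f"average variance explained per variant: {average_pct:.3f}%")
```

On these numbers each variant would account for roughly 0.02% of the variance on average, which illustrates why larger sample sizes alone add little predictive power.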

Some important issues to be addressed in future studies investigating BP as a quantitative trait are to model BP more accurately in subjects on antihypertensive treatments by taking into account the number of drugs, drug dosage and compliance metrics, or to make use of longitudinal BP data, for example long-term average and BP variability (either visit-to-visit or 24 h intra-individual variability). Novel strategies and orthogonal study designs are needed to discover causal and clinically useful genetic markers efficiently. This would require a move from pure BP quantitative traits in larger and larger cohorts to detailed studies of subjects selected on informative intermediate traits derived from the extensive interventional studies for high BP. SNPs near the genes encoding uromodulin and natriuretic peptides show allelic association with urinary uromodulin and plasma natriuretic peptides, respectively [11,31], and offer the opportunity for salt-intervention trials to dissect the underlying mechanisms further. Randomized clinical trials with stored DNA samples offer a readily available resource to study not only drug response but also to dissect pathways of HTN based on interindividual differences in response to drugs that target specific pathways.

The limited predictive utility of common variants that have emerged from most GWAS would suggest that to build better predictive models it will be necessary to identify orthogonal (i.e., uncorrelated) genetic variants that are associated with new pathways, as suggested for biomarkers [51]. The current despondency over poor prediction is probably related to the early discovery of low-hanging fruit that are perhaps more correlated with known pathways. The next level of discovery will be more challenging because the molecular and functional dissection of the novel variants will require more detailed low-throughput science in contrast to the high-throughput screening methods applied so far.

References
1 Evans, J.G. and Rose, G. (1971) Hypertension. Br. Med. Bull. 27, 37–42
2 Kearney, P.M. et al. (2005) Global burden of hypertension: analysis of worldwide data. Lancet 365, 217–223
3 Hottenga, J.J. et al. (2005) Heritability and stability of resting blood pressure. Twin Res. Hum. Genet. 8, 499–508
4 Kupper, N. et al. (2005) Heritability of daytime ambulatory blood pressure in an extended twin design. Hypertension 45, 80–85
5 Padmanabhan, S. et al. (2008) Hypertension and genome-wide association studies: combining high fidelity phenotyping and hypercontrols. J. Hypertens. 26, 1275–1281
6 Ehret, G.B. et al. (2011) Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature 478, 103–109
7 Johnson, T. et al. (2011) Blood pressure loci identified with a gene-centric array. Am. J. Hum. Genet. 89, 688–700
8 Kato, N. et al. (2011) Meta-analysis of genome-wide association studies identifies common variants associated with blood pressure variation in East Asians. Nat. Genet. 43, 531–538
9 Levy, D. et al. (2009) Genome-wide association study of blood pressure and hypertension. Nat. Genet. 41, 677–687
10 Newton-Cheh, C. et al. (2009) Genome-wide association study identifies eight loci associated with blood pressure. Nat. Genet. 41, 666–676
11 Padmanabhan, S. et al. (2010) Genome-wide association study of blood pressure extremes identifies variant near UMOD associated with hypertension. PLoS Genet. 6, e1001177
12 Salvi, E. et al. (2012) Genomewide association study using a high-density single nucleotide polymorphism array and case–control design identifies a novel essential hypertension susceptibility locus in the promoter region of endothelial NO synthase. Hypertension 59, 248–255
13 Dominiczak, A.F. and Munroe, P.B. (2010) Genome-wide association studies will unlock the genetic basis of hypertension: pro side of the argument. Hypertension 56, 1017–1020
14 Kurtz, T.W. (2010) Genome-wide association studies will unlock the genetic basis of hypertension: con side of the argument. Hypertension 56, 1021–1025
15 Gudbjartsson, D.F. et al. (2010) Association of variants at UMOD with chronic kidney disease and kidney stones-role of age and comorbid diseases. PLoS Genet. 6, e1001039
16 Kottgen, A. et al. (2009) Multiple loci associated with indices of renal function and chronic kidney disease. Nat. Genet. 41, 712–717
17 Gudbjartsson, D.F. et al. (2009) Sequence variants affecting eosinophil numbers associate with asthma and myocardial infarction. Nat. Genet. 41, 342–347
18 Schunkert, H. et al. (2011) Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat. Genet. 43, 333–338
19 Stahl, E.A. et al. (2010) Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci. Nat. Genet. 42, 508–514
20 Dubois, P.C. et al. (2010) Multiple common variants for celiac disease influencing immune gene expression. Nat. Genet. 42, 295–302
21 Ganesh, S.K. et al. (2009) Multiple loci influence erythrocyte phenotypes in the CHARGE Consortium. Nat. Genet. 41, 1191–1198
22 Ikram, M.K. et al. (2010) Four novel Loci (19q13, 6q24, 12q24, and 5q14) influence the microcirculation in vivo. PLoS Genet. 6, e1001184
23 Teslovich, T.M. et al. (2010) Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466, 707–713
24 Wain, L.V. et al. (2011) Genome-wide association study identifies six new loci influencing pulse pressure and mean arterial pressure. Nat. Genet. 43, 1005–1011
25 Pichler, I. et al. (2011) Identification of a common variant in the TFR2 gene implicated in the physiological regulation of serum iron levels. Hum. Mol. Genet. 20, 1232–1240
26 Chambers, J.C. et al. (2009) Genome-wide association study identifies variants in TMPRSS6 associated with hemoglobin levels. Nat. Genet. 41, 1170–1172
27 Coronary Artery Disease (C4D) Genetics Consortium (2011) A genome-wide association study in Europeans and South Asians identifies five new loci for coronary artery disease. Nat. Genet. 43, 339–344
28 Ripke, S. et al. (2011) Genome-wide association study identifies five new schizophrenia loci. Nat. Genet. 43, 969–976
29 Simon-Sanchez, J. et al. (2009) Genome-wide association study reveals genetic risk underlying Parkinson’s disease. Nat. Genet. 41, 1308–1312
30 Yasuno, K. et al. (2010) Genome-wide association study of intracranial aneurysm identifies three new risk loci. Nat. Genet. 42, 420–425
31 Newton-Cheh, C. et al. (2009) Association of common variants in NPPA and NPPB with circulating natriuretic peptides and blood pressure. Nat. Genet. 41, 348–353
32 Manolio, T.A. et al. (2009) Finding the missing heritability of complex diseases. Nature 461, 747–753
33 Busst, C.J. et al. (2011) The epithelial sodium channel gamma-subunit gene and blood pressure: family based association, renal gene expression, and physiological analyses. Hypertension 58, 1073–1078
34 Raychaudhuri, S. et al. (2009) Identifying relationships among genomic disease regions: predicting genes at pathogenic SNP associations and rare deletions. PLoS Genet. 5, e1000534
35 Oikonen, M. et al. (2011) Genetic variants and blood pressure in a population-based cohort: the Cardiovascular Risk in Young Finns study. Hypertension 58, 1079–1085
36 Taal, H.R. et al. (2012) Genome-wide profiling of blood pressure in adults and children. Hypertension 59, 241–247
37 Renigunta, A. et al. (2011) Tamm–Horsfall glycoprotein interacts with renal outer medullary potassium channel ROMK2 and regulates its function. J. Biol. Chem. 286, 2224–2235
38 Boyden, L.M. et al. (2012) Mutations in kelch-like 3 and cullin 3 cause hypertension and electrolyte abnormalities. Nature 482, 98–102
39 Ji, W. et al. (2008) Rare independent mutations in renal salt handling genes contribute to blood pressure variation. Nat. Genet. 40, 592–599
40 Lifton, R.P. et al. (2001) Molecular mechanisms of human hypertension. Cell 104, 545–556
41 Eyre-Walker, A. (2010) Evolution in health and medicine Sackler colloquium: genetic architecture of a complex trait and its implications for fitness and genome-wide association studies. Proc. Natl. Acad. Sci. U.S.A. 107 (Suppl. 1), 1752–1756
42 Xin, M. et al. (2009) MicroRNAs miR-143 and miR-145 modulate cytoskeletal dynamics and responsiveness of smooth muscle cells to injury. Genes Dev. 23, 2166–2178
43 Wang, G. et al. (2010) Intrarenal expression of miRNAs in patients with hypertensive nephrosclerosis. Am. J. Hypertens. 23, 78–84
44 Marques, F.Z. et al. (2011) Gene expression profiling reveals renin mRNA overexpression in human hypertensive kidneys and a role for microRNAs. Hypertension 58, 1093–1098
45 Li, S. et al. (2011) Signature microRNA expression profile of essential hypertension and its novel link to human cytomegalovirus infection. Circulation 124, 175–184
46 Cheng, J. et al. (2009) Cytomegalovirus infection causes an increase of arterial blood pressure. PLoS Pathog. 5, e1000427
47 Esler, M.D. et al. (2010) Renal sympathetic denervation in patients with treatment-resistant hypertension (The Symplicity HTN-2 Trial): a randomised controlled trial. Lancet 376, 1903–1909
48 Mu, S. et al. (2011) Epigenetic modulation of the renal beta-adrenergic–WNK4 pathway in salt-sensitive hypertension. Nat. Med. 17, 573–580
49 Ellison, D.H. and Brooks, V.L. (2011) Renal nerves, WNK4, glucocorticoids, and salt transport. Cell Metab. 13, 619–620
50 Zhang, D. et al. (2009) Epigenetics and the control of epithelial sodium channel expression in collecting duct. Kidney Int. 75, 260–267
51 Gerszten, R.E. and Wang, T.J. (2008) The search for new cardiovascular biomarkers. Nature 451, 949–952
52 Leitschuh, M. et al. (1991) High-normal blood pressure progression to hypertension in the Framingham Heart Study. Hypertension 17, 22–27
53 Franklin, S.S. et al. (1997) Hemodynamic patterns of age-related changes in blood pressure. The Framingham Heart Study. Circulation 96, 308–315
54 Burt, V.L. et al. (1995) Prevalence of hypertension in the US adult population. Results from the Third National Health and Nutrition Examination Survey, 1988–1991. Hypertension 25, 305–313
55 Kaminer, B. and Lutz, W.P. (1960) Blood pressure in Bushmen of the Kalahari Desert. Circulation 22, 289–295
56 Truswell, A.S. et al. (1972) Blood pressures of Kung bushmen in Northern Botswana. Am. Heart J. 84, 5–12
57 Poulter, N.R. et al. (1990) The Kenyan Luo migration study: observations on the initiation of a rise in blood pressure. BMJ 300, 967–972
58 Crews, D.E. and Mancilha-Carvalho, J.J. (1993) Correlates of blood pressure in Yanomami Indians of northwestern Brazil. Ethn. Dis. 3, 362–371
59 Carvalho, J.J. et al. (1989) Blood pressure in four remote populations in the INTERSALT Study. Hypertension 14, 238–246
60 Laville, M. et al. (1994) Epidemiological profile of hypertensive disease and renal risk factors in black Africa. J. Hypertens. 12, 839–843
61 Neel, J.V. (1962) Diabetes mellitus: a ‘thrifty’ genotype rendered detrimental by ‘progress’? Am. J. Hum. Genet. 14, 353–362
62 Nakajima, T. et al. (2004) Natural selection and population history in the human angiotensinogen gene (AGT): 736 complete AGT sequences in chromosomes from around the world. Am. J. Hum. Genet. 74, 898–916
63 Weder, A.B. (2007) Evolution and hypertension. Hypertension 49, 260–265
64 Young, J.H. et al. (2005) Differential susceptibility to hypertension is due to selection during the out-of-Africa expansion. PLoS Genet. 1, e82
65 Pickering, G.W. (1955) The genetic factor in essential hypertension. Ann. Intern. Med. 43, 457–464
66 Oldham, P.D. et al. (1960) The nature of essential hypertension. Lancet 1, 1085–1093
67 Adeyemo, A. et al. (2009) A genome-wide association study of hypertension and blood pressure in African Americans. PLoS Genet. 5, e1000564
68 Krzywinski, M. et al. (2009) Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645
69 Kent, W.J. et al. (2002) The human genome browser at UCSC. Genome Res. 12, 996–1006


Cover: During conjugation, members of the ciliate genus Oxytricha inherit a genome that looks like typical eukaryotic chromatin but is replete with fragmented and scrambled genes. The subsequent developmental process produces a rearranged somatic genome containing on the order of twenty million of the shortest known telomere-bearing chromosomes. On pages 382–388, Aaron Goldman and Laura Landweber describe recent progress toward understanding Oxytricha’s genomic dimorphism and discuss its various implications for our understanding of ancient genome evolution and early life. The cover shows an SEM image of Oxytricha, false-colored with Photoshop, courtesy of Bob Hammersmith.

August 2012 Volume 28, Number 8 pp. 361–418

Letter

361 Is ‘forward’ the same as ‘plus’?…and other adventures in SNP allele nomenclature
Sarah C. Nelson, Kimberly F. Doheny, Cathy C. Laurie and Daniel B. Mirel

Reviews

364 Human limb abnormalities caused by disruption of hedgehog signaling
Eve Anderson, Silvia Peluso, Laura A. Lettice and Robert E. Hill

374 Replication timing and its emergence from stochastic processes
John Bechhoefer and Nicholas Rhind

382 Oxytricha as a modern analog of ancient genome evolution
Aaron David Goldman and Laura F. Landweber

389 Regulation of chromatin structure by long noncoding RNAs: focus on natural antisense transcripts
Marco Magistri, Mohammad Ali Faghihi, Georges St Laurent III and Claes Wahlestedt

397 Genetic basis of blood pressure and hypertension
Sandosh Padmanabhan, Christopher Newton-Cheh and Anna F. Dominiczak

409 Mechanisms of transcriptional precision in animal development
Mounia Lagha, Jacques P. Bothma and Michael Levine

Erratum

417 Corrigendum: Human evolutionary genomics: ethical and interpretive issues. [Trends in Genetics 28 (2012) 137–145]
Joseph J. Vitti, Mildred K. Cho, Sarah A. Tishkoff and Pardis C. Sabeti

Mechanisms of transcriptional precision in animal development
Mounia Lagha1, Jacques P. Bothma2 and Michael Levine1

1 Center for Integrative Genomics, Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
2 Biophysics Graduate Group, Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA

Review

We review recently identified mechanisms of transcriptional control that ensure reliable and reproducible patterns of gene expression in natural populations of developing embryos, despite inherent fluctuations in gene regulatory processes, variations in genetic backgrounds and exposure to diverse environmental conditions. These mechanisms are not responsible for switching genes on and off. Instead, they control the fine-tuning of gene expression and ensure regulatory precision. Several such mechanisms are discussed, including redundant binding sites within transcriptional enhancers, shadow enhancers, and ‘poised’ enhancers and promoters, as well as the role of ‘redundant’ gene interactions within regulatory networks. We propose that such regulatory mechanisms provide population fitness and ‘fine-tune’ the spatial and temporal control of gene expression.

Corresponding author: Levine, M. ([email protected]).
Keywords: enhancer; paused polymerase; pioneer factors; robustness; gene regulatory networks

0168-9525/$ – see front matter. Published by Elsevier Ltd. doi:10.1016/j.tig.2012.03.006 Trends in Genetics, August 2012, Vol. 28, No. 8

Glossary

Canalization: a measure of the ability of a population to produce the same phenotype regardless of fluctuations in its environment, genotype or other sources of variability. Our use of the term ‘robustness’ conveys the same essential meaning.

Enhancer: the predominant regulatory DNA for controlling gene expression. It has the defining property of driving reporter expression in transgenic assays from a heterologous promoter.

Gene regulatory network: interacting genes and their associated regulatory DNAs that are responsible for a specific developmental process such as the specification of gut or muscle.

Paused polymerase: RNA Pol II that has initiated transcription, but arrests after producing a small nascent RNA of 30–50 nt. The Pol II is ‘ready to go’ but needs additional regulators to undergo elongation.

Pioneer factor: a specialized TF (sequence-specific) that binds to nucleosomal DNA and prepares enhancers for rapid and timely deployment.

Poising: preparing genes for rapid and timely transcription. This can be achieved by priming the promoter, the enhancer, or both.

Redundancy: two genes are considered to be redundant if they play similar functions and are able to replace one another. This can be extended to a genetic interaction or enhancers or binding sites within enhancers. However, we do not believe in true redundancy. Instead, genes or regulatory DNAs might appear to possess redundant, or overlapping, activities in the laboratory, but not in natural populations subject to stress.

Shadow enhancer: an enhancer that is sometimes located far from the gene it regulates. The term ‘shadow’ is a metaphor which reflects that, historically, these distal enhancers tended to be discovered after the proximal/primary enhancer and in unexpected locations such as in the introns of neighboring genes.

Transcriptional precision
The basic mechanisms for switching genes on and off during development were intensively studied in the 1980s and 1990s. The enhancer was shown to play a key role in integrating complex regulatory information to generate cell-specific patterns of gene expression [1]. However, in natural populations enhancer–promoter interactions can be affected by changes in temperature and variations in genetic background, but the developmental program remains unperturbed. What is the basis for this stability in developmental programming?

Our central premise is that the mechanisms used to provide stability in gene expression in natural populations also produce greater precision in developmental patterning mechanisms. By transcriptional precision we refer to the formation of sharp borders of gene expression, the exact timing of gene activation, coordinate expression of groups of genes within a developing tissue, and homogenous expression of a given gene across a field of coordinately developing cells. The advent of whole-genome technologies and improved imaging methods has provided recent insights into more subtle aspects of differential gene activity, namely the reproducible deployment of developmental programs in natural populations.

Redundant genetic interactions
Genetic analysis of Drosophila embryogenesis led to a conceptual breakthrough in our understanding of animal development [2]. The subdivision of the embryo into a series of body segments was first envisaged to be a regulatory cascade or genetic pathway, with maternal determinants, such as Bicoid, that establish sequential patterns of gap gene expression, pair-rule stripes and ultimately, segment-polarity stripes of gene expression (e.g. [3]). This view of a sequential pathway gave way to one of gene networks, whereby both maternal and zygotic activators and repressors interact with complex enhancers to produce localized stripes of gene expression [4]. In recent years, gene networks have been visualized as complex wiring-diagrams [5].

Such networks often contain seemingly redundant interactions. Moreover, two related transcription factors are sometimes seen to activate the expression of downstream target genes in the same cells at the same time. Removal of one copy of the regulatory gene often fails to produce an obvious or fully penetrant phenotype. Nonetheless, the gene might augment population fitness, which is what natural selection ultimately acts on.



[Figure 1 graphic: panels (a) Wildtype and (b) End-3 mutant, each showing the nodes skn-1, med-1/2, end-3, end-1 and elt-2. The outcome is labeled ‘Intestinal differentiation’ in (a) and ‘Variable intestinal differentiation’ in (b), where two elements are marked ‘(Variable)’.]

Figure 1. Redundant interactions in gene regulatory networks. Summary of the genetic cascade governing intestinal cell specification in C. elegans (see ref. [6]). (a) Wild-type network. skn-1 is maternally deposited and, in concert with other maternal and zygotic factors, activates the expression of transcription factors end-3 and end-1, both of which activate elt-2, the key regulator of intestine differentiation. (b) In end-3 mutants, end-1 can compensate and intestine differentiation is essentially normal. However, end-1 expression becomes significantly more variable, resulting in erratic expression of elt-2 and abnormal intestine differentiation in some individuals.


Redundant interactions in gene regulatory networks have been suggested to provide stability and precision in metazoan development [5]. An illustrative example is seen for gut specification in Caenorhabditis elegans [6]. The intestine is composed of 20 cells that arise from a single progenitor in the early embryo. Intestinal identity is specified by a simple regulatory network, beginning with the maternal deposition of skn-1 transcripts and culminating in the expression of elt-2, which activates hundreds of ‘target’ genes required for gut differentiation (Figure 1a).

The activation of elt-2 depends on two related transcription factors, end-1 and end-3, which function in a largely redundant fashion. The consequences of disrupting either end-1 or end-3 gene activity have been examined, and evidence was obtained for increased ‘noise’ in gut specification from measurements of single mRNAs in individual embryos [6]. In particular, end-3 mutants show variability in both the timing and levels of elt-2 expression (Figure 1b), which might explain why 5% of end-3 mutants lack intestinal cells. Similarly, the overlapping activities of two T-box transcription factors, tbx-8 and tbx-9, appear to buffer stochastic variations in muscle differentiation [7].

These results suggest that redundant gene interactions within developmental networks can stabilize gene expression in natural populations. Such redundancy might also play a key role in ensuring transcriptional precision. That is, the combination of end-1 plus end-3 might ensure the precise timing and exact levels of elt-2 expression. There are numerous examples of potential redundancies in gene networks (e.g. [5]). Are these required for developmental patterning, or do they represent a means to stabilize complex processes despite genetic and environmental variations? These are not mutually exclusive concepts.

Intra-enhancer redundancy
A typical developmental enhancer is several hundred bp in length and contains multiple binding sites for two or more sequence-specific transcription factors (reviewed in [1]). Some of the binding sites appear to be redundant, in that mutations in a subset of the sites do not qualitatively alter the expression patterns produced by the modified enhancers (e.g. [8]). What is the purpose of these ‘extra’ sites, which are often highly conserved? Evidence is gathering that in some cases they ensure robustness or stability in response to genetic and environmental variation. A recent analysis of the eve stripe 2 enhancer provides a particularly compelling example [9] (Figure 2).

The full-length eve stripe 2 enhancer is over 700 bp in length and contains several binding sites for each of four key regulators: Bicoid, Hunchback, Krüppel, and Giant [10,11]. It produces a robust and authentic stripe 2 pattern when attached to a reporter gene and expressed in transgenic Drosophila embryos. Removal of ~200 bp from the 3′ end of the enhancer, which contains several TF binding sites, diminishes the levels of expression, but the resulting ~500 bp minimal enhancer produces an essentially normal pattern of expression [11]. BAC transgenesis and genetic complementation assays have been used to examine the contributions of the minimal enhancer and 3′ ‘extension’ [9].

Removal of the ~500 bp minimal enhancer from a ‘rescuing’ BAC transgene results in lethality due to a severely diminished stripe 2 pattern. Mutant eve−/eve− embryos carrying this BAC fail to hatch due to defects in the first thoracic segment. Interestingly, removal of the ~200 bp 3′ extension does not cause lethality under optimal culture conditions, and the viability of these flies is comparable to that of wild-type flies. These results suggest that the minimal ~500 bp eve stripe 2 enhancer is sufficient for segmentation, at least in the absence of environmental stress. However, there is a breakdown in the function of the minimal enhancer at elevated temperatures and in ‘sensitized’ genetic backgrounds. Thus, binding sites in the 3′ extension are not redundant under all conditions, but instead they appear to ensure reliable expression of eve stripe 2 under stress (Figure 2). This is likely to be a general mechanism of robustness or ‘canalization’ in development (see [9,12,13] for a definition of canalization). So-called redundant binding sites in developmental enhancers are probably used in natural populations to cope with variability.

[Figure 2 graphic: (a) eve BAC with the eve stripe 2 enhancer, comprising the minimal enhancer and the extension with redundant binding sites: normal. (b) eve BAC, no minimal stripe 2: non viable. (c) eve BAC, no extension: normal at optimal conditions; reduced viability under stress. Size labels in the graphic: 480 bp, 211 bp and 554 bp.]

Figure 2. Importance of redundant binding sites for robustness. (a) Diagram of a BAC transgene containing the entire eve locus, including 5′ and 3′ stripe enhancers. Only the stripe 2 regulatory region is shown. The ‘full-length’ enhancer contains both the minimal ~500 bp enhancer (green) and ~200 bp 3′ extension (blue). The yellow ovals represent a subset of the TF binding sites in the stripe 2 regulatory DNA. (b) Removal of the minimal eve stripe 2 enhancer results in lethality, and embryos die with defects in the thorax (derived from the region of stripe 2 expression). (c) Removal of the 3′ extension does not impair embryogenesis under optimal culturing conditions, and normal adult flies are obtained. However, under genetic stress, only 5% of the flies survive. Thus, ‘redundant’ binding sites in the 3′ extension are required for robustness.

Shadow enhancers
A related mechanism for ensuring robustness is the use of multiple enhancers for a single pattern of gene expression. A variety of recently developed whole-genome assays (Box 1) permit the systematic identification of developmental enhancers (e.g. [14–16]). Such approaches suggest that many of the crucial developmental patterning genes in Drosophila are regulated by multiple enhancers that direct extensively overlapping patterns of gene expression and employ a similar regulatory ‘logic’ (e.g. [17]). The newly identified enhancers are sometimes termed ‘shadow enhancers’ because they map to more remote locations than the ‘classical’ or primary enhancers situated close to the gene [18,19]. Several examples are discussed below.

The shavenbaby locus (also known as ovo) is important for the specification of dorsal hairs in the cuticle of embryos and larvae [20]. It is regulated by a complex array of enhancers with extensively overlapping activities. It is possible to remove some of these enhancers and still obtain essentially normal cuticle patterns at optimal temperatures. However, these patterns are disrupted when the embryos are grown at either low (15 °C) or elevated (30 °C) temperatures. Moreover, normal embryos are resilient to genetic changes, such as reductions in the levels of Wingless, but produce abnormal cuticles upon removal of shavenbaby ‘shadow’ enhancers. Thus, the shadow enhancers ensure reliable expression when embryos are subject to genetic and environmental variation.

Box 1. Whole-genome identification of enhancers

During the past 10 years a variety of ‘post-genome’ methods have been devised for the systematic identification of enhancers (which can exist either 5′ or 3′ of the gene or within the transcription unit). Transgenic assays are required to confirm their identities. Putative enhancers are attached to a minimal promoter and reporter gene, and introduced (via injection or electroporation) into a developing embryo. Either stable or transient transgenic embryos are assayed for reporter gene expression. Below we provide a brief review of some post-genome methods for identifying putative enhancers.

Computational methods: enhancers often contain a high density of transcription factor binding sites, typically one for every 30–50 bp across the length of the enhancer (200–300 bp or more). Algorithms have been developed for identifying high-density clusters of putative binding sites [58,59]. These methods work, but typically only 10–30% of ‘hits’ represent authentic enhancers when tested in transgenic embryos.

ChIP-Seq: permits the genome-wide identification of binding sites for sequence-specific transcription factors, or histone modifications (e.g. [40,41]). ChIP-Seq using antibodies against early Drosophila patterning determinants (e.g. Dorsal, Twist and Snail) led to the identification of shadow enhancers for a number of genes engaged in dorsal–ventral patterning [17]. In some systems it has been possible to identify active enhancers on a genome-wide scale for a given tissue by identifying particular histone modifications, or the enzymes responsible for these modifications (e.g. [16]).

Chromosome conformation capture (3C) assays: can identify the sequences in a genome that interact with specific promoters. The method relies on the stabilization of transient ‘loops’ of distal enhancers to target promoters using formaldehyde cross-linking, similar to the chromatin cross-linking used for ChIP-Seq assays. 4C (chromosome conformation capture-on-chip) methods were used to identify multiple and overlapping enhancers for the regulation of Hoxd genes in the mouse limb bud [25]. 3C and 4C assays provide an estimate of the overall interactions that occur in vivo but do not reveal the dynamics of these long-range interactions.

MNase-Seq and FAIRE assays: micrococcal nuclease (MNase) induces double-strand breaks within nucleosome linker regions and single-strand nicks within the nucleosome and can be used to identify ‘nucleosome-free’ regions. In some cases, these regions coincide with ‘poised’ enhancers due to the binding of pioneer transcription factors (e.g. [60]). FAIRE (formaldehyde-assisted isolation of regulatory elements) also identifies nucleosome-free regions, or ‘open’ chromatin [61].

[Figure 3 graphic: panels (a) and (b); enhancer–promoter configurations labeled ‘10% failure’ (three times) and ‘1% failure’.]

Figure 3. Model for enhancer synergy. (a) Schematic showing that the primary and shadow enhancers (green boxes) possess the same regulatory logic (TF binding sites are illustrated by colored circles). (b) To activate transcription, an enhancer loops to its cognate promoter. This interaction has a typical failure rate of 10%. In the presence of two enhancers regulating the same gene at the same time (primary and shadow), the combined failure rate is 1% (10% × 10% = 1%). This assumes that the two enhancers work independently of one another.

A similar situation is seen for the regulation of snail, which encodes a zinc finger transcription factor that establishes the boundary between the presumptive mesoderm and neurogenic ectoderm [12]. The snail gene is regulated by a proximal enhancer located near the transcription start site, as well as by a recently identified shadow enhancer located 5 kb upstream of the start site within the first intron of a neighboring gene. Quantitative imaging assays and genetic complementation experiments suggest that the two enhancers ensure reliable and uniform activation of snail expression in embryos containing only one maternal dose of Dorsal, or when subject to high temperatures (30 °C) [12,21]. Removal of either enhancer, particularly the distal shadow enhancer [21], causes defects in gastrulation under adverse conditions.

Shadow enhancers have also been implicated in vertebrate developmental processes. For example, the neurogenic regulatory gene, ATOH7 (Math5), is essential for the development of the mammalian retina [22]. A genetic disease causing blindness at birth (nonsyndromic congenital retinal nonattachment) results from the deletion of a remote ‘shadow’ enhancer located more than 20 kb away from the ATOH7 transcription unit [23]. In the developing retina of a mouse, the shadow enhancer directs a spatiotemporal pattern of gene expression very similar to that of the ‘primary’ proximal enhancer. This result suggests that the primary enhancer alone cannot sustain sufficient levels of ATOH7 expression for normal development in the absence of the shadow enhancer. Thus, the two enhancers seem to be redundant in terms of the location and timing of the expression patterns they direct, but both are required to reinforce ATOH7 expression and achieve correct levels of expression during crucial stages of eye development.

There are additional examples of multiple enhancers for key vertebrate patterning genes. For example, deletion of a limb enhancer of the paired-box homeodomain transcription factor Prx has no obvious effect on Prx expression levels or on limb development in mice [24], suggesting the existence of additional, shadow enhancers. More recently, 4C assays (Box 1) identified multiple putative enhancers for Hoxd13 expression within a distal gene ‘desert’ that contains known regulatory elements, GCR and Prox [25]. Deletions of GCR and Prox have little effect on Hoxd13 expression in digits, thereby suggesting the occurrence of redundant regulatory elements. Indeed, complete abolition of Hoxd13 expression in digits is achieved only when the gene desert, together with the GCR and Prox regions, is completely deleted (830 kb deletion).

The preceding examples suggest that multiple enhancers represent a simple means for improving the reliability of gene expression. The underlying mechanism is uncertain, but they might increase the probability of gene activation at any given time during critical windows of development and make it more robust to perturbation. For example, if a typical enhancer has a 10% failure rate to loop and engage its target promoter, and if the proximal and distal enhancers function more or less independently of one another, then there is a combined failure rate of only 1% (e.g. [12]). That is, two enhancers function in an inherently multiplicative manner to activate gene expression (Figure 3). Such a mechanism also provides robustness. For example, if the failure rate of each individual enhancer increases to 30% due to stress, then the combined failure rate is only 9%.
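The multiplicative argument can be checked with a short calculation. The Python sketch below is illustrative only: the function name `combined_failure_rate` is hypothetical, and the code assumes, as the text does, that each enhancer fails to engage the promoter independently of the others.

```python
from math import prod


def combined_failure_rate(failure_rates):
    """Probability that every enhancer fails at once.

    Assumes the enhancers fail independently, so the joint failure
    probability is the product of the individual failure rates.
    """
    rates = list(failure_rates)
    if not rates:
        raise ValueError("at least one enhancer is required")
    for p in rates:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"failure rate {p} is not a probability")
    return prod(rates)


# Primary and shadow enhancer under optimal conditions (10% failure each).
normal = combined_failure_rate([0.10, 0.10])

# The same pair under stress (30% failure each).
stressed = combined_failure_rate([0.30, 0.30])

print(f"optimal: {normal:.0%}, stressed: {stressed:.0%}")
```

If the two enhancers were not independent (for example, if both depended on the same limiting factor), the product would underestimate the true joint failure rate.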

An alternative explanation is that multiple enhancers ensure high levels of expression above a minimal threshold required for genetic function (as suggested in the case of ATOH7 regulation). In reality, multiple enhancers could be important both for the reliable activation of gene expression and for maintaining high levels of expression. We still do not understand the details of how an enhancer switches on a gene and affects levels of expression, and therefore this is very much an open question. The source of shadow enhancers is uncertain, but it has been proposed that they might arise from ‘cryptic’ duplication events [18].

Rendering genes ‘poised’ for activation
Timing is crucial in development, and recent studies have identified several mechanisms that ensure faithful activation of gene expression upon receipt of key inducing signals. We consider mechanisms that optimize induction of distal enhancers and the core promoter (Figure 4). In some cases, both are ‘primed’ for efficient activation.

Figure 4. Summary of mechanisms of transcriptional priming. Gene transcription depends on enhancers (blue) and promoters (purple). The transcription start site (TSS) is indicated by an arrow labeled +1. The promoter can be primed or ‘poised’ for transcription by the recruitment of Pol II before gene expression. This ‘promoter pausing’ generates a small mRNA (around 30–50 nt) and then elongation is blocked by the binding of negative elongation factors such as Nelf and DSIF. The enhancer can be ‘prepared’ for activation by the binding of pioneer factors (represented by gray boxes), by recruitment of Pol II, or by the modification of the chromatin landscape (positioned nucleosomes and associated histone marks). These three features at enhancers may be linked, but for simplicity we illustrate them sequentially. Nucleosomes are represented by hexagons and histone marks with colored flags. A simplified scheme of a paused promoter is represented in the gray box.

Box 2. Methods for identifying paused promoters

Many developmental patterning genes contain paused Pol II before their activation during Drosophila embryogenesis (reviewed in [27]). There is also evidence that a significant number of inactive or weakly expressed genes contain paused Pol II in mammalian tissues, including embryonic stem cells. Several different methods have been used to identify paused genes, as summarized below.

Pol II ChIP-Seq assays: the simplest method is the genome-wide identification of Pol II binding. This is typically done with a mixture of antibodies recognizing different isoforms of Pol II (e.g. nonphosphorylated, ser-5P, ser-2P). Active genes contain Pol II signals extending across the length of their transcription units. Inactive genes fall into two classes: those completely lacking Pol II and those containing Pol II near the +1 transcription start site (e.g. [26]). These latter genes can be regarded as stalled or ‘provisionally’ paused. However, it is unclear whether Pol II has engaged the DNA template and undergone promoter escape, or if the signals detected in the promoter region represent an equilibrium of unstable Pol II associating and dissociating from the template. Additional methods are required to determine whether Pol II is truly paused, that is, activated polymerase containing a capped nascent transcript and arresting ~30–50 bp downstream of +1.

Permanganate protection assays: stably paused Pol II is associated with a ‘transcription bubble’ of ~20 bp due to the local denaturation of the double helix by the active polymerase. It is possible to detect the bubble by the modification of exposed, single-stranded thymidine residues with potassium permanganate. This method has been used to identify transcription bubbles for a number of genes containing stalled Pol II in Drosophila embryos and cultured S2 cells [62].

Direct sequencing: small nuclear RNAs containing 5′ caps are isolated, cloned, and then subjected to deep sequencing [63]. This method identified +34 as a common site of paused Pol II, with the DPE (downstream promoter element) or PB (pause button) motifs being the last nucleotides transcribed before arrest. A significant fraction of paused genes contain GAGA, INR, and DPE/PB motifs within or near their core promoters.

Gro-Seq assays: this has emerged as the method of choice for the systematic identification of paused Pol II [30,64]. However, it is not for the faint of heart. The method is a whole-genome nuclear run-on assay. Nuclei are harvested from embryos, tissues, or cultured cells, and treated with Sarkosyl to block de novo binding of Pol II. A modified nucleotide (e.g. bromouridine) is added along with a mixture of ATP and other agents to permit the elongation of pre-existing polymerases already engaged on DNA templates. These polymerases are allowed to extend ~50–100 nucleotides; the RNAs are then isolated using anti-bromo antibodies and subjected to deep sequencing. The resulting sequence information provides the exact locations of paused Pol II.

Paused promoters
Many metazoan genes contain paused RNA polymerase II (Pol II) prior to their activation [26–28]. This paused Pol II is an active form of the enzyme that halts ~30–50 bp downstream of the +1 transcription start site (Figure 4; Box 2). It is present in ~30% of all genes in embryonic stem cells and about 15% of genes in the early Drosophila embryo [26,29,30]. The purpose is uncertain, but many developmental patterning genes contain paused Pol II. It has been suggested that it fosters rapid and synchronous activation of gene expression [31]. The idea is that regulating Pol II release, rather than recruitment, permits rapid induction of gene expression. This hypothesis has been explored using detailed mathematical modeling of transcription [32], but it still remains to be tested experimentally.

A nonexclusive alternative view is that paused Pol II is involved in recruiting chromatin-modifying enzymes that expedite transcription. For example, the chromatin landscape of the Hsp70 locus (the prototypic paused gene in Drosophila) is rapidly altered following heat shock, through a mechanism independent of transcription [33]. This rapid change is key to the effective activation of Hsp70 expression upon heat shock. Moreover, there is an inverse correlation between paused Pol II and positioned nucleosomes at the core promoter [34,35]. An increase in positioned nucleosomes has been observed upon destabilization of paused Pol II (e.g. NelfE knockdown in S2 cells) [34]. Conversely, diminished levels of the Polycomb repressor (in esc mutant embryos) correlate with augmented levels of paused Pol II [35]. It would appear that the promoter regions of developmentally regulated genes contain either paused Pol II or positioned nucleosomes, but the basis for this regulatory switch is uncertain.

These studies raise the possibility that paused Pol II might prepare genes for activation by establishing an ‘open’ configuration at the promoter. However, this possibility has not yet been critically tested.

Poised enhancers
There is also evidence that enhancers can be prepared for rapid deployment before gene activation (Figure 4). For example, the forkhead transcription factor FoxA binds to the Albumin enhancer in the primitive endoderm of mouse embryos where it is inactive (reviewed in [36]). FoxA is an example of a ‘pioneer’ factor [37]; it binds to inactive enhancers and renders them ‘poised’ for rapid induction upon the appearance of key activators, such as those mediating cell signaling.

To bind inactive enhancers, pioneer factors have the defining property of binding to nucleosomal DNA and compact chromatin, and remain bound even during mitosis. Since the initial discovery of FoxA and GATA factors as pioneer factors in the liver differentiation program, additional examples have been described [38,39].

Zelda is a maternal zinc finger transcription factor that is essential for the activation of ~100 genes 2–3 h after fertilization during Drosophila embryogenesis (maternal to zygotic transition) [40–42]. It binds to the enhancer regions of many or most developmental control genes before their activation. Disrupting Zelda binding sites can delay the onset of expression, or cause sporadic patterns of activation [40,41]. Thus, Zelda renders developmental enhancers poised for activation by maternal determinants such as Bicoid and Dorsal, and may function as a pioneer factor. It might also help ensure reliable patterns of gene activation in natural populations under stress, but this idea has not yet been tested.

The mechanisms by which pioneer factors prepare enhancers for efficient activation are not known. It has been suggested that they can displace nucleosomes and thereby render adjacent binding sites available for occupancy [36,38]. A nonexclusive possibility is that pioneer factors recruit chromatin-modifying enzymes that ‘mark’ enhancers for rapid deployment. For example, inactive liver and pancreas enhancers exhibit ‘active’ chromatin modifications in the mouse foregut endoderm where they are inactive [36]. This suggests ‘pre-patterning’ of the enhancers in progenitor tissues before their induction in the liver and pancreas. The P300 histone acetyltransferase and the EZH2 histone methyltransferase have been implicated in these modifications [43]. It is conceivable that such modifications are not strictly required for gene expression, but might improve the precision and stability of gene expression in natural populations.

More recently it has been suggested that histone modifications and Pol II help to prime distal enhancers [44] (Figure 4). In this study, whole-genome ChIP-Seq assays were performed on isolated tissues obtained from staged Drosophila embryos. The timing of gene expression correlated with Pol II binding and two types of chromatin marks in enhancers. Pol II occupancy at enhancers is counterintuitive, but multiple studies, in human ES cells [45] and mice [45,46], suggest that enhancers can be bound by Pol II and are sometimes transcribed. Additional members of the general transcription machinery, such as the TATA-binding protein-associated factor TAF3 [47], are also seen at particular enhancers. It was suggested that these factors might foster looping interactions between distal enhancers and promoters, but it is currently unclear how Pol II and associated factors might render enhancers poised for activation. It is possible that they are recruited to enhancers by pioneer TFs, but this idea awaits further studies.

When stochastic expression is ‘purposeful’
Many developmental patterning genes in Drosophila contain paused Pol II, shadow enhancers, or both. We have discussed how these mechanisms might foster the precision and stability of gene expression in development. However, there are examples of developmental control genes that exhibit sporadic or stochastic patterns of expression. Some might exhibit such expression because there is no selective pressure for them to be expressed in a precise and synchronous manner. However, there are cases where stochastic expression is used as a purposeful strategy for generating regulatory diversity among the cells of a population [48]. One of the most striking examples is seen in the eye of the adult fly [49–51].

Box 3. Outstanding questions

• How do multiple enhancers provide precision in gene expression: do they increase the levels or probability of expression?
• Are genes with multiple enhancers more or less ‘evolvable’? Do shadow enhancers increase the probability of evolving novel gene activities?
• How do pioneer factors prime enhancers?
• How does paused Pol II prime the promoter?
• When are imprecise, stochastic modes of gene activation advantageous in development?

Color vision depends on the differential expression of rhodopsin-3 (Rh3) and Rh4 in the R7 photoreceptor cells and the differential expression of Rh5 and Rh6 in the R8 photoreceptor cells. These differential patterns depend on stochastic expression of spineless, which encodes a bHLH-PAS transcription factor that activates Rh4 in R7 [52]. Approximately 70% of the ommatidia express spineless, but the patterns of activation differ among adult flies. When spineless is expressed, Rh4 is activated in R7; if not, Rh3 is expressed instead. The identity of these distinct classes of R7 cells dictates the identities of the underlying R8 cells. When spineless and Rh4 are expressed in R7, then Rh6 is expressed in the associated R8 cell. Conversely, when spineless is absent and Rh3 is expressed in R7, then Rh5 is expressed in the associated R8 cell. Thus, diverse patterns of rhodopsin expression are achieved by the stochastic expression of spineless. The underlying mechanism is uncertain.
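The decision rule described above can be sketched as a toy simulation. This is a hedged illustration, not a model of the underlying mechanism (which the text notes is uncertain): it assumes each ommatidium switches spineless on independently with probability 0.7, and the function names and the retina size of 800 are arbitrary choices made for the example.

```python
import random

# Approximate fraction of ommatidia expressing spineless (from the text).
P_SPINELESS = 0.70


def rhodopsins(spineless_on):
    """Return the (R7, R8) rhodopsin pair for one ommatidium.

    spineless on  -> Rh4 in R7, Rh6 in the underlying R8
    spineless off -> Rh3 in R7, Rh5 in the underlying R8
    """
    return ("Rh4", "Rh6") if spineless_on else ("Rh3", "Rh5")


def simulate_retina(n_ommatidia, p_on=P_SPINELESS, seed=None):
    """Draw an independent on/off spineless state for every ommatidium.

    Independence between ommatidia is an assumption of this sketch.
    """
    rng = random.Random(seed)
    return [rhodopsins(rng.random() < p_on) for _ in range(n_ommatidia)]


# Two simulated flies: similar proportions, different spatial patterns.
retina_a = simulate_retina(800, seed=1)
retina_b = simulate_retina(800, seed=2)

frac_rh4 = sum(r7 == "Rh4" for r7, _ in retina_a) / len(retina_a)
print(f"fraction of R7 cells expressing Rh4: {frac_rh4:.2f}")
```

Changing the seed changes which ommatidia express Rh4 while leaving the overall proportion close to 70%, which mirrors the observation that the pattern differs among adult flies.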

There are other examples of the importance of stochastic expression in the control of developmental genes. Notably, Nanog, one of the key determinants of pluripotent stem cells, exhibits stochastic expression in cultured ES cells and in early mouse embryos [53,54]. There is a correlation between elevated levels of Nanog expression and self-renewal of pluripotent stem cells in culture [55,56]. By contrast, low levels correlate with a propensity for the cells to differentiate.

Concluding remarks
The preceding examples are probably exceptional. We believe that most regulatory genes are ‘primed’ for rapid and precise deployment during development. Several mechanisms were discussed, including redundancies in gene networks and developmental enhancers, shadow enhancers, and primed promoters and enhancers (via paused Pol II and pioneer TFs). There is little doubt that additional mechanisms await discovery (Box 3).

There is something of a chicken and egg issue that we have skirted. Namely, what is the source of these mechanisms of developmental precision? It is conceivable that they arose from the demands of natural populations, namely, to stabilize complex developmental processes in response to inherent (genetic) and extrinsic (environmental) fluctuations. Alternatively, they might have arisen from the demands of the embryo, to produce timely and dynamic on/off patterns of gene expression underlying cell specification processes. These are not mutually exclusive concepts. A regulatory mechanism selected to provide stability in natural populations (e.g. shadow enhancer) might be incorporated into the core patterning process to produce sharper borders of gene expression [12] or homogenous patterns of activation [57]. Conversely, a mechanism selected for developmental precision (e.g. paused Pol II) might foster robustness of expression in natural populations. We suggest that the dynamic interplay between the demands of natural populations and the embryo has produced the exquisite patterning processes that underlie animal development.

References
1 Levine, M. (2010) Transcriptional enhancers in animal development and evolution. Curr. Biol. 20, R754–R763
2 Nüsslein-Volhard, C. and Wieschaus, E. (1980) Mutations affecting segment number and polarity in Drosophila. Nature 287, 795–801
3 Nüsslein-Volhard, C. and Roth, S. (1989) Axis determination in insect embryos. Ciba Found. Symp. 144, 37–55
4 Ip, Y.T. et al. (1992) The bicoid and dorsal morphogens use a similar strategy to make stripes in the Drosophila embryo. J. Cell Sci. 16 (Suppl.), 33–38
5 Davidson, E.H. (2009) Network design principles from the sea urchin embryo. Curr. Opin. Genet. Dev. 19, 535–540
6 Raj, A. et al. (2010) Variability in gene expression underlies incomplete penetrance. Nature 463, 913–918
7 Burga, A. et al. (2011) Predicting mutation outcome from early stochastic variation in genetic interaction partners. Nature 480, 250–253
8 Arnosti, D.N. et al. (1996) The eve stripe 2 enhancer employs multiple modes of transcriptional synergy. Development 122, 205–214
9 Ludwig, M.Z. et al. (2011) Consequences of eukaryotic enhancer architecture for gene expression dynamics, development, and fitness. PLoS Genet. 7, e1002364
10 Stanojevic, D. et al. (1991) Regulation of a segmentation stripe by overlapping activators and repressors in the Drosophila embryo. Science 254, 1385–1387
11 Small, S. et al. (1992) Regulation of even-skipped stripe 2 in the Drosophila embryo. EMBO J. 11, 4047–4057
12 Perry, M.W. et al. (2010) Shadow enhancers foster robustness of Drosophila gastrulation. Curr. Biol. 20, 1562–1567
13 Waddington, C.H. (1942) Canalization of development and the inheritance of acquired characters. Nature 150, 563–565
14 Zinzen, R.P. et al. (2009) Combinatorial binding predicts spatio-temporal cis-regulatory activity. Nature 462, 65–70
15 He, Q. et al. (2011) High conservation of transcription factor binding and evidence for combinatorial regulation across six Drosophila species. Nat. Genet. 43, 414–420
16 May, D. et al. (2011) Large-scale discovery of enhancers from human heart tissue. Nat. Genet. 44, 89–93
17 Zeitlinger, J. et al. (2007) Whole-genome ChIP-chip analysis of Dorsal, Twist, and Snail suggests integration of diverse patterning processes in the Drosophila embryo. Gene Dev. 21, 385–390
18 Hong, J.W. et al. (2008) Shadow enhancers as a source of evolutionary novelty. Science 321, 1314
19 Barolo, S. (2011) Shadow enhancers: frequently asked questions about distributed cis-regulatory information and enhancer redundancy. BioEssays 34, 135–141
20 Frankel, N. et al. (2010) Phenotypic robustness conferred by apparently redundant transcriptional enhancers. Nature 466, 490–493
21 Dunipace, L. et al. (2011) Complex interactions between cis-regulatory modules in native conformation are critical for Drosophila snail expression. Development 138, 4075–4084
22 Riesenberg, A.N. et al. (2009) Rbpj cell autonomous regulation of retinal ganglion cell and cone photoreceptor fates in the mouse retina. J. Neurosci. 29, 12865–12877
23 Ghiasvand, N.M. et al. (2011) Deletion of a remote enhancer near ATOH7 disrupts retinal neurogenesis, causing NCRNA disease. Nat. Neurosci. 14, 578–586
24 Cretekos, C.J. et al. (2008) Regulatory divergence modifies limb length between mammals. Gene Dev. 22, 141–151
25 Montavon, T. et al. (2011) A regulatory archipelago controls Hox genes transcription in digits. Cell 147, 1132–1145
26 Zeitlinger, J. et al. (2007) RNA polymerase stalling at developmental control genes in the Drosophila melanogaster embryo. Nat. Genet. 39, 1512–1516
27 Levine, M. (2011) Paused RNA polymerase II as a developmental checkpoint. Cell 145, 502–511
28 Li, J. and Gilmour, D.S. (2011) Promoter proximal pausing and the control of gene expression. Curr. Opin. Genet. Dev. 21, 231–235
29 Guenther, M.G. et al. (2007) A chromatin landmark and transcription initiation at most promoters in human cells. Cell 130, 77–88
30 Min, I.M. et al. (2011) Regulating RNA polymerase pausing and transcription elongation in embryonic stem cells. Gene Dev. 25, 742–754
31 Boettiger, A.N. and Levine, M. (2009) Synchronous and stochastic patterns of gene activation in the Drosophila embryo. Science 325, 471–473
32 Boettiger, A.N. et al. (2011) Transcriptional regulation: effects of promoter proximal pausing on speed, synchrony and reliability. PLoS Comput. Biol. 7, e1001136



33 Petesch, S.J. and Lis, J.T. (2008) Rapid, transcription-independent loss of nucleosomes over a large chromatin domain at Hsp70 loci. Cell 134, 74–84
34 Gilchrist, D.A. et al. (2010) Pausing of RNA polymerase II disrupts DNA-specified nucleosome organization to enable precise gene regulation. Cell 143, 540–551
35 Chopra, V.S. et al. (2011) The Polycomb group mutant esc leads to augmented levels of paused Pol II in the Drosophila embryo. Mol. Cell 42, 837–844
36 Zaret, K.S. and Carroll, J.S. (2011) Pioneer transcription factors: establishing competence for gene expression. Gene Dev. 25, 2227–2241
37 Watts, J.A. et al. (2011) Study of FoxA pioneer factor at silent genes reveals Rfx-repressed enhancer at Cdx2 and a potential indicator of esophageal adenocarcinoma development. PLoS Genet. 7, e1002277
38 Magnani, L. et al. (2011) Pioneer factors: directing transcriptional regulators within the chromatin environment. Trends Genet. 27, 465–474
39 Fakhouri, T.H.I. et al. (2010) Dynamic chromatin organization during foregut development mediated by the organ selector gene pha-4/FoxA. PLoS Genet. 6, e1001060
40 Liang, H.L. et al. (2008) The zinc-finger protein Zelda is a key activator of the early zygotic genome in Drosophila. Nature 456, 400–403
41 Nien, C.Y. et al. (2011) Temporal coordination of gene networks by Zelda in the early Drosophila embryo. PLoS Genet. 7, e1002339
42 Harrison, M.M. et al. (2011) Zelda binding in the early Drosophila melanogaster embryo marks regions subsequently activated at the maternal-to-zygotic transition. PLoS Genet. 7, e1002266
43 Xu, C.R. et al. (2011) Chromatin ‘prepattern’ and histone modifiers in a fate choice for liver and pancreas. Science 332, 963–966
44 Bonn, S. et al. (2012) Tissue-specific analysis of chromatin state identifies temporal signatures of enhancer activity during embryonic development. Nat. Genet. 44, 148–156
45 Rada-Iglesias, A. et al. (2011) A unique chromatin signature uncovers early developmental enhancers in humans. Nature 470, 279–283
46 De Santa, F. et al. (2010) A large fraction of extragenic RNA pol II transcription sites overlap enhancers. PLoS Biol. 8, e1000384
47 Liu, Z. et al. (2011) Control of embryonic stem cell lineage commitment by core promoter factor, TAF3. Cell 146, 720–731
48 Eldar, A. and Elowitz, M.B. (2010) Functional roles for noise in genetic circuits. Nature 467, 167–173
49 Vasiliauskas, D. et al. (2011) Feedback from rhodopsin controls rhodopsin exclusion in Drosophila photoreceptors. Nature 479, 108–112


50 Johnston, R.J. et al. (2011) Interlocked feedforward loops control cell-type-specific rhodopsin expression in the Drosophila eye. Cell 145, 956–968
51 Jukam, D. and Desplan, C. (2010) Binary fate decisions in differentiating neurons. Curr. Opin. Neurobiol. 20, 6–13
52 Wernet, M.F. et al. (2006) Stochastic spineless expression creates the retinal mosaic for colour vision. Nature 440, 174–180
53 Dietrich, J.E. and Hiiragi, T. (2007) Stochastic patterning in the mouse pre-implantation embryo. Development 134, 4219–4231
54 Silva, J. and Smith, A. (2008) Capturing pluripotency. Cell 132, 532–536
55 Kalmar, T. et al. (2009) Regulated fluctuations in nanog expression mediate cell fate decisions in embryonic stem cells. PLoS Biol. 7, e1000149
56 Glauche, I. et al. (2010) Nanog variability and pluripotency regulation of embryonic stem cells–insights from a mathematical model analysis. PLoS ONE 5, e11238
57 Perry, M.W. et al. (2011) Multiple enhancers ensure precision of gap gene-expression patterns in the Drosophila embryo. Proc. Natl. Acad. Sci. U.S.A. 108, 13570–13575
58 Berman, B.P. et al. (2002) Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc. Natl. Acad. Sci. U.S.A. 99, 757–762
59 Markstein, M. et al. (2002) Genome-wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo. Proc. Natl. Acad. Sci. U.S.A. 99, 763–768
60 Valouev, A. et al. (2011) Determinants of nucleosome organization in primary human cells. Nature 474, 516–520
61 Giresi, P.G. and Lieb, J.D. (2009) Isolation of active regulatory elements from eukaryotic chromatin using FAIRE (formaldehyde assisted isolation of regulatory elements). Methods 48, 233–239
62 Gilmour, D.S. and Fan, R. (2009) Detecting transcriptionally engaged RNA polymerase in eukaryotic cells with permanganate genomic footprinting. Methods 48, 368–374
63 Nechaev, S. et al. (2010) Global analysis of short RNAs reveals widespread promoter-proximal stalling and arrest of Pol II in Drosophila. Science 327, 335–338
64 Core, L.J. et al. (2008) Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science 322, 1845–1848

Is ‘forward’ the same as ‘plus’?. . . and other adventures in SNP allele nomenclature

Sarah C. Nelson1, Kimberly F. Doheny2, Cathy C. Laurie1 and Daniel B. Mirel3

1 Genetics Coordinating Center, Department of Biostatistics, University of Washington, Seattle, WA, USA
2 Center for Inherited Disease Research, Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
3 Broad Institute (Massachusetts Institute of Technology/Harvard), Cambridge, MA, USA

Letter

In the accelerating and expanding field of research on genetic variation, it has become standard practice to work with a combination of datasets generated by multiple research groups at different times and by different methods. Synthesizing these data is important for genotype imputation, meta-analysis, and other applications, but may be difficult because alleles are typically observed and recorded on only one of the two DNA strands in genotyping and sequencing experiments. Different nomenclatures have arisen to designate strand orientation when reporting single nucleotide polymorphism (SNP) genotypes, but they are neither widely understood nor uniformly applied. Here we define the most common allele strand orientation nomenclatures and provide guidance in achieving strand consistency.

The majority of SNPs are ‘strand unambiguous’, such that genotypes called on different strands are readily identifiable (e.g., A/G alleles on one strand are T/C alleles on the opposite strand). However, determining strand orientation is more complicated at ‘strand ambiguous’ SNPs, where alleles are symmetrical across strands (A/T and C/G). It is assumed that all researchers, as a minimum for consistency, report the two alleles of a biallelic SNP on the same strand. It is the choice and the definition of which strand is used that leads to ambiguity. Generally, SNP alleles are reported for a single strand designated in one of four strand naming conventions: ‘probe/target’, ‘plus/minus’, ‘TOP/BOT’, and ‘forward/reverse’, defined as follows.
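The distinction between strand-unambiguous and strand-ambiguous SNPs can be illustrated with a small Python sketch. The helper names (`complement_alleles`, `is_strand_ambiguous`, `relative_strand`) are hypothetical and are not taken from any genotyping toolkit; the code assumes biallelic SNPs with alleles written as single uppercase nucleotides.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_alleles(alleles):
    """Return the alleles as they would read on the opposite strand."""
    return tuple(COMPLEMENT[a] for a in alleles)


def is_strand_ambiguous(alleles):
    """True for A/T and C/G SNPs, whose alleles look the same on both strands."""
    first, second = alleles
    return COMPLEMENT[first] == second


def relative_strand(alleles_1, alleles_2):
    """Compare how two datasets report the same biallelic SNP.

    Returns 'same', 'opposite', 'ambiguous' (cannot be decided from the
    alleles alone) or 'mismatch' (the allele pairs are incompatible).
    """
    set_1, set_2 = set(alleles_1), set(alleles_2)
    if is_strand_ambiguous(alleles_1):
        return "ambiguous" if set_1 == set_2 else "mismatch"
    if set_1 == set_2:
        return "same"
    if set(complement_alleles(alleles_1)) == set_2:
        return "opposite"
    return "mismatch"


print(relative_strand(("A", "G"), ("T", "C")))  # opposite strands
print(relative_strand(("A", "T"), ("T", "A")))  # cannot tell from alleles
```

For ambiguous SNPs, additional information such as strand annotation or allele frequencies would be needed; that step is outside the scope of this sketch.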

Corresponding author: Nelson, S.C.
Keywords: allele; strand translation; genotype; nomenclature; genome-wide association study; meta-analysis.

Probe/target
When SNPs are assayed with a site-specific probe, one of the two strands corresponds to (i.e., is collinear with) the probe sequence itself, and the other to the complementary genomic target sequence that flanks or spans the SNP site. Sometimes the probe strand is called the ‘design’ strand (in reference to assay design). Although the specifics vary between platforms, alternative alleles at a SNP site are often initially represented using the generic letter codes A and B. In the following, an italicized A refers to this generic allele designation and not to adenine. In Illumina annotation each SNP is defined with design allele nucleotides, and these occur on the same strand as the probe sequence; the order in which the alternative alleles are given specifies the generic A and B allele designations [1]. To illustrate, for a SNP defined as [T/G], the A allele is T and the B allele is G. In Affymetrix allele-specific hybridization technology, the letter codes A and B are assigned differently and could therefore occur on either the probe or target strand [2].
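For the Illumina-style definition described above, where the order of alleles in a string such as [T/G] fixes the generic A and B codes, the mapping can be sketched as follows. The functions `parse_snp_definition` and `decode_genotype` are hypothetical helpers written for this illustration; they handle only single-nucleotide alleles and do not reflect the Affymetrix convention.

```python
import re


def parse_snp_definition(snp):
    """Map generic allele codes to nucleotides for a definition like '[T/G]'.

    The first nucleotide listed is taken as allele A and the second as
    allele B, following the ordering rule described in the text.
    """
    match = re.fullmatch(r"\[([ACGT])/([ACGT])\]", snp.strip())
    if match is None:
        raise ValueError(f"not a single-nucleotide SNP definition: {snp!r}")
    return {"A": match.group(1), "B": match.group(2)}


def decode_genotype(ab_genotype, snp):
    """Translate a generic genotype such as 'AB' into design-strand nucleotides."""
    codes = parse_snp_definition(snp)
    if any(code not in codes for code in ab_genotype):
        raise ValueError(f"genotype must use only A and B: {ab_genotype!r}")
    return "".join(codes[code] for code in ab_genotype)


print(parse_snp_definition("[T/G]"))   # allele A is T, allele B is G
print(decode_genotype("AB", "[T/G]"))  # heterozygote on the design strand
```

Note that the decoded nucleotides are on the probe (design) strand; relating them to another strand convention requires further annotation.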

Plus (+)/minus (−)
In all human reference chromosomes, as for other eukaryotes [3], the plus (+) strand is defined as the strand with its 5′ end at the tip of the short arm [4,5] (Genome Reference Consortium, personal communication, March 27, 2012). SNP alleles reported on the (+) strand are called ‘plus’ alleles and those on the (−) strand are called ‘minus’ alleles. Providing SNP alleles on the plus genomic strand is the convention in publicly available SNP datasets such as the HapMap (www.hapmap.org) and 1000 Genomes Projects (www.1000genomes.org).

Although the plus/minus designation is anchored at the telomeres of each chromosome, the orientation of intervening sequences may change between genome builds as gaps are filled in and sequences are refined. Thus when reporting plus/minus strand, one must specify a genome build. The fluid nature of plus/minus orientation has partly motivated the development of alternative nomenclatures.

Illumina TOP/BOT strand
The TOP/BOT strand naming convention, developed by Illumina and subsequently adopted by dbSNP, has been thoroughly defined elsewhere [1]. In brief, Illumina strand designation is determined by either the SNP alternative nucleotides or its flanking sequence. For unambiguous SNPs the TOP strand is defined as the one that contains an A nucleotide allele. The A is designated generically as allele A, whereas the alternative allele on the TOP strand is designated as allele B. For ambiguous SNPs the strand designation and allele A/B assignments are determined by flanking sequence in a similar manner. This strand definition is ‘local’ to a SNP in that alleles reported on the TOP strand for two neighboring SNPs may be on different physical strands of DNA [6]. Furthermore, the TOP/BOT strand definition is intended to be independent of any genome build or design strand. Another key feature of this naming system is that allele A for a TOP strand probe is the base pair complement of allele A for a BOT strand probe, such that the generic A/B genotype coding remains consistent regardless of which strand is probe or target. This nomenclature offers relative stability in the face of changing human genome assemblies and SNP databases.
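As a rough illustration of the rule for unambiguous SNPs only, the following Python sketch assigns TOP or BOT and the generic A/B alleles from the two nucleotides observed on a strand. The function name is hypothetical. The BOT case (allele A is T) is inferred here from the complementarity property described above, and the flanking-sequence rule for ambiguous SNPs is defined in [1] and deliberately not reproduced.

```python
def top_bot_unambiguous(allele_1, allele_2):
    """Return (strand, allele_A, allele_B) for a strand-unambiguous SNP.

    The two arguments are the nucleotides as read on one strand.
    """
    alleles = {allele_1.upper(), allele_2.upper()}
    if len(alleles) != 2 or not alleles <= set("ACGT"):
        raise ValueError("expected two distinct nucleotides")
    if alleles in ({"A", "T"}, {"C", "G"}):
        # Ambiguous SNPs need the flanking sequence; see reference [1].
        raise ValueError("strand-ambiguous SNP: flanking sequence is needed")
    if "A" in alleles:
        # The strand carrying an A allele is TOP, and that A is allele A.
        strand, allele_a = "TOP", "A"
    else:
        # Otherwise the strand carries T; it is BOT and T is allele A
        # (the complement of allele A on TOP).
        strand, allele_a = "BOT", "T"
    (allele_b,) = alleles - {allele_a}
    return strand, allele_a, allele_b


print(top_bot_unambiguous("A", "C"))  # ('TOP', 'A', 'C')
print(top_bot_unambiguous("T", "G"))  # ('BOT', 'T', 'G')
```

Because allele A on TOP (A) and allele A on BOT (T) are complements, the A/B coding of a genotype is the same whichever strand was assayed.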


Box 1. An example of allele conversion using Illumina annotation

Here we use Illumina-provided annotation for an example SNP (rs216614) in Table I to derive a set of allele call conversions in Table II. In Table I, ‘SNP’ gives alternative alleles on the probe sequence strand, ‘IlmnStrand’ gives the TOP/BOT status of the probe sequence strand, ‘TopGenomicSeq’ gives the sequence surrounding the SNP on the TOP strand, ‘RefStrand’ gives the plus/minus status of the probe sequence strand, and ‘IlmnID’ encodes the correspondence between TOP/BOT and forward/reverse (dbSNP) strands. The ‘design’ alleles (on the probe sequence strand) are given directly by ‘SNP’ = [T/G] and, following the Illumina convention, the first nucleotide corresponds to allele A and the second to allele B. The TOP strand alleles are given in brackets in ‘TopGenomicSeq’. The ‘B_R’ in ‘IlmnID’ specifies that the dbSNP reverse strand corresponds with the BOT strand. The corresponding SNP assay is depicted in Figure I.

Table I. Excerpt from Illumina HumanOmni1-Quad_v1-0_C annotation file (build 37)

IlmnID                        Name      IlmnStrand  SNP    TopGenomicSeq            RefStrand
rs216614-131_B_R_1865662557   rs216614  BOT         [T/G]  ...CATCCC[A/C]TGCACA...  −

[Figure I: schematic of the probe annealed to its target, with the (+) and (−) strands and their 5′ and 3′ ends labeled.]

Figure I. A simplified schematic of the SNP probe, where the probe sequence is in blue and the target sequence in black text. The ‘design’ alleles (T or G) are the fluorescently labeled nucleotides recruited to the allele probe in this two-color primer-extension assay. Adapted from materials available on the Illumina website (www.illumina.org).

Table II. rs216614 allele-mapping table

AB  TOP  Design  Forward  Plus
A   A    T       A        A
B   C    G       C        C
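The derivation of Table II from Table I can be sketched in Python as follows. This is an illustrative sketch, not Illumina software: the function names are hypothetical, the row is typed in by hand from Table I, and the parsing of ‘IlmnID’ assumes that the identifier ends in `_<T|B>_<F|R>_<number>` and that matching codes (T with F, or B with R) mean TOP corresponds to the dbSNP forward strand, a generalization of the single ‘B_R’ case described in Box 1.

```python
import re

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def parse_alleles(text):
    """Extract the two alleles from a bracketed field such as '[T/G]'."""
    match = re.search(r"\[([ACGT])/([ACGT])\]", text)
    if match is None:
        raise ValueError(f"no [X/Y] allele pair in {text!r}")
    return match.group(1), match.group(2)


def flip(alleles):
    """Return the alleles as read on the complementary strand."""
    return tuple(COMPLEMENT[a] for a in alleles)


def allele_mapping(row):
    """Map generic alleles A and B to each strand convention for one SNP."""
    # Design alleles sit on the probe strand; first is allele A, second is B.
    design = parse_alleles(row["SNP"])
    # TOP and BOT are complementary, so a BOT design strand is flipped.
    top = design if row["IlmnStrand"] == "TOP" else flip(design)
    if set(top) != set(parse_alleles(row["TopGenomicSeq"])):
        raise ValueError("annotation is inconsistent with TopGenomicSeq")
    # Assumed IlmnID layout: <name>_<T|B>_<F|R>_<number>.
    ilmn_code, dbsnp_code = row["IlmnID"].rsplit("_", 3)[1:3]
    top_is_forward = (ilmn_code == "T") == (dbsnp_code == "F")
    forward = top if top_is_forward else flip(top)
    # RefStrand is the plus/minus status of the probe (design) strand.
    if row["RefStrand"] == "+":
        plus = design
    elif row["RefStrand"] in ("-", "\u2013", "\u2212"):
        plus = flip(design)
    else:
        raise ValueError(f"unrecognized RefStrand {row['RefStrand']!r}")
    return {
        label: {"TOP": top[i], "Design": design[i],
                "Forward": forward[i], "Plus": plus[i]}
        for i, label in enumerate("AB")
    }


row = {
    "IlmnID": "rs216614-131_B_R_1865662557",
    "IlmnStrand": "BOT",
    "SNP": "[T/G]",
    "TopGenomicSeq": "CATCCC[A/C]TGCACA",
    "RefStrand": "-",
}
mapping = allele_mapping(row)
for label, alleles in mapping.items():
    print(label, alleles)
```

For rs216614 the result reproduces the two rows of Table II. Because annotations are not infallible, the consistency check against ‘TopGenomicSeq’ is kept in the sketch rather than trusting a single column.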

Letter Trends in Genetics August 2012, Vol. 28, No. 8

Forward/reverse
The dbSNP resource of the US National Center for Biotechnology Information (NCBI) contains detailed information for each SNP in its database. Each refSNP (or ‘rs’) entry consists of one or more submitted SNP (or ‘ss’) records, each submitted by an individual laboratory. Each dbSNP record shows a flanking DNA sequence, which is simply taken from the submission with the longest flanking sequence [6,7]. SNP alleles reported on the same strand as this exemplar sequence in dbSNP are called ‘forward’ alleles. Conversely, alleles on the opposite strand are called ‘reverse’ alleles. Note that the dbSNP meaning of ‘forward’ is easily confused with the (+) genomic strand, which has been referred to as the ‘forward’ strand by the HapMap project [8,9].

Achieving strand consistency
The most basic level of strand consistency requires only that genotypes are reported on the same DNA strand across datasets. At strand-unambiguous SNPs, discrepant nucleotides are sufficient to identify strand inconsistencies (e.g., A/C in one dataset and T/G in another). However, harmonizing strand-ambiguous SNPs requires converting allele calls to a specific strand, according to one of the strand naming conventions described above. Given a nucleotide sequence with a SNP and its flanking bases (e.g., CATCCC[A/C]TGCACA) one can determine whether the strand of that sequence is (i) plus or minus, by sequence matching with the genomic reference sequence; (ii) TOP or BOT, from the SNP itself or its flanking sequence [1]; and (iii) forward or reverse, from the ‘ss’ sequence record in dbSNP. Determination of probe or target strand requires additional information about assay design. In practice, genotyping assay vendors generally supply annotations that can be used to make strand conversions. Box 1 gives an example of how to interpret Illumina annotation to create a table of allele call conversions. Figure I shows a simplified schematic of the genotyping probe at this example SNP. However, SNP annotations are not infallible and further checks on strand consistency are useful. Commonly used checks are comparisons of minor allele frequency and patterns of linkage disequilibrium between the datasets to be harmonized [10,11].
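The basic consistency checks described above can be sketched in Python. The function names and the 0.4 frequency cutoff are hypothetical choices for illustration, not values taken from the cited tools, and the frequency check assumes that the two datasets sample populations with similar allele frequencies. Linkage disequilibrium checks are not shown.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def compare_strand(alleles_1, alleles_2):
    """Classify how two datasets report the alleles of the same biallelic SNP."""
    set_1, set_2 = set(alleles_1), set(alleles_2)
    flipped_1 = {COMPLEMENT[a] for a in set_1}
    if set_1 == flipped_1:
        # A/T or C/G: the nucleotides alone cannot reveal the strand.
        return "ambiguous" if set_1 == set_2 else "mismatch"
    if set_1 == set_2:
        return "same strand"
    if flipped_1 == set_2:
        return "opposite strand"
    return "mismatch"


def maf_suggests_flip(minor_1, maf_1, minor_2, maf_2, max_informative_maf=0.4):
    """Heuristic check for a strand-ambiguous SNP.

    Returns True if the minor alleles are complementary (suggesting opposite
    strands), False if they agree, and None when either minor allele
    frequency is too close to 0.5 to be informative.
    """
    if maf_1 > max_informative_maf or maf_2 > max_informative_maf:
        return None
    if minor_1 == minor_2:
        return False
    if COMPLEMENT[minor_1] == minor_2:
        return True
    return None


print(compare_strand(("A", "C"), ("T", "G")))     # opposite strand
print(compare_strand(("A", "T"), ("T", "A")))     # ambiguous
print(maf_suggests_flip("A", 0.12, "T", 0.10))    # True
print(maf_suggests_flip("A", 0.48, "T", 0.47))    # None
```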

Our intent is not to advocate one allele nomenclature above all others because the universal adoption of one naming system is both unlikely and unnecessary. Instead, our aim is to explain the different nomenclatures and the need for precise documentation of allele designations for each dataset. Increased understanding and documentation will facilitate continued data sharing and collaboration within the genetics research community.

Acknowledgments
This work was supported in part by the following National Institutes of Health grants: GENEVA Coordinating Center (U01 HG004446); GARNET Coordinating Center (U01 HG005157); Center for Inherited Disease Research (U01HG004438, NIH contract numbers HHSN268200782096C and HHSN268201100011I); and Broad Center for Genotyping and Analysis (U01HG04424).

References
1 Illumina Inc. (2006) ‘TOP/BOT’ strand and ‘A/B’ allele (Technical Note). http://www.illumina.com/documents/products/technotes/technote_topbot.pdf

2 Affymetrix Inc. (2012) Affymetrix genotyping glossary. http://www.affymetrix.com/support/help/genotyping_glossary/index.affx

3 Cherry, J.M. et al. (1998) SGD: Saccharomyces genome database. Nucleic Acids Res. 26, 73–79

4 Dunham, I. et al. (1999) The DNA sequence of human chromosome 22. Nature 402, 489–495




5 Cartwright, R.A. and Graur, D. (2011) The multiple personalities of Watson and Crick strands. Biol. Direct 6, 7

6 National Center for Biotechnology Information (2005) Sequence formatting in dbSNP reports. http://www.ncbi.nlm.nih.gov/books/NBK44414

7 Kitts, A.K. and Sherry, S. (2002) The single nucleotide polymorphism database (dbSNP) of nucleotide sequence variation. In The NCBI Handbook (McEntyre, J. and Ostell, J., eds), National Center for Biotechnology Information (Chap. 5). http://www.ncbi.nlm.nih.gov/books/NBK21101/

8 Frazer, K.A. et al. (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–861

9 Altshuler, D.M. et al. (2010) Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58

10 Browning, S.R. (2009–2011) Strand-switching utility for BEAGLE. http://faculty.washington.edu/sguy/beagle/strand_switching/strand_switching.html

11 Howie, B. and Marchini, J. (2009–2012) IMPUTE2 strand alignment options. http://mathgen.stats.ox.ac.uk/impute/strand_alignment_options.html

0168-9525/$ – see front matter © 2012 Elsevier Ltd. All rights reserved.

http://dx.doi.org/10.1016/j.tig.2012.05.002 Trends in Genetics, August 2012, Vol. 28, No. 8



Corrigendum: Human evolutionary genomics: ethical and interpretive issues [Trends in Genetics 28 (2012) 137–145]

Joseph J. Vitti1,2, Mildred K. Cho3, Sarah A. Tishkoff4 and Pardis C. Sabeti1,2

1 Broad Institute of MIT and Harvard, Cambridge, Massachusetts
2 Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts
3 Stanford Center for Biomedical Ethics, Stanford University, Palo Alto, California
4 Departments of Genetics and Biology, University of Pennsylvania, Philadelphia, Pennsylvania

Erratum

In Figure 1 the genes involved in pigmentation are shown as “SLC24A5, SLC42A2”. They should be “SLC24A5, SLC45A2”. Similarly, in the legend for Figure 1, line 8, it reads:

“In European populations, genes that affect skin pigmentation (SLC24A5 and SLC42A2) have undergone positive selection.”

It should read:

“In European populations, genes that affect skin pigmentation (SLC24A5 and SLC45A2) have undergone positive selection.”

§ DOI of original article: 10.1016/j.tig.2011.12.001.
Corresponding author: Vitti, J.J. ([email protected]).

We apologize to the readers of this article for this error.

0168-9525/$ – see front matter © 2012 Elsevier Ltd. All rights reserved.

http://dx.doi.org/10.1016/j.tig.2012.05.003 Trends in Genetics, August 2012, Vol. 28, No. 8
