Mapping and sequencing complex genomes: let's get physical!


<ul><li><p>578 | AUGUST 2004 | VOLUME 5</p><p>R E V I E W S</p><p>The construction of a whole-genome physical map has been an essential component of numerous genome projects initiated since the inception of the Human Genome Project (HGP). The production and integration of genetic, physical, gene and sequence maps was the goal of the HGP<sup>1</sup>. Although genetic mapping has been pursued in plants and animals for decades, it is only relatively recently that advances in cloning and clone fingerprinting have allowed the construction of physical maps. A physical map is an ordered set of DNA fragments, among which the distances are expressed in physical distance units (base pairs). These days, a physical map usually comprises a set of ordered large-insert clones such as BACTERIAL ARTIFICIAL CHROMOSOMES (BACs)<sup>2</sup>, which have largely replaced YEAST ARTIFICIAL CHROMOSOMES<sup>3</sup> as the preferred building blocks of a physical map. Physical maps can be independent of genetic information but are more valuable if linked to genetically mapped markers, and are even more powerful if integrated with genomic sequence data.</p><p>Much progress has been made in the development of technologies and strategies for whole-genome sequencing, but these strategies still depend on the development of a physical map. In the clone-by-clone whole-genome sequencing method, the physical map is constructed first, and a MINIMAL TILING PATH of clones is then selected for separate shotgun sequencing of each clone in the path<sup>4</sup>.</p><p>An alternative to the clone-by-clone method is whole-genome shotgun (WGS) sequencing, which uses assembled sequence data generated randomly from the entire genome<sup>4,5</sup>. In theory, WGS sequencing makes obsolete the process of physical mapping because it should construct overlapping contiguous segments (contigs) of sequence data. 
However, it is not yet clear whether WGS sequencing alone is sufficient to produce a linearly ordered set of sequences if the sequence contigs are not coupled to a robust physical map<sup>4,6–9</sup>. Therefore, a hybrid strategy of the two methods for whole-genome sequencing will probably prove to be most productive<sup>4</sup>. With this hybrid approach, WGS sequence data are aligned with mapped BAC-end sequences, and these assembled contigs are anchored to a physical map scaffold that comprises ordered and orientated BACs that include mapped molecular markers<sup>10,11</sup>.</p><p>The lack of high-quality physical maps could rapidly become one of the limiting factors in assembling newly generated WGS sequences for large genomes. The productivity of large sequencing centres has already outstripped the ability of physical mapping laboratories to provide ordered sequence maps. Without the linear order that physical maps provide, the marginal advantage that WGS sequencing projects have over a comprehensive EST or a full-length cDNA sequencing effort does not justify the considerable increase in costs.</p><p>MAPPING AND SEQUENCING COMPLEX GENOMES: LET'S GET PHYSICAL!</p><p>Blake C. Meyers*, Simone Scalabrin and Michele Morgante</p><p>Physical maps provide an essential framework for ordering and joining sequence data, genetically mapped markers and large-insert clones in eukaryotic genome projects. A good physical map is also an important resource for cloning specific genes of interest, comparing genomes, and understanding the size and complexity of a genome. Although physical maps are usually taken at face value, a good deal of technology, molecular biology and statistics goes into their making. Understanding the science behind map building is important if users are to critically assess, use and build physical maps.</p><p>BACTERIAL ARTIFICIAL CHROMOSOME (BAC). A cloning vector derived from a single-copy F-plasmid of Escherichia coli. 
Large genomic fragments (100–200 kb) can be cloned into BACs, making them useful for constructing genomic libraries.</p><p>*Department of Plant and Soil Sciences and Delaware Biotechnology Institute, University of Delaware, Newark, Delaware 19711, USA. Dipartimento di Scienze Agrarie ed Ambientali, Dipartimento di Matematica ed Informatica, Università di Udine, Via delle Scienze 208, I-33100 Udine, Italy. Correspondence to M.M. e-mail: michele.morgante@uniud.it doi:10.1038/nrg1404</p></li><li><p>NATURE REVIEWS | GENETICS VOLUME 5 | AUGUST 2004 | 579</p><p>YEAST ARTIFICIAL CHROMOSOME (YAC). A cloning vector system that can accommodate large genomic fragments (500–1,000 kb). YACs are grown in yeast, and can be unstable and difficult to isolate in comparison to BACs.</p><p>sequence. Some resources, such as RADIATION HYBRID CELL LINES, were used extensively in the construction of physical maps of mammals, but have so far proved difficult or impossible to develop for other species<sup>12,13</sup>. Several alternative strategies are now being considered to obtain genic sequences in species with large genomes<sup>14</sup>. Two of these strategies, METHYLATION FILTRATION and HIGH C<sub>0</sub>T SELECTION, have recently been applied to maize and shown to be valid alternatives to traditional approaches to genomic sequencing<sup>15,16</sup>. However, sequence contigs that are generated by these approaches will have to be ordered on the basis of a genomic scaffold, and this will require a robust physical map. Even in the absence of a whole-genome sequence assembly, a densely populated physical map allows map-based cloning and comparative genomics. 
Physical maps are also being built for wild relatives of species with a sequenced genome for comparative purposes; this provides a shortcut to address certain questions for which re-sequencing a genome is impractical.</p><p>The goal of this review is to provide guidance both in the evaluation of previously constructed physical maps and in the choice of methods used to build a physical map de novo. Here, we discuss the different physical mapping techniques and their advantages and disadvantages. In particular, we focus on methods that order large-insert clones rather than those that order markers, such as radiation hybrid (RH) mapping<sup>17</sup> or HAPPY MAPPING<sup>18</sup>. Physical maps are often made available through the Internet before publication in refereed journals, and before critical evaluation. Moreover, primary research publications do not evaluate techniques or approaches in a critical or comparative fashion. Here, we aim to address this deficit in critical evaluation to allow potential users to take full advantage of the maps and to help them to understand the science and statistics that lie behind the physical mapping process.</p><p>Fingerprinting technologies for physical mapping</p><p>Banding patterns on chromosomes might be considered to be the earliest and least detailed form of a physical map, with the complete nucleotide sequence of an organism representing the other extreme. Current physical maps are based on technologies to detect overlaps among BACs. Two distinct approaches are used to identify the overlap among clones, and numerous techniques have been applied for each approach. The first approach is to screen the clones to assess the presence of DNA landmarks. Screening techniques include PCR amplification of short fragments known as SEQUENCE-TAGGED SITES (STSs)<sup>19,20</sup>, and hybridization of labelled cDNA clones or short, gene-specific oligonucleotides<sup>21</sup> (see, for example, REF. 22). 
This approach is laborious, and if used alone to construct a physical map, requires an extremely high density of markers that is impractical for most applications.</p><p>Here, we focus on the second approach to physical mapping, which is to use DNA fingerprinting and essentially to perform restriction mapping at a whole-genome level<sup>23</sup>. This approach is better suited to relatively unexplored genomes and is more amenable</p><p>Large-scale mapping and sequencing is underway or planned for many diverse organisms. However, most of these efforts will need to proceed without the vast molecular and financial resources that support organisms such as human, mouse and rat. Physical maps can now be built quickly for many species in which complete genome sequences will not be available soon because a map can be obtained at a fraction of the cost of a whole-genome</p><p>[Figure 1 panel labels: BAC clone library (7–30 genome equivalents, inserts produced with 1 or more restriction enzymes); digestion of a BAC clone; separation, detection and band calling (gel well; band sizes from 800 bp to 20,000 bp); pairwise comparisons and high-stringency assembly; low-stringency and manual re-assembly; verification and map alignment.]</p><p>Figure 1 | The DNA fingerprinting approach to building a whole-genome physical map. a | A bacterial artificial chromosome (BAC) library. A BAC library that represents from 7 to 30 (or more) genome equivalents is constructed. Use of multiple libraries produced with different restriction enzymes will result in better genome coverage. b | DNA fingerprinting of BAC clones. Each clone is restriction-enzyme-digested and the resultant fragments are subjected to electrophoresis to produce the DNA fingerprints. Sizes of all DNA fragments detected on gel are estimated for each clone. c | Automated assembly. 
Using appropriate software, a full pairwise comparison of all clones is performed to detect the proportion of shared bands among each pair of clones. Overlapping clones are identified and placed into contigs on the basis of a set threshold (SULSTON CUTOFF SCORE) of a minimum proportion of shared bands. A clone-ordering algorithm is then used to find the most likely relative order of BAC clones within each contig. This high-stringency assembly process results in some overlaps that are not detected (the blue band indicates gaps in the assembly). d | Manual curation and assembly. End clones from each contig can be compared with one another at a relaxed cutoff score to detect smaller overlaps that went undetected at the more stringent cutoff score used in the automated assembly (that is, to identify assembly gaps). Misassembled clones can also be detected and removed from the assembly, or contigs can be split if deemed unreliable. e | Map alignment and verification. The contigs are aligned to the genetic map or radiation hybrid map using shared markers to verify the map and to further merge contigs. The pink boxes indicate BAC-end sequences that have been used as genetic markers to align contigs to the genetic map.</p></li><li><p>commercially produced by Molecular Probes, Inc., Eugene, Oregon, USA) (FIG. 2b). This third method differs substantially from those described above, because nearly all restriction fragments that are produced from a clone are visible on the agarose gel, whereas the above methods visualize only a subset of fragments that have been labelled and require sequencing gels. The advantage of observing all fragments that result from a clone is that the integrity of the overlap among clones can be verified easily and the size of the overlapping region can be directly estimated rather than just inferred on the basis of the proportion of shared fragments, as with the two other methods. 
The agarose fingerprinting method has since been widely applied because of its relative simplicity and low costs. This method also has several further advantages that derive from the fact that it is the</p><p>to high-throughput methods than the STS/hybridization mapping approach. In the fingerprinting approach (see FIG. 1), each clone is digested with restriction enzymes into fragments, which are then separated and detected. Overlapping clones derived from the same genomic region produce patterns of shared restriction fragments, seen as bands on a gel. The proportion of shared bands is indicative of the degree of overlap. The overlap across numerous clones is then used to order the clones into contigs. Highly repetitive genomes can confound the fingerprinting process, because the repetitive elements can produce identical band sizes and generate false overlaps. Combining information about thousands of DNA landmarks, or markers, that are assigned an order on the chromosomes (through genetic mapping, for example) with the presence of those DNA landmarks on the contigs can allow these contigs to be assembled into a genome-wide physical map. Finishing work to identify clones that span predicted gaps between adjacent contigs will coalesce the contigs into larger scaffolds.</p><p>Fingerprinting methods. Modern fingerprinting methods are derivations of classic techniques that used restriction enzymes for early genome projects including Escherichia coli, Saccharomyces cerevisiae and Caenorhabditis elegans. The first application of whole-genome fingerprinting was the construction of a physical map of the C. elegans genome using cosmid clones<sup>24</sup>. In this study, radioactively labelled restriction fragments were separated on polyacrylamide sequencing gels (FIG. 2a). HindIII, a 6-bp-recognizing enzyme (a rare cutter), was used for the initial digestion of the clone into fragments, which were then end-labelled. 
Another digestion with Sau3AI, a restriction enzyme that recognizes 4 bp (a frequent cutter), produces smaller fragments that are suitable for separation and detection on sequencing gels. The subset of these fragments that have labelled HindIII-ends can be detected<sup>24</sup>.</p><p>Brenner and Livak proposed a second fingerprinting method<sup>25</sup> that uses automated sequencers. This method took advantage of properties of the type IIS restriction enzymes that cut at a precisely defined distance from their recognition site, leaving single-stranded overhangs of variable composition. The overhangs are filled in using unlabelled deoxynucleotides (dNTPs) and fluorescently labelled dideoxynucleotides (ddNTPs) to produce bands that automated sequencers can detect. These machines can resolve band sizes at high resolution and determine the sequence of the 3′ fluorescently labelled bases. The availability of the terminal sequence of these fragments markedly increases the information content of fingerprints compared with the older radioactive methods. This in turn allows more reliable identification of shared fragments.</p><p>In a substantially different method<sup>26</sup>, large-insert clones are digested with a restriction enzyme (often HindIII) that recognizes 6 bp, and the resulting fragments are detected on agarose gels stained with ethidium bromide or, in a more recent modification, with SYBR Green<sup>27</sup> (a highly sensitive DNA dye that is</p><p>MINIMAL TILING PATH</p><p>A minimal set of overlapping clones that together provides complete coverage across a genomic region.</p><p>SULSTON CUTOFF SCORE</p><p>A score that expresses the probability that the number of bands matched between any two clones being fingerprinted is a coincidence. Clones are considered to overlap if the score is below a user-supplied threshold (cutoff).</p><p>RADIATION HYBRID CELL LINES</p><p>A collection of cell lines, each of which is a clonal population of cells that are derived by the fusion of lethally X-irradiated donor cells with mammalian cells. 
Such cell lines can be used to create a physical map of the donor genome.</p><p>METHYLATION FILTRATION</p><p>A method that takes advantage of higher DNA methylation in repetitive than in low-single-copy sequences to selectively clone in Escherichia coli the latter (hypomethylated) on...</p></li></ul>
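<p>The pairwise comparison of fingerprints and the SULSTON CUTOFF SCORE described above can be illustrated with a minimal sketch. It is not the code of any assembly package: the function names, the greedy band-matching rule, the size tolerance and the binomial-tail approximation of the coincidence probability are all assumptions made for illustration, and the exact formula used by real software may differ.</p>

```python
from math import comb


def count_shared_bands(bands_a, bands_b, tolerance):
    """Count bands that match one-to-one within `tolerance` (greedy, on sorted sizes)."""
    a, b = sorted(bands_a), sorted(bands_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            shared += 1          # treat the two bands as the same fragment
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1               # band a[i] has no partner in b
        else:
            j += 1               # band b[j] has no partner in a
    return shared


def coincidence_score(bands_a, bands_b, tolerance, gel_length):
    """Approximate probability that the observed number of shared bands is a coincidence.

    Assumption: bands fall uniformly along a gel of `gel_length` size units, so a
    band of the smaller clone misses every band of the larger clone with
    probability (1 - 2 * tolerance / gel_length) ** n_high.
    """
    n_low, n_high = sorted((len(bands_a), len(bands_b)))
    shared = count_shared_bands(bands_a, bands_b, tolerance)
    p_miss = (1 - 2 * tolerance / gel_length) ** n_high
    p_hit = 1 - p_miss
    # Binomial tail: chance of at least `shared` matches among n_low bands.
    return sum(
        comb(n_low, m) * p_hit ** m * p_miss ** (n_low - m)
        for m in range(shared, n_low + 1)
    )


def clones_overlap(bands_a, bands_b, tolerance, gel_length, cutoff):
    """Call an overlap when the coincidence score falls below the user-supplied cutoff."""
    return coincidence_score(bands_a, bands_b, tolerance, gel_length) < cutoff
```

<p>In this sketch a lower cutoff corresponds to the high-stringency assembly step, and relaxing the cutoff corresponds to the later search for smaller overlaps between contig ends. The band sizes, tolerance and gel length must share one unit.</p>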

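<p>The MINIMAL TILING PATH defined in the glossary can be sketched as a greedy interval-cover problem. This is an illustrative assumption rather than the procedure used in any genome project: it presumes that each clone already has start and end coordinates within a contig (in any consistent unit), and the function name and data layout are hypothetical.</p>

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Pick a smallest set of overlapping clones covering [region_start, region_end].

    `clones` maps a clone name to its (start, end) coordinates in the contig.
    Raises ValueError if the clones leave a gap in the region.
    """
    ordered = sorted(clones.items(), key=lambda item: item[1][0])
    path = []
    covered = region_start
    i = 0
    while covered < region_end:
        best_name, best_end = None, covered
        # Among clones that start inside the covered stretch, keep the one
        # that extends coverage furthest to the right.
        while i < len(ordered) and ordered[i][1][0] <= covered:
            name, (_, end) = ordered[i]
            if end > best_end:
                best_name, best_end = name, end
            i += 1
        if best_name is None:
            raise ValueError(f"gap in coverage at position {covered}")
        path.append(best_name)
        covered = best_end
    return path
```

<p>A real tiling path would also require a minimum verified overlap between neighbouring clones, which this sketch does not check.</p>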
