Mapping and sequencing complex genomes: let's get physical!

  • 578 | AUGUST 2004 | VOLUME 5 www.nature.com/reviews/genetics

    R E V I E W S

    The construction of a whole-genome physical map has been an essential component of numerous genome projects initiated since the inception of the Human Genome Project (HGP). The production and integration of genetic, physical, gene and sequence maps was the goal of the HGP¹. Although genetic mapping has been pursued in plants and animals for decades, it is only relatively recently that advances in cloning and clone fingerprinting have allowed the construction of physical maps. A physical map is an ordered set of DNA fragments, among which the distances are expressed in physical distance units (base pairs). These days, a physical map usually comprises a set of ordered large-insert clones such as BACTERIAL ARTIFICIAL CHROMOSOMES (BACs)², which have largely replaced YEAST ARTIFICIAL CHROMOSOMES³ as the preferred building blocks of a physical map. Physical maps can be independent of genetic information but are more valuable if linked to genetically mapped markers, and are even more powerful if integrated with genomic sequence data.

    Much progress has been made in the development of technologies and strategies for whole-genome sequencing, but these strategies still depend on the development of a physical map. In the clone-by-clone whole-genome sequencing method, the physical map is constructed first, and a MINIMAL TILING PATH of clones is then selected for separate shotgun sequencing of each clone in the path⁴.
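    The selection of a minimal tiling path can be illustrated with a small sketch. It assumes that each clone already has start and end coordinates on a single contig (the tuple layout, the `min_overlap` parameter and the coordinates are hypothetical and not taken from this review), and it uses a plain greedy interval cover rather than the procedure of any particular mapping package.

```python
from typing import List, Tuple

# (name, start, end) in hypothetical map coordinates, e.g. base pairs
Clone = Tuple[str, int, int]


def minimal_tiling_path(clones: List[Clone], min_overlap: int = 0) -> List[Clone]:
    """Greedy interval cover: starting from the leftmost clone, repeatedly
    add the clone that overlaps the current path end by at least
    `min_overlap` and extends furthest to the right."""
    if not clones:
        return []
    # Sort by start; for equal starts, prefer the longer clone.
    ordered = sorted(clones, key=lambda c: (c[1], -c[2]))
    path = [ordered[0]]
    target_end = max(c[2] for c in ordered)
    i = 1
    while path[-1][2] < target_end:
        reach = path[-1][2]
        best = None
        # Examine every clone that starts early enough to overlap the path end.
        while i < len(ordered) and ordered[i][1] <= reach - min_overlap:
            if ordered[i][2] > reach and (best is None or ordered[i][2] > best[2]):
                best = ordered[i]
            i += 1
        if best is None:
            # No clone bridges this position: the contig has a gap here.
            raise ValueError(f"no clone overlaps the path end at {reach}")
        path.append(best)
    return path


example = [("A", 0, 100), ("B", 20, 150), ("C", 90, 220),
           ("D", 140, 260), ("E", 210, 330), ("F", 250, 340)]
print([name for name, _, _ in minimal_tiling_path(example)])
```

    Requiring a larger `min_overlap` trades redundancy for safety: more clones are sequenced, but each join between neighbouring clones is supported by more shared sequence.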

    An alternative to the clone-by-clone method is whole-genome shotgun (WGS) sequencing, which uses assembled sequence data generated randomly from the entire genome⁴,⁵. In theory, WGS sequencing makes obsolete the process of physical mapping because it should construct overlapping contiguous segments (contigs) of sequence data. However, it is not yet clear whether WGS sequencing alone is sufficient to produce a linearly ordered set of sequences if the sequence contigs are not coupled to a robust physical map⁴,⁶⁻⁹. Therefore, a hybrid strategy of the two methods for whole-genome sequencing will probably prove to be most productive⁴. With this hybrid approach, WGS sequence data are aligned with mapped BAC-end sequences, and these assembled contigs are anchored to a physical map scaffold that comprises ordered and orientated BACs that include mapped molecular markers¹⁰,¹¹.
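    The anchoring step of the hybrid approach can be pictured with a deliberately simplified sketch. It assumes that each BAC has start and end coordinates on the physical map and that each BAC-end sequence has already been matched to one sequence contig; the function name, the `"L"`/`"R"` end labels and the input layout are hypothetical. Real pipelines also use hit positions, orientation and conflict resolution, all of which are omitted here.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple


def order_contigs(
    bac_map: Dict[str, Tuple[int, int]],
    end_hits: List[Tuple[str, str, str]],
) -> List[Tuple[float, str]]:
    """Order sequence contigs along a physical map.

    bac_map  : BAC name -> (start, end) on the map (hypothetical coordinates)
    end_hits : (BAC name, "L" or "R" end, contig the end sequence aligns to)
    Returns (median anchor position, contig) pairs sorted along the map.
    """
    anchors = defaultdict(list)
    for bac, which_end, contig in end_hits:
        if bac not in bac_map:
            continue  # a BAC that is not on the map cannot anchor a contig
        start, end = bac_map[bac]
        anchors[contig].append(start if which_end == "L" else end)
    # The median keeps one misplaced BAC end from dragging a contig away.
    return sorted((median(pos), contig) for contig, pos in anchors.items())


bac_map = {"b1": (0, 120), "b2": (100, 230), "b3": (200, 330)}
hits = [("b1", "L", "ctgB"), ("b1", "R", "ctgB"),
        ("b2", "R", "ctgA"), ("b3", "R", "ctgA"),
        ("b9", "L", "ctgC")]  # b9 is unmapped, so ctgC stays unanchored
print([contig for _, contig in order_contigs(bac_map, hits)])
```

    Contigs that receive no anchor are simply absent from the result, which mirrors the point made in the text: without a map scaffold, such contigs cannot be placed in linear order.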

    The lack of high-quality physical maps could rapidly become one of the limiting factors in assembling newly generated WGS sequences for large genomes. The productivity of large sequencing centres has already outstripped the ability of physical mapping laboratories to provide ordered sequence maps. Without the linear order that physical maps provide, the marginal advantage that WGS sequencing projects have over a comprehensive EST or a full-length cDNA sequencing effort does not justify the considerable increase in costs.

    MAPPING AND SEQUENCING COMPLEX GENOMES: LET'S GET PHYSICAL!

    Blake C. Meyers*, Simone Scalabrin and Michele Morgante

    Physical maps provide an essential framework for ordering and joining sequence data, genetically mapped markers and large-insert clones in eukaryotic genome projects. A good physical map is also an important resource for cloning specific genes of interest, comparing genomes, and understanding the size and complexity of a genome. Although physical maps are usually taken at face value, a good deal of technology, molecular biology and statistics goes into their making. Understanding the science behind map building is important if users are to critically assess, use and build physical maps.

    BACTERIAL ARTIFICIAL CHROMOSOME (BAC). A cloning vector derived from a single-copy F-plasmid of Escherichia coli. Large genomic fragments (100–200 Kb) can be cloned into BACs, making them useful for constructing genomic libraries.

    *Department of Plant and Soil Sciences and Delaware Biotechnology Institute, University of Delaware, Newark, Delaware 19711, USA. Dipartimento di Scienze Agrarie ed Ambientali, Dipartimento di Matematica ed Informatica, Università di Udine, Via delle Scienze 208, I-33100 Udine, Italy.
    Correspondence to M.M. e-mail: michele.morgante@uniud.it
    doi:10.1038/nrg1404

    YEAST ARTIFICIAL CHROMOSOME (YAC). A cloning vector system that can accommodate large genomic fragments (500–1,000 Kb). YACs are grown in yeast, and can be unstable and difficult to isolate in comparison to BACs.

    Large-scale mapping and sequencing is underway or planned for many diverse organisms. However, most of these efforts will need to proceed without the vast molecular and financial resources that support organisms such as human, mouse and rat. Physical maps can now be built quickly for many species in which complete genome sequences will not be available soon because a map can be obtained at a fraction of the cost of a whole-genome sequence. Some resources, such as RADIATION HYBRID CELL LINES, were used extensively in the construction of physical maps of mammals, but have so far proved difficult or impossible to develop for other species¹²,¹³. Several alternative strategies are now being considered to obtain genic sequences in species with large genomes¹⁴. Two of these strategies, METHYLATION FILTRATION and HIGH C₀t SELECTION, have recently been applied to maize and shown to be valid alternatives to traditional approaches to genomic sequencing¹⁵,¹⁶. However, sequence contigs that are generated by these approaches will have to be ordered on the basis of a genomic scaffold, and this will require a robust physical map. Even in the absence of a whole-genome sequence assembly, a densely populated physical map allows map-based cloning and comparative genomics. Physical maps are also being built for wild relatives of species with a sequenced genome for comparative purposes; this provides a shortcut to address certain questions for which re-sequencing a genome is impractical.

    The goal of this review is to provide guidance both in the evaluation of previously constructed physical maps and in the choice of methods used to build a physical map de novo. Here, we discuss the different physical mapping techniques and their advantages and disadvantages. In particular, we focus on methods that order large-insert clones rather than those that order markers such as radiation hybrid (RH) mapping¹⁷ or HAPPY MAPPING¹⁸. Physical maps are often made available through the Internet before publication in refereed journals, and before critical evaluation. Moreover, primary research publications do not evaluate techniques or approaches in a critical or comparative fashion. Here, we aim to address this deficit in critical evaluation to allow potential users to take full advantage of the maps and to help them to understand the science and statistics that lie behind the physical mapping process.

    Fingerprinting technologies for physical mapping

    Banding patterns on chromosomes might be considered to be the earliest and least detailed form of a physical map, with the complete nucleotide sequence of an organism representing the other extreme. Current physical maps are based on technologies to detect overlaps among BACs. Two distinct approaches are used to identify the overlap among clones, and numerous techniques have been applied for each approach. The first approach is to screen the clones to assess the presence of DNA landmarks. Screening techniques include PCR amplification of short fragments known as SEQUENCE-TAGGED SITES (STSs)¹⁹,²⁰, and hybridization of labelled cDNA clones or short, gene-specific oligonucleotides²¹ (see, for example, REF. 22). This approach is laborious, and if used alone to construct a physical map, requires an extremely high density of markers that is impractical for most applications.

    Here, we focus on the second approach to physical mapping, which is to use DNA fingerprinting and essentially to perform restriction mapping at a whole-genome level²³. This approach is better suited to relatively unexplored genomes and is more amenable

    [Figure 1 panels: a, BAC clone library (7–30 genome equivalents, inserts produced with 1 or more restriction enzymes); b, digestion, separation, detection and band calling; c, pairwise comparisons and high-stringency assembly; d, low-stringency and manual re-assembly; e, verification and map alignment.]

    Figure 1 | The DNA fingerprinting approach to building a whole-genome physical map. a | A bacterial artificial chromosome (BAC) library. A BAC library that represents from 7 to 30 (or more) genome equivalents is constructed. Use of multiple libraries produced with different restriction enzymes will result in better genome coverage. b | DNA fingerprinting of BAC clones. Each clone is restriction-enzyme-digested and the resultant fragments are subjected to electrophoresis to produce the DNA fingerprints. Sizes of all DNA fragments detected on gel are estimated for each clone. c | Automated assembly. Using appropriate software, a full pairwise comparison of all clones is performed to detect the proportion of shared bands among each pair of clones. Overlapping clones are identified and placed into contigs on the basis of a set threshold (SULSTON CUTOFF SCORE) of a minimum proportion of shared bands. A clone-ordering algorithm is then used to find the most likely relative order of BAC clones within each contig. This high-stringency assembly process results in some overlaps that are not detected (the blue band indicates gaps in the assembly). d | Manual curation and assembly. End clones from each contig can be compared with one another at a relaxed cutoff score to detect smaller overlaps that went undetected at the more stringent cutoff score used in the automated assembly (that is, to identify assem