IB404 - 3. Bacterial Genomics - Jan 25

1. Fred Sanger sequenced the first complete genomes, e.g. the 5kbp genome of the phiX174 phage in 1977 and the 16kb human mitochondrial genome in 1981, and then developed the method of whole genome shotgun cloning and sequencing to determine the 48kb lambda phage genome in 1982.

2. All alternatives to shotgun sequencing, such as primer-walking, nested deletions, transposon-insertions, etc., involve additional costs.

3. When faced with the 4.6 Mbp E. coli genome, Fred Blattner at the University of Wisconsin chose to map the genome physically as overlapping large clones before shotgun sequencing each clone to build the genome. The project took a decade, using mostly manual radioactive sequencing, and was finally published in 1997. The genome is annotated as containing about 4,500 genes, so roughly one gene per kb (generally true for bacteria and viruses, e.g. the 10kb HIV genome has 10 genes).
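The one-gene-per-kb rule of thumb can be checked with a little arithmetic. The sketch below uses only the rounded genome sizes and gene counts quoted in these notes (the M. genitalium figures appear in a later slide); the function and variable names are illustrative.

```python
# (genome size in bp, annotated gene count), rounded figures as quoted in these notes
GENOMES = {
    "E. coli": (4_600_000, 4_500),
    "HIV": (10_000, 10),
    "M. genitalium": (580_000, 483),
}

def kb_per_gene(genome_bp, n_genes):
    """Average genome length per gene, in kb."""
    return genome_bp / 1000 / n_genes

for name, (size_bp, n_genes) in GENOMES.items():
    print(f"{name}: {kb_per_gene(size_bp, n_genes):.2f} kb per gene")
```

All three come out close to 1 kb per gene, which is why genome size alone gives a reasonable first guess at gene count for bacteria and viruses.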

The E. coli genome. The origin and terminus of replication are shown as green lines, with blue arrows indicating replichores 1 and 2. A scale indicates the coordinates both in base pairs and in “minutes” of recombination. The distribution of genes is depicted on two outer rings: The orange boxes are genes located on the presented strand, and the yellow boxes are genes on the opposite strand. Red arrows show the location and direction of transcription of rRNA genes, and tRNA genes are shown as green arrows.

4. Craig Venter had already shaken up the human genome field by generating large numbers of ESTs (expressed sequence tags) from the ends of randomly picked human cDNA clones at his new institute in Maryland, TIGR (The Institute for Genomic Research, later run by his wife, Claire Fraser). In 1995 he tried a whole genome shotgun (WGS) to sequence the 1.8 Mbp genome of Haemophilus influenzae and the 0.58 Mbp genome of Mycoplasma genitalium, together with Hamilton Smith at Johns Hopkins (he grew up here, went to Uni High and UIUC, and won a Nobel Prize for discovering the first type II restriction endonuclease, in H. influenzae).

Pictured: J. Craig Venter, Claire Fraser, Hamilton Smith.

[Figure labels: “Origin of replication”; “60kb total here”]

H. influenzae genome - outer circle is genes in one direction, inner circle the other. Colors are functional categories, e.g. enzyme, channel, receptor, repair, transporter, structural, replication, transcription, translation, etc. Arrowhead is the origin of replication.

Detail of the region around the origin of replication. Note that there is little “spacer” DNA between genes, and that there are operons of multiple genes. Not all genes were named or had known functions, e.g. HIN0006, at least when this was done. Even today, ~100 of the 483 genes in M. genitalium have unknown functions.

Whole genome shotgun sequencing strategy

1. Randomly shear genomic DNA into small pieces, size-fractionate on a gel (e.g. only 2-3kb or 9-11kb pieces), and clone in a plasmid.

2. Sequence each randomly picked plasmid clone insert from each end using flanking primers that anneal to the plasmid vector sequence. These plasmid insert end sequences don’t usually overlap, but their orientation and a rough size are known: they are mate-pairs.

3. Do this enough times that you have generated 6-10X coverage of the entire genome, usually from roughly 20-30X clone coverage.

4. Use an assembly program to build the genome, for bacteria usually circular, by first building contigs of contiguous overlapping sequence, and then linking these contigs into scaffolds using mate-pair information, leaving sequence gaps between contigs.

5. Finish sequence gaps, and any clone gaps between scaffolds, by directed methods, e.g. using PCR with primers to the ends of contigs or scaffolds to amplify across gaps and sequence the purified PCR products, usually directly without cloning them.
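The coverage figures in step 3 can be related to read counts with a simple Poisson model (the idea behind the Lander-Waterman estimates). The sketch below is a rough calculation, not the procedure actually used for H. influenzae: the 500 bp read length and 2.5 kb insert size are assumptions, minimum overlap length is ignored, and the function names are illustrative.

```python
import math

def reads_needed(genome_bp, read_bp, coverage):
    """Number of reads needed for a target average sequence coverage."""
    return math.ceil(coverage * genome_bp / read_bp)

def fraction_uncovered(coverage):
    """Poisson estimate of the fraction of bases hit by no read."""
    return math.exp(-coverage)

def expected_gaps(n_reads, coverage):
    """Rough expected number of sequence gaps (ignores minimum overlap)."""
    return n_reads * math.exp(-coverage)

def clone_coverage(seq_coverage, insert_bp, read_bp):
    """Clone coverage implied when each insert yields two end reads."""
    return seq_coverage * insert_bp / (2 * read_bp)

genome_bp = 1_800_000  # H. influenzae, as quoted in these notes
read_bp = 500          # ASSUMED read length
insert_bp = 2_500      # ASSUMED insert size, within the 2-3kb range above

for c in (6, 8, 10):
    n = reads_needed(genome_bp, read_bp, c)
    print(
        f"{c}X: {n} reads, "
        f"{fraction_uncovered(c):.5f} of bases uncovered, "
        f"~{expected_gaps(n, c):.0f} gaps, "
        f"{clone_coverage(c, insert_bp, read_bp):.0f}X clone coverage"
    )
```

Under these assumptions 8X sequence coverage corresponds to 20X clone coverage, in line with the ranges quoted in step 3, and a few gaps remain even at high coverage, which is why the directed finishing in step 5 is needed.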

WGS schema

[Figure: one plasmid clone with two mate-pairs sequenced from its ends; dots are unknown sequence. Labels: Contig1, Contig2, a scaffold, sequence gap, clone gap.]
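The two ideas in the schema, overlapping reads into a contig and using mate-pairs to size the gap between contigs, can be illustrated with a toy sketch. This is not how a real assembler works (no sequencing errors, repeats, or reverse complements are handled), and the sequences, coordinates, and function names are all made up for illustration.

```python
from statistics import mean

def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def merge(a, b, min_len=3):
    """Join two reads into one contig if they overlap, else return None."""
    k = overlap(a, b, min_len)
    return a + b[k:] if k else None

def estimate_gap(insert_bp, contig1_len, fwd_start, rev_end):
    """Gap size implied by one mate-pair spanning two contigs.

    fwd_start: start of the forward read on contig 1
    rev_end:   end of the reverse read on contig 2
    The insert covers the tail of contig 1, the gap, and the head of contig 2.
    """
    return insert_bp - (contig1_len - fwd_start) - rev_end

# Contig building: two hypothetical reads sharing the 4-base overlap GTAC
print(merge("ATGCGTAC", "GTACGGA"))

# Scaffolding: two hypothetical mate-pairs from ~2.5 kb inserts link a
# 10 kb contig to its neighbour; averaging them gives the gap estimate
mate_pairs = [(9_000, 700), (9_400, 1_150)]
gaps = [estimate_gap(2_500, 10_000, start, end) for start, end in mate_pairs]
print(mean(gaps))
```

Because insert sizes are only roughly known, each mate-pair gives a slightly different gap estimate; several spanning pairs are averaged, and the estimate tells you how long a PCR product to expect when finishing the gap.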

Many bacterial genomes

1. Today there are >2000 bacterial genomes available and >200 from Archaea.

2. For example, Blattner sequenced several strains of E. coli, including the “hamburger” strain, and related Shigella and Salmonella species, yielding information on pathogenicity islands of genes implicated in causing disease.

3. Many others are famous pathogens, e.g. Borrelia burgdorferi, Helicobacter pylori, Treponema pallidum, Neisseria meningitidis, Yersinia pestis, and Vibrio cholerae.

4. Others exhibit unusual biology, e.g. Deinococcus radiodurans, Thermotoga maritima, and Methanococcus jannaschii.

5. They range in size from around 0.5 Mbp for various intracellular parasites and endosymbionts, such as Buchnera species, to over 12 Mbp for Streptomyces species, which form colonies making antibiotics.

6. The small genomes of intracellular parasites and endosymbionts result from gene loss, e.g. Rickettsia have only about 800 genes, while the genome of the aphid endosymbiont Buchnera is largely colinear with that of E. coli, but has lost about 4000 genes!

7. The phylogenetic trees derived from these genome sequences largely agree with the 3-domain 16S rRNA-based trees of Carl Woese, but only when the core set of replication, transcription, and translation proteins is employed.

8. When other gene sets are examined the result is usually a web rather than a tree, indicating that horizontal gene transfer between distantly related bacteria, and even archaea, but seldom eukaryotes, has been widespread.

Metagenomics

Venter and others have continued to push the envelope of bacterial genome sequencing, most prominently by doing metagenomics, in which genomic DNA is extracted from environmentally collected samples, e.g. ocean water or a mine dump or human skin, without trying to culture bacteria, and sequenced extensively. These studies have confirmed that there is an extraordinary diversity of uncultured Bacteria and Archaea out there, and that some have entirely novel metabolic abilities. They also confirm that there are only the known three domains of life.

When the sample is relatively simple, e.g. a few species from a toxic mine sample, entire circular genomes will sometimes assemble. Otherwise they generally obtain long scaffolds containing multiple genes together in operons, which is often enough to define metabolic pathways.

Today a major effort is underway to do this for human commensal bacteria, called the microbiome, including oral, gut, vaginal, and skin bacterial communities.

As an example of the kinds of findings from this work, last year a group published an analysis of the frequency of horizontal gene transfer (HGT) across bacteria that are human commensals versus those that are not. They had ~1000 genomes in each category, and looked for regions with 99% DNA sequence identity shared between species with <97% rRNA identity (so the species were not closely related). They found high levels of HGT across human commensals, and even higher HGT across species living in the same regions of the human body. Thus ecology facilitates or drives HGT.
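The screening criterion described above amounts to a two-threshold filter: a shared region counts as a candidate HGT event when the region itself is nearly identical but the host species are not close relatives. The sketch below applies only those two thresholds; the records are hypothetical, the class and function names are illustrative, treating "99%" as "at least 99%" is an assumption, and the published analysis may have used further criteria not described in these notes.

```python
from dataclasses import dataclass

@dataclass
class SharedRegion:
    species_a: str
    species_b: str
    region_identity: float  # percent DNA identity of the shared region
    rrna_identity: float    # percent rRNA identity between the two species

def is_hgt_candidate(region, min_region=99.0, max_rrna=97.0):
    """Near-identical region shared by species that are not closely related."""
    return region.region_identity >= min_region and region.rrna_identity < max_rrna

# Hypothetical records, for illustration only
regions = [
    SharedRegion("species 1", "species 2", 99.6, 88.0),  # distant hosts: candidate
    SharedRegion("species 3", "species 4", 99.8, 99.1),  # close relatives: excluded
    SharedRegion("species 5", "species 6", 92.0, 85.0),  # region too diverged: excluded
]

candidates = [r for r in regions if is_hgt_candidate(r)]
for r in candidates:
    print(f"{r.species_a} / {r.species_b}: candidate HGT")
```

The rRNA cutoff matters because close relatives share near-identical DNA through ordinary vertical descent; only near-identity between distant species points to recent transfer.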