Genomics: sequencing of microbial genomes
This lecture illustrates the strategies used in microbial genome sequencing projects, compares genome content and organisation amongst microbes, and shows how to derive information on gene function across a genome.
Objectives for students:
Describe the strategies involved in microbial genome sequencing and functional genomics
Provide examples of information that can be derived from genomics
Microbial Genome Sequencing
Genome Sequencing Projects
strategy & methods
Concentrate on strategy & ideas
Genome Sequencing Projects
Genome sequencing progress (2009)
Archaeal: 70 (2007 = 49) (2008 = 55)
Bacterial: 945 (2007 = 554) (2008 = 728)
(Eukaryotic: 121) (2007 = 76) (2008 = 97)
Metagenome projects: 200
Bacterial genome projects
Helicobacter pylori (x2)
E. coli O157
Completed microbial eukaryote projects
Yeast: Saccharomyces cerevisiae
Aspergillus nidulans, A. niger, A. oryzae & A. fumigatus
Trypanosoma cruzi & T. brucei
Candida albicans & C. glabrata
Genome sequencing strategy
In the pre-genome era there were a number of concerns about the benefits of sequencing. The piecemeal collection of sequenced genes was slow and costly. Issues also arose over ownership, strain choice, approach and data release. The genome project, however, provided a rational approach to sequencing which was efficient and rapid, and was able to address novel questions. The post-genomic era has allowed the application of comparative and functional genomics.
Genome sequencing strategy:
large collaborative cosmid/BAC-based projects
now better suited for larger genomes
small insert shotgun approach
rapid and efficient
choice for bacteria
fresh isolate vs lab strain
clinical vs environmental
subsequent genetic analysis
Example: yeast genome sequencing strategy
Yeast chromosomes (16) individually sequenced
several approaches used
Make genome library in cosmids
order cosmid library
need to know which cosmid overlaps with which
link cosmid to genome map
produced tiled set of cosmids
only sequence minimum number
Use chromosome specific probe to identify chromosome-specific cosmids
sequence cosmid inserts by subcloning
Solve problems by direct PCR sequencing, walking and other libraries (lambda)
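Once the cosmids have been placed on the genome map, choosing the minimum number to sequence can be treated as an interval-cover problem. The following is a minimal sketch under that assumption, not the procedure used in the yeast project: each cosmid is represented by a hypothetical (name, start, end) tuple in map coordinates, and a standard greedy rule picks the clone that extends furthest at each step.

```python
def minimal_tiling_path(cosmids, region_start, region_end):
    """Greedy interval cover: pick the fewest mapped clones spanning a region.

    cosmids: list of (name, start, end) tuples in map coordinates
             (a hypothetical representation used for illustration).
    Returns the chosen clone names in map order.
    Raises ValueError if the map has a gap inside the region.
    """
    ordered = sorted(cosmids, key=lambda c: c[1])
    path = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Among clones that start at or before the covered position,
        # keep the one that reaches furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"map gap at position {covered_to}")
        path.append(best[0])
        covered_to = best[2]
    return path


# Illustrative, made-up map positions (kb).
clones = [("c1", 0, 40), ("c2", 10, 35), ("c3", 30, 80),
          ("c4", 35, 70), ("c5", 75, 120)]
tiling = minimal_tiling_path(clones, 0, 120)
```

Clones c2 and c4 are fully redundant with their neighbours in this made-up map, so they are left out of the tiled set.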
Whole genome/chromosome shot-gun strategy (WGS)
Generation of small insert genomic library
Library is not initially ordered
DNA sequence ends of inserts
Depends on powerful computing to assemble sequence reads
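To illustrate why an unordered shotgun library depends on computing, here is a toy greedy assembler that repeatedly merges the two sequences sharing the longest suffix-prefix overlap. It is a sketch only: the function names and the minimum overlap are assumptions, sequencing errors, reverse-complement reads and repeats are ignored, and real assemblers are far more sophisticated.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that is a prefix of b (>= min_len), else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if n > 0 and a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Merge the best-overlapping pair until no pair overlaps by min_overlap.

    Returns the list of contigs that remain.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_overlap)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs
        # Join the pair, writing the shared overlap only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Three made-up reads drawn from one short sequence.
example_contigs = greedy_assemble(["ATGGCGTA", "GCGTACGTT", "ACGTTAGC"])
```

The pairwise comparison of every read against every other read is what makes assembly computationally demanding as the number of reads grows.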
Main steps in generating a complete genome sequence
Manual chain-termination sequencing requires four reaction tubes, each containing a different terminator base (dideoxynucleotide) as well as a radioactive nucleotide for labelling the newly synthesised DNA fragments. Each of the four reactions is electrophoresed in a separate lane of a gel. Demand for the ability to read more sequence in a shorter time led to the automation of the DNA sequencing process.
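The logic of reading a four-lane gel can be shown with a small idealised simulation. This is an illustrative sketch, with hypothetical function names, that assumes every possible termination product is present and perfectly resolved: each lane holds the fragment lengths ending in one terminator base, and reading the bands from shortest to longest across the lanes recovers the sequence.

```python
def terminated_fragment_lengths(new_strand):
    """Return {terminator base: [fragment lengths]} for the four reactions.

    new_strand: sequence of the newly synthesised strand (A/C/G/T only).
    In the tube containing terminator X, synthesis can stop wherever X is
    incorporated, so that lane holds one fragment length per X position.
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for position, base in enumerate(new_strand, start=1):
        lanes[base].append(position)
    return lanes


def read_gel(lanes):
    """Read the sequence by ordering bands from shortest to longest fragment."""
    bands = [(length, base) for base, lengths in lanes.items() for length in lengths]
    return "".join(base for _, base in sorted(bands))


lanes = terminated_fragment_lengths("GATTACA")
sequence = read_gel(lanes)
```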
The attachment of a different fluorescent dye to each of the four terminator bases meant that four separate sequencing reactions were no longer required; the entire sequencing reaction could be accomplished in a single tube. The development of automated sequencing machines using multiple capillaries (thin, hollow glass tubes filled with a gel polymer) removed the need for a technician to load each sequencing reaction into an individual lane of the gel prior to the run.
The ABI 3700s (made by Applied Biosystems) are the most widely used automated sequencers. They have 96 capillaries, with a robot loading from 384-well plates.
The MegaBACE is made by Amersham. It also has 96 capillaries and robotic loading from 384-well plates. Each run takes two to four hours, and can read up to 800 bases.
These advances have led to the industrialization of sequencing. Most genome sequencing projects divide tasks (such as genome library construction, production sequencing and finishing) among different teams. Sequencing machines are run 24 hours a day, 7 days a week, and many tasks can be performed by robots.
454 sequencing: the future?
454 sequencing was developed by 454 Life Sciences (a Roche company) and relies on a technique known as pyrosequencing (sequencing by synthesis). It differs from Sanger sequencing in relying on the detection of pyrophosphate release on nucleotide incorporation, rather than chain termination with dideoxynucleotides.
Nucleotides are flowed sequentially in a fixed order across the PicoTiterPlate device during a sequencing run.
During the nucleotide flow, hundreds of thousands of beads each carrying millions of copies of a unique single-stranded DNA molecule are sequenced in parallel.
If a nucleotide complementary to the template strand is flowed into a well, the polymerase extends the existing DNA strand by adding nucleotide(s).
Addition of one (or more) nucleotide(s) results in a reaction that generates a light signal that is recorded by the CCD camera in the instrument.
The signal strength is proportional to the number of nucleotides incorporated in a single nucleotide flow.
The GS FLX System software tracks the location of DNA-carrying beads on an XY axis. Each bead corresponds to an XY coordinate on a series of images. The signal intensity per nucleotide flow is recorded for each bead over time and is plotted to generate a flowgram. Each 10-hour sequencing run on the GS FLX Titanium series will typically produce over one million flowgrams, one flowgram per read.
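Because signal strength is proportional to the number of nucleotides incorporated per flow, a flowgram can in principle be turned into a sequence by rounding each signal to a whole number. The sketch below assumes signals are already normalised so that about 1.0 means one incorporation, and it uses an illustrative flow order; neither detail comes from the lecture, and the instrument software applies much more careful signal processing.

```python
def call_bases(flow_signals, flow_order="TACG"):
    """Convert per-flow signal intensities into a base sequence.

    flow_signals: normalised intensities, one per nucleotide flow
                  (assumed: about 1.0 for a single incorporation).
    flow_order:   repeating nucleotide flow cycle (assumed for illustration).
    """
    bases = []
    for i, signal in enumerate(flow_signals):
        nucleotide = flow_order[i % len(flow_order)]
        # Round to the nearest whole number of incorporations, never below zero.
        count = max(0, int(signal + 0.5))
        bases.append(nucleotide * count)
    return "".join(bases)


# Made-up flowgram: two full cycles of T, A, C, G.
read = call_bases([1.02, 0.05, 2.10, 0.90, 0.00, 0.97, 0.10, 3.05])
```

Long homopolymer runs are the weak point of this approach, since the difference between, say, 7 and 8 incorporations is a small fraction of the total signal.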
Rothberg JM, Leamon JH. The development and impact of 454 sequencing. Nat Biotechnol. 2008 Oct;26(10):1117-24. http://www.ncbi.nlm.nih.gov/pubmed/18846085
Work involved in whole genome sequencing:
individual sequencing reads accumulate
each read about 500 bp
computing used to assemble reads
contiguous sequences called contigs
Aim for 8-10-fold read coverage of genome for accuracy
24,304 reads assembled
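The relationship between read count, read length and coverage is simple arithmetic: coverage = (number of reads x read length) / genome size. The sketch below applies it to the figures above; the genome size of 1.8 Mb is an assumed, illustrative value, since the notes do not give one.

```python
import math


def fold_coverage(n_reads, read_length_bp, genome_size_bp):
    """Average number of times each base is sequenced."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage, read_length_bp, genome_size_bp):
    """Smallest number of reads giving at least the target average coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


ASSUMED_GENOME_SIZE_BP = 1_800_000  # hypothetical genome size, for illustration only

coverage = fold_coverage(24_304, 500, ASSUMED_GENOME_SIZE_BP)
reads_for_8x = reads_needed(8, 500, ASSUMED_GENOME_SIZE_BP)
reads_for_10x = reads_needed(10, 500, ASSUMED_GENOME_SIZE_BP)
```

Average coverage says nothing about how evenly reads are distributed, which is why some regions still end up below the target.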
Gaps in genome sequence need to be filled in:
A contig is a set of gel readings that are related to one another by overlap of their sequences. The gel readings in a contig can be combined to form a contiguous consensus sequence; the length of this consensus is the length of the contig.
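How overlapping readings are combined into a consensus can be sketched as a per-position majority vote. The example assumes the readings have already been placed at known offsets within the contig (a hypothetical input format) and contain no insertions or deletions; it is not the algorithm of any particular assembly package.

```python
from collections import Counter


def contig_consensus(placed_reads):
    """Majority-vote consensus of readings placed within one contig.

    placed_reads: list of (offset, sequence) pairs, offset counted from the
                  contig start (hypothetical representation).
    Positions covered by no reading are reported as 'N'; ties are resolved
    arbitrarily.
    """
    length = max(offset + len(seq) for offset, seq in placed_reads)
    columns = [Counter() for _ in range(length)]
    for offset, seq in placed_reads:
        for i, base in enumerate(seq):
            columns[offset + i][base] += 1
    return "".join(col.most_common(1)[0][0] if col else "N" for col in columns)


# The second reading carries a made-up error (C instead of G) at position 5.
consensus = contig_consensus([(0, "ATGGCGTA"), (0, "ATGGCCTA"), (3, "GCGTACGTT")])
```

With only two readings over a position a disagreement cannot be resolved, which is one reason to aim for several-fold coverage.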
rise in contig number as amount of reads increases
steady fall as accumulating sequence bridges gaps between contigs
levels off as new reads more likely in known contig than gap
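The rise and fall in contig number can be approximated with the Lander-Waterman model, which assumes reads are sampled uniformly at random from the genome. Under that idealisation the expected number of contigs after N reads is N x exp(-c x sigma), where c is the fold coverage and sigma = 1 - T/L for a minimum detectable overlap T and read length L. The genome size and read counts below are assumed values for illustration; real projects deviate because of repeats and cloning bias.

```python
import math


def expected_contigs(n_reads, read_length_bp, genome_size_bp, min_overlap_bp=0):
    """Lander-Waterman estimate of contig number after n_reads random reads."""
    coverage = n_reads * read_length_bp / genome_size_bp
    sigma = 1 - min_overlap_bp / read_length_bp
    return n_reads * math.exp(-coverage * sigma)


def expected_uncovered_fraction(n_reads, read_length_bp, genome_size_bp):
    """Expected fraction of the genome not covered by any read: exp(-coverage)."""
    return math.exp(-n_reads * read_length_bp / genome_size_bp)


GENOME = 1_800_000  # hypothetical genome size (bp)
READ = 500          # read length from the notes (bp)

for n in (1_000, 3_600, 10_000, 20_000, 30_000):
    print(n, round(expected_contigs(n, READ, GENOME), 1),
          round(expected_uncovered_fraction(n, READ, GENOME), 5))
```

In this model the contig count peaks when coverage is about one-fold and then declines ever more slowly, so the last few gaps are inefficient to close by further random sequencing.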
Why are gaps present?
sequence gaps: choose an appropriate clone and walk
alternative libraries (which?)
PCR across gap
areas where read coverage is less than 8-10-fold
repeated sequences, e.g. rRNA
closure and completion
look for ATG-Stop (+alternatives)
over certain size
computer based (Glimmer & Orpheus) and trained eye
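The basic start-to-stop search with a size cut-off can be sketched as follows. This is a deliberately simple illustration, not how Glimmer or Orpheus work (those use statistical models of coding sequence). The alternative start codons GTG and TTG, the function name and the size threshold are assumptions chosen for the example, and only the three forward reading frames are scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}  # ATG plus common bacterial alternatives


def find_orfs(sequence, min_codons=3, starts=START_CODONS):
    """Find start-to-stop open reading frames in the three forward frames.

    Returns (start, end, frame) tuples; end is exclusive and includes the
    stop codon. min_codons counts codons before the stop codon.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if orf_start is None and codon in starts:
                orf_start = i  # first in-frame start codon opens the ORF
            elif orf_start is not None and codon in STOP_CODONS:
                if (i - orf_start) // 3 >= min_codons:
                    orfs.append((orf_start, i + 3, frame))
                orf_start = None
    return orfs


demo = "CCATGAAACCCGGGTAACC"  # made-up sequence
orfs = find_orfs(demo)
```

A full search would also scan the reverse complement, and in practice the size threshold is set much higher to keep the number of spurious short ORFs manageable.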
Search databases with predicted translated sequences (BLASTX)
Consider level of similarity and context
Artemis: sequence viewer and annotation tool from the Sanger Centre (http://www.sanger.ac.uk/Software/Artemis/)
xBASE is a database for comparative genome analysis of all bacterial genome sequences
Chaudhuri RR, Pallen MJ. xBASE, a collection of online databases for bacterial comparative genomics. Nucleic Acids Res. 2006 Jan 1;34(Database issue):D335-7.
Post Genome Sequence Approaches
comparing genome organisation and content
minimal gene content
Functional genomics: ascribing gene function across a genome
gene function knowns