Genomics: sequencing of microbial genomes

This lecture illustrates the strategies used in microbial genome sequencing projects, compares genome content and organisation amongst microbes, and shows how to derive information on gene function across a genome.

Objectives for students:

Expected to describe strategies involved in microbial genome sequencing and functional genomics

Provide examples of information that can be derived from genomics

Microbial Genome Sequencing

Genome Sequencing Projects

strategy & methods

annotation

Comparative genomics

organisation

gene content

Functional genomics

transcriptome

proteome

genome-wide mutation

Concentrate on strategy & ideas

Genome Sequencing Projects

Genome sequencing progress (2009)

Complete:

Archaeal: 70 (2007 = 49) (2008 = 55)

Bacterial: 945 (2007 = 554) (2008 = 728)

(Eukaryotic: 121) (2007 = 76) (2008 = 97)

Ongoing:

Archaeal: 111

Bacterial: 3498

(Eukaryotic: 1223)

Metagenome projects: 200

www.genomesonline.org

Bacterial genome projects

Many completed:

Haemophilus influenzae

Escherichia coli

Bacillus subtilis

Mycoplasma genitalium

Helicobacter pylori (x2)

Campylobacter jejuni

Treponema pallidum

Neisseria meningitidis

Neisseria gonorrhoeae

Vibrio cholerae

E. coli O157

Links:

http://www.tigr.org/

http://www.ncbi.nlm.nih.gov/

http://www.sanger.ac.uk/

http://www.genomesonline.org/

Completed microbial eukaryote projects

Yeast (Saccharomyces cerevisiae)

Plasmodium falciparum

Aspergillus nidulans, A. niger, A. oryzae & A. fumigatus

Trypanosoma cruzi & T. brucei

Leishmania

Entamoeba histolytica

Giardia lamblia

Candida albicans & C. glabrata

Paramecium

Genome sequencing strategy

In the pre-genome era there were a number of considerations regarding the benefits of sequencing. The piecemeal collection of sequenced genes was slow and costly. Issues also arose over ownership, strain choice, approach and data release. The genome project, however, provided a rational approach to sequencing which was efficient and rapid, and was able to address novel questions. The post-genomic era has allowed the application of comparative and functional genomics.

Genome sequencing strategy:

Strategy choice

large collaborative cosmid/BAC-based projects

now better suited for larger genomes

slow

small insert shotgun approach

centralised

rapid and efficient

choice for bacteria

Strain choice

fresh isolate vs lab strain

clinical vs environmental

subsequent genetic analysis

E.g. Yeast genome sequence strategy

Yeast chromosomes (16) individually sequenced

several approaches used

Make genome library in cosmids

order cosmid library

need to know which cosmid overlaps with which

link cosmid to genome map

produced tiled set of cosmids

only sequence minimum number

Use chromosome specific probe to identify chromosome-specific cosmids

sequence cosmid inserts by subcloning

Solve problems by direct PCR sequencing, walking and other libraries (lambda)

Telomeres
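The "tiled set of cosmids" idea can be pictured as an interval-covering problem: once each cosmid is placed on the genome map, pick the fewest overlapping clones that span the chromosome. The sketch below is only an illustration under that assumption; the function name, the (name, start, end) tuples and the coordinates are all hypothetical and are not taken from the yeast project.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Greedy choice of the fewest overlapping clones covering a region.

    clones: list of (name, start, end) map positions (hypothetical data).
    Returns the chosen clone names in map order, or raises ValueError
    if a gap in the library makes full coverage impossible.
    """
    remaining = sorted(clones, key=lambda c: c[1])
    path = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Among clones starting at or before the covered point,
        # keep the one that extends furthest to the right.
        while i < len(remaining) and remaining[i][1] <= covered_to:
            if best is None or remaining[i][2] > best[2]:
                best = remaining[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best[0])
        covered_to = best[2]
    return path


# Invented map positions (kb) for five overlapping cosmids.
cosmids = [("c1", 0, 40), ("c2", 10, 45), ("c3", 35, 80),
           ("c4", 42, 75), ("c5", 70, 120)]
print(minimal_tiling_path(cosmids, 0, 120))  # ['c1', 'c3', 'c5']
```

Only three of the five clones need to be sequenced in this toy case; a ValueError here corresponds to a region that would have to be recovered from another library or by PCR.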

Whole genome/chromosome shot-gun strategy (WGS)

Rapid

Generation of small insert genomic library

Library is not initially ordered

Sequence the ends of the inserts

Depends on powerful computing to assemble sequence reads
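To give a feel for what "assembling sequence reads" means, here is a toy greedy overlap assembler. It is a teaching sketch only, not the method used by any real genome project: it assumes error-free reads from one strand, ignores repeats and reads contained inside other reads, and the function names and the 15-base example sequence are invented.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to join: these are the contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


# Three overlapping reads taken from the invented sequence ATGGCGTACGTTAGC.
reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```

The all-against-all comparison is quadratic in the number of reads, which hints at why tens of thousands of real reads needed substantial computing power.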

Main steps in generating a complete genome sequence

Automated sequencers:

Manual chain-termination sequencing requires four reaction tubes, each containing a different terminator base as well as a radioactive nucleotide for labelling the newly synthesised DNA fragments. Each of the four reactions is electrophoresed in a separate lane of a gel. Demand for the ability to read more sequence in less time led to the automation of the DNA sequencing process.

Attaching a different fluorescent dye to each of the four terminator bases meant that four separate sequencing reactions were no longer required; the entire sequencing reaction could be carried out in a single tube. The development of automated sequencing machines using multiple capillaries (thin, hollow glass tubes filled with a gel polymer) removed the need for a technician to load each sequencing reaction into an individual lane of the gel before the run.

ABI 3700

The ABI 3700s (made by Applied Biosystems) are the most widely used automated sequencers. They have 96 capillaries, with a robot loading from 384-well plates.

MegaBACE

The MegaBACE is made by Amersham. It also has 96 capillaries and robotic loading from 384-well plates. Each run takes two to four hours and can read up to 800 bases.

These advances have led to the industrialization of sequencing. Most genome sequencing projects divide tasks (such as genome libraries, production sequencing and finishing) among different teams. Sequencing machines are run 24 hours a day, 7 days a week, and many tasks can be performed by robots.

454 sequencing: the future?

454 sequencing was developed by 454 Life Sciences (now part of Roche), and relies on a technique known as pyrosequencing (sequencing by synthesis). It differs from Sanger sequencing in that it detects the pyrophosphate released on nucleotide incorporation, rather than relying on chain termination with dideoxynucleotides.

Nucleotides are flowed sequentially in a fixed order across the PicoTiterPlate device during a sequencing run.

During the nucleotide flow, hundreds of thousands of beads each carrying millions of copies of a unique single-stranded DNA molecule are sequenced in parallel.

If a nucleotide complementary to the template strand is flowed into a well, the polymerase extends the existing DNA strand by adding nucleotide(s).

Addition of one (or more) nucleotide(s) results in a reaction that generates a light signal that is recorded by the CCD camera in the instrument.

The signal strength is proportional to the number of nucleotides incorporated in a single nucleotide flow.

The GS FLX System software tracks the location of DNA-carrying beads on an XY axis. Each bead corresponds to an XY-coordinate on a series of images. The signal intensity per nucleotide flow is recorded for each bead over time and is plotted to generate a flowgram. Each 10-hour sequencing run on the GS FLX Titanium series will typically produce over one million flowgrams, one flowgram per read.
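Because signal strength is proportional to the number of nucleotides incorporated, a flowgram can in principle be read off by rounding each flow value to a whole number. The sketch below illustrates that idea only. The cyclic flow order "TACG", the function name and the signal values are assumptions made for the example, and real base-calling software also has to deal with noise, signal decay and quality scores.

```python
def call_bases(flow_values, flow_order="TACG"):
    """Turn one read's flowgram into a base sequence.

    Each flow value is rounded to the nearest whole number, which is taken
    as the homopolymer length added during that nucleotide flow.
    The flow order is an assumption of this sketch.
    """
    bases = []
    for i, signal in enumerate(flow_values):
        nucleotide = flow_order[i % len(flow_order)]
        count = max(0, int(signal + 0.5))  # round to nearest, never negative
        bases.append(nucleotide * count)
    return "".join(bases)


# Invented signals for two cycles of T, A, C, G flows.
flowgram = [1.02, 0.05, 0.97, 2.10, 0.03, 1.95, 0.00, 1.04]
print(call_bases(flowgram))  # TCGGAAG
```

The rounding step is also where long homopolymer runs become error-prone: a signal of 5.4 versus 5.6 changes the call, whereas 0.9 versus 1.1 does not.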

The development and impact of 454 sequencing. http://www.ncbi.nlm.nih.gov/pubmed/18846085

Rothberg JM, Leamon JH. Nat Biotechnol. 2008 Oct;26(10):1117-24.

Work involved in whole genome sequencing:

individual sequencing reads accumulate

each read about 500bp

computing used to assemble reads

contiguous sequences called contigs

Aim for 8-10-fold read coverage of the genome for accuracy

example:

H.influenzae

19,687 templates

24,304 reads assembled

11,631,485 bp of read sequence in total
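The H. influenzae figures can be turned into an average coverage and mean read length with simple arithmetic. In the sketch below the genome size of 1,830,137 bp is taken from the published H. influenzae Rd sequence rather than from this lecture, and the Poisson estimate of the unsequenced fraction assumes reads land uniformly at random, which real libraries only approximate. The function name is hypothetical.

```python
import math


def coverage_stats(total_read_bases, n_reads, genome_size):
    """Return (fold coverage, mean read length, expected unsequenced fraction)."""
    coverage = total_read_bases / genome_size
    mean_read_length = total_read_bases / n_reads
    # Poisson approximation: chance that a given base is hit by no read.
    fraction_unsequenced = math.exp(-coverage)
    return coverage, mean_read_length, fraction_unsequenced


cov, mean_len, missed = coverage_stats(
    total_read_bases=11_631_485,  # from the example above
    n_reads=24_304,               # from the example above
    genome_size=1_830_137,        # assumption: published genome size, not in the lecture
)
print(f"coverage ~{cov:.1f}x, mean read ~{mean_len:.0f} bp, "
      f"expected unsequenced fraction ~{missed:.4f}")
```

Even at several-fold coverage a small fraction of the genome is expected to be missed by chance, which is one reason a finishing stage is needed.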

Gaps in genome sequence need to be filled in:

Bridging Gaps

A contig is a set of gel readings that are related to one another by overlap of their sequences. The gel readings in a contig can be combined to form a contiguous consensus sequence; the length of this sequence is the length of the contig.

rise in contig number as amount of reads increases

steady fall as accumulating sequence bridges gaps between contigs

levels off as new reads more likely in known contig than gap

start finishing
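The rise, fall and levelling off of contig number described above is what the Lander-Waterman model of random shotgun sequencing predicts: with N reads of length L from a genome of size G, the expected number of contigs is N·exp(-c·σ), where c = N·L/G is the coverage and σ = 1 - T/L allows for a minimum detectable overlap T. The sketch below uses that formula with round, invented numbers (an approximately 1.8 Mb genome and 500 bp reads); it is an idealised model, not a description of any particular project.

```python
import math


def expected_contigs(n_reads, read_length, genome_size, min_overlap=0):
    """Lander-Waterman expected number of contigs (including single reads)."""
    coverage = n_reads * read_length / genome_size
    sigma = 1 - min_overlap / read_length
    return n_reads * math.exp(-coverage * sigma)


# Illustrative values only: 1.8 Mb genome, 500 bp reads.
for n in (1_000, 3_600, 10_000, 24_000, 36_000):
    c = n * 500 / 1_800_000
    print(f"{n:>6} reads  coverage {c:4.1f}x  "
          f"expected contigs {expected_contigs(n, 500, 1_800_000):8.1f}")
```

In this model the contig count peaks at about 1x coverage and then declines ever more slowly, so beyond a certain point extra random reads close few gaps and targeted finishing becomes the better use of effort.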

Finishing

Why are gaps present?

Gap bridging

sequence gaps

choose appropriate clone and walk

physical gaps

alternative libraries (which?)

PCR across gap

Mistakes/poor sequence

areas where read coverage is less than 8-10-fold

repeated sequences, e.g. rRNA

closure and completion

Genome annotation

Find ORFs

look for ATG-Stop (+alternatives)

over certain size

overlaps

computer based (Glimmer & Orpheus) and trained eye
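The "ATG-Stop over a certain size" rule can be sketched as a naive six-frame ORF scan. This is a simplified illustration and not how Glimmer or Orpheus work (those use statistical models of coding sequence). The set of alternative start codons (GTG, TTG), the function names and the size cut-off are assumptions for the example, and overlapping ORFs are simply all reported.

```python
STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "GTG", "TTG"}  # ATG plus assumed bacterial alternatives


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=100):
    """Return (strand, start, end, n_codons) for each ORF found.

    Coordinates are 0-based, end-exclusive, and refer to the strand that
    was scanned; n_codons counts the start codon but not the stop codon.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if start is None and codon in STARTS:
                    start = pos  # first start codon since the last stop
                elif start is not None and codon in STOPS:
                    n_codons = (pos - start) // 3
                    if n_codons >= min_codons:
                        orfs.append((strand, start, pos + 3, n_codons))
                    start = None
    return orfs


# Tiny invented sequence: ATG, five GCT codons, then a TAA stop.
demo = "CC" + "ATG" + "GCT" * 5 + "TAA" + "G"
print(find_orfs(demo, min_codons=5))  # [('+', 2, 23, 6)]
```

Raising `min_codons` trades missed short genes against fewer spurious ORFs, which is where the "trained eye" and database evidence come in.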

ORF function

Search protein databases with the translated ORF sequences (e.g. BLASTX, which translates a nucleotide query in all six reading frames)

Consider level of similarity and context

Domain comparisons

Pfam/Prosite

Other features

http://www.yeastgenome.org/MAP/GENOMICVIEW/GenomicView.shtml

http://mips.gsf.de/genre/proj/yeast/index.jsp

Artemis: sequence viewer and annotation tool from the Sanger Centre (http://www.sanger.ac.uk/Software/Artemis/)

http://xbase.bham.ac.uk/

xBASE is a database for comparative genome analysis of all bacterial genome sequences

Chaudhuri RR, Pallen MJ. xBASE, a collection of online databases for bacterial comparative genomics. Nucleic Acids Res. 2006 Jan 1;34(Database issue):D335-7.

http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D335

Post Genome Sequence Approaches

Comparative genomics

comparing genome organisation and content

genome size

genome repeats/Tn/phages

gene content

minimal gene content

Functional genomics: ascribing gene function across a genome

gene function knowns

phenotype prediction