Metagenomics Tools:

BACs/Fosmid Libraries

Whole Genome Shotgun Sequencing

Amy Apprill

OCN 750: Molecular Methods in Biological Oceanography

November 17, 2005

Why Metagenomics?

- Cultures provide only limited information about the physiology and functional roles of microbes

- Phylotypes of noncultured microbes derived from rRNA genes provide only phylogenetic info, with no information about physiology, biochemistry, or ecological function; they are also subject to PCR-based biases

- Metagenomics allows isolation of large portions of genomes, which provide access to protein-coding genes for biochemical pathways

→ insight into specific physiological and ecological functions, metabolic variability of an environment

History of Marine Metagenomics

1991: Lambda phage used as a vector to create 10-20 kb insert shotgun library of picoplankton rRNA gene sequences, but also revealed other genes of interest (Schmidt TM, DeLong EF, Pace NR, 1991)

1992: Introduction of BAC & Fosmid cloning vectors from E. coli improved cloning efforts by controlling copy numbers

- BAC vectors replicate inserts >300 kb & display few chimeras (Shizuya et al. 1992)

1996: First environmental fosmid library with environmental samples from Oregon coast (Stein et al. 1996)

2000: First BAC library from marine environment (Beja et al. 2000); proteorhodopsin discovered from Monterey Bay BAC (Beja et al. 2000)

2002: AAnP diversity uncovered from Monterey Bay BAC (Beja et al. 2002)

2005: Whole genome shotgun sequencing approach used on first marine environmental samples from the Sargasso Sea (Venter et al. 2005)

BAC: bacterial artificial chromosome

A modified plasmid containing an origin of replication derived from the E. coli F factor, frequently used for large-insert cloning experiments; it exists within the cell very much like a cellular chromosome.

Specifics:

- 100-300 kb (even 600 kb!) inserts; 1 insert ≈ 10-15% of a bacterial genome

- Requires large amounts of DNA (800-2000 L seawater)

- Useful for screening specific protein-coding genes and genes of uncultivated microbes

- Used to discover proteorhodopsin in several phylotypes, and genes for anoxygenic photosynthesis
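As a rough check on the "1 insert ≈ 10-15% of a bacterial genome" figure, the sketch below computes the fraction of a genome that a single insert covers. The 2 Mb genome size is an assumed, illustrative value (it is not given in the slides), and `insert_fraction` is a hypothetical helper name.

```python
def insert_fraction(insert_kb: float, genome_kb: float) -> float:
    """Return the fraction of a genome covered by one cloned insert."""
    if insert_kb <= 0 or genome_kb <= 0:
        raise ValueError("insert and genome sizes must be positive")
    # An insert cannot cover more than the whole genome.
    return min(insert_kb / genome_kb, 1.0)


# Assumption: a ~2 Mb (2000 kb) genome, chosen only for illustration.
ASSUMED_GENOME_KB = 2000.0

for insert_kb in (100, 200, 300):
    frac = insert_fraction(insert_kb, ASSUMED_GENOME_KB)
    print(f"{insert_kb} kb insert covers {frac:.0%} of a {ASSUMED_GENOME_KB:.0f} kb genome")
```

Under this assumed genome size, inserts at the upper end of the 100-300 kb range fall in the 10-15% bracket; larger genomes would give proportionally smaller fractions.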

Marine bacteria BAC/fosmid construction - general

DeLong, 2005

How to create a BAC from seawater:

1. Collect ~1000 L seawater

2. Pre-filter, use tangential flow filtration (TFF) to concentrate and pellet cells

3. Agarose embed cell pellet

4. Lyse agarose-embedded cells

5. Prepare large DNA fragments by HindIII digestion of agarose slices

- Run PFGE

- Excise 150-400 kbp regions

- Extract gel-embedded DNA

(Beja et al 2000, Fig. 1)

6. Ligate DNA into vector (previously removed from cells)

How to create a BAC, cont.

http://www.ptf.okstate.edu/pulser.html

7. Transform vector into cells using electroporation

8. Screen for phylogenetic info, purify & sequence plasmid

Pulsed-field gel electrophoresis (PFGE) of BAC clones digested with NotI shows the size of inserts (Beja et al 2000, Fig. 2A)
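Insert sizing from a NotI digest comes down to simple arithmetic: add up the band sizes read from the gel and subtract the vector backbone band. The sketch below illustrates this under stated assumptions: the 7.5 kb vector size and the band sizes are made-up example values, and `estimate_insert_kb` is a hypothetical helper, not part of any published protocol.

```python
from typing import Sequence


def estimate_insert_kb(band_sizes_kb: Sequence[float], vector_kb: float) -> float:
    """Estimate insert size from all bands of one digested clone.

    band_sizes_kb must include the vector backbone band; inserts with
    internal NotI sites contribute more than one band.
    """
    total = sum(band_sizes_kb)
    if vector_kb <= 0 or total < vector_kb:
        raise ValueError("bands must include a positive-size vector band")
    return total - vector_kb


# Assumption: 7.5 kb vector backbone and invented band sizes, for illustration only.
ASSUMED_VECTOR_KB = 7.5
example_bands_kb = [7.5, 80.0, 45.0]  # vector band + two insert bands

print(f"Estimated insert: {estimate_insert_kb(example_bands_kb, ASSUMED_VECTOR_KB):.1f} kb")
```

The estimate is only as good as the band sizes read against the PFGE size markers, so it is best treated as approximate.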

BAC Screening: rRNA Gene Surveys using Multiplex PCR

- Digest BAC/fosmid DNA to remove E. coli chromosome

- Screen clones for rRNA genes using 3 bacterial primer sets (SSU & LSU) and an Archaea-specific set

- Excise amplicons from gel, purify

- Clone & sequence purified products

Phylogenetically informative multiplex PCR products indicate phylogenetic groupings (Beja et al 2000, Fig. 5)

BAC Screening: ITS-LH-PCR

Figure 4. Suzuki et al. 2004

Uses natural length variations in the ITS, and the location of the tRNA-alanine gene within the ITS, to ID unique gene fragments corresponding to phylogenetic groupings

1. Pool plasmid-safe treated DNA and PCR with fluorescently labeled SSU & LSU primers to amplify ITS & tRNA genes

2. Capillary electrophoresis compares fragment lengths to size standards

3. Sequence unknown fragments w/ ITS primers and 16S primers
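The length-matching step can be pictured as a lookup with a tolerance: each measured fragment length is compared against lengths already tied to known phylogenetic groups. The sketch below is a minimal illustration; the group names, reference lengths, and the 1.5 bp tolerance are all hypothetical values, not data from Suzuki et al. 2004.

```python
from typing import Dict, List

# Hypothetical reference table: group name -> expected fragment length (bp).
REFERENCE_LENGTHS_BP: Dict[str, float] = {
    "group_A": 512.0,
    "group_B": 640.0,
    "group_C": 641.0,  # deliberately close to group_B to show overlapping sizes
}


def assign_groups(fragment_bp: float,
                  reference: Dict[str, float],
                  tolerance_bp: float = 1.5) -> List[str]:
    """Return every group whose expected length is within the tolerance."""
    return sorted(
        group for group, length in reference.items()
        if abs(fragment_bp - length) <= tolerance_bp
    )


for measured in (512.4, 640.5, 700.0):
    hits = assign_groups(measured, REFERENCE_LENGTHS_BP)
    if not hits:
        status = "unknown fragment: candidate for sequencing"
    elif len(hits) > 1:
        status = f"ambiguous, overlapping sizes: {hits}"
    else:
        status = f"matches {hits[0]}"
    print(f"{measured} bp -> {status}")
```

An empty result corresponds to the unknown fragments that go on to sequencing, and a result with several groups corresponds to clones with overlapping sizes.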

BAC Screening Comparison

rRNA gene surveys

PROS:

- Sequence data; no fragment interpretation

CONS:

- Contaminating E. coli DNA

- PCR-based biases

- Not suitable for high-throughput analysis

ITS-LH-PCR

PROS:

- No direct DNA sequencing

- Easier to distinguish E. coli fragments

- High-throughput analysis

CONS:

- Multiple clones w/ overlapping size

- Disruption of ITS may occur w/ cloning

- Some groups w/o linked SSU & LSU

- PCR-based biases

Pros & Cons of BAC libraries

Pros:

- Each insert represents 10-15% of a bacterial genome; gain info about uncultured microbes

- Functional gene presence implies physiology or ecology

- Controlled replication (replicon at 2 copies/cell)

- Low level of chimerism

Cons:

- Requires large sample volumes (800-2000 L sw)

- No direct phylogenetic information

- Screening may introduce PCR biases

- Expensive (time, screening)

Fosmid library

- F-factor origin-based cosmid vector

- ~40 kb DNA inserts

- Requires smaller samples (>1 L sw)

PROS: Quick; takes days, compared to months to a year for BACs

CONS: Recovers fewer clones & more sheared DNA compared to BACs

Figure from Epicentre® biotechnologies (http://www.epibio.com/item.asp?ID=278&CatID=125&SubCatID=60)

Whole genome shotgun sequencing: cloning the entire genome in a random fashion and sequencing the resultant clones

- Collect >200 L seawater, pre-filter, concentrate by TFF or on a 0.22 µm filter

- Shotgun cloning of small fragments ranging from 2-6 kb

- Shotgun assembly: a computer program searches for overlapping sequences and assembles the sequenced fragments in the correct order
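The overlap-and-merge idea behind shotgun assembly can be shown with a toy greedy assembler. This is only a sketch under strong assumptions (error-free reads, a single strand, no repeats) and is not the assembler used in any of the cited studies; the function names and example reads are invented for illustration.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if n > 0 and a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Invented reads sampled from one short sequence, given out of order.
example_reads = ["TACCTGA", "ATGGCGT", "GCGTACC"]
print(greedy_assemble(example_reads))
```

Real assemblers also have to cope with sequencing errors, reverse-complement reads, and repeated regions, which is part of why correct assembly of environmental data is challenging.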

(DeLong 2005)

Figure 2. Venter et al. 2005

Assembled Fragments

Prochlorococcus marinus MED4

Whole genome shotgun sequencing

Pros:

- Lots of data

- Various phylogenetic marker genes assess diversity without PCR biases

- Unbiased identification of gene diversity

- Functional gene info implies ecology, physiology for generating hypotheses

Cons:

- Challenging to assemble fragments correctly in current context (lots of data!)

- Redundant sequencing

- Unknown order and orientation of clones

- Expensive

- Large sample size (>200 L)

Table 1. Suzuki et al. 2004

Figure 1. Suzuki et al. 2004

Figure 2. Suzuki et al. 2004

Figure 3. Suzuki et al. 2004

Figure 4. Suzuki et al. 2004

Figure 1. Venter et al. 2005

Figure 2. Venter et al. 2005

Figure 3. Venter et al. 2005

Figure 4. Venter et al. 2005

Figure 5. Venter et al. 2005

Table 1. Venter et al. 2005

Figure 6. Venter et al. 2005

Table 2. Venter et al. 2005

Table 3. Venter et al. 2005

Figure 7. Venter et al. 2005

Whole genome shotgun sequencing success

Sargasso Sea WGS (Venter et al. 2005):

Large magnitude and total gene count

- 1.045 billion base pairs of non-redundant sequence

- 1,625 Mb DNA sequence

- 1,214,207 new genes identified

New discoveries

- 1,800 new microbial species

- 148 previously unknown bacterial phylotypes

- 782 new rhodopsin-like photoreceptors

- Open-ocean presence of Burkholderia and Shewanella (??)

- Archaea with amo gene (followed up by Francis et al. 2005)

What we can learn from marine BAC libraries

Apparent taxonomic affiliation of protein-encoding genes from different depths in Monterey Bay (DeLong 2005).

Published Metagenomics studies

DeLong 2005
