16S rRNA gene marker
intra-gene variability
primer selection
size & information content
Primer selection, information content, alignment and length
16S rRNA gene marker
Conserved secondary structure
Natural gene amplification
Genealogy reconstruction
Ludwig and Schleifer, 1994 FEMS Rev 15:155-173
http://rna.ucsc.edu/rnacenter/ribosome_images.html
Intra-gene variability
secondary structure shows differences in the conservation of homologous sites
highly conserved zones give information on deep-genealogies
(higher resolution for distantly related)
hypervariable zones give information on recent events
(higher resolution for close relatives)
Anderson et al., 2008 PLoS ONE, 3: e2836
Stahl and Amann, 1991 John Wiley and Sons
Primer selection: universality
Universal primers target highly conserved regions
Universality depends on the known dataset
Different phyla may have differences in the "universal" regions (e.g. EUB338)
Primers used for rRNA cloning may give biased results
Metagenomics without amplification steps may reveal hidden diversity
EUB338 I      Most Bacteria         GCTGCCTCCCGTAGGAGT
EUB338 II     Planctomycetales      GCAGCCACCCGTAGGTGT
EUB338 III    Verrucomicrobiales    GCTGCCACCCGTAGGTGT
Daims et al. 1999. System Appl Microbiol 22, 434-444
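The three EUB338 variants differ at only a few positions, which a position-by-position comparison makes explicit. The following is a minimal Python sketch: the helper name `mismatch_positions` is hypothetical, the sequences are the ones listed above, and reading them in the 5'->3' direction is an assumption.

```python
from typing import List

# Probe sequences as listed above (assumed to be written 5'->3').
PROBES = {
    "EUB338 I": "GCTGCCTCCCGTAGGAGT",
    "EUB338 II": "GCAGCCACCCGTAGGTGT",
    "EUB338 III": "GCTGCCACCCGTAGGTGT",
}


def mismatch_positions(a: str, b: str) -> List[int]:
    """Return the 1-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Compare every variant against EUB338 I (hypothetical choice of reference).
reference = PROBES["EUB338 I"]
for name, seq in PROBES.items():
    print(name, mismatch_positions(reference, seq))
```

Under these assumptions EUB338 II differs from EUB338 I at three positions and EUB338 III at two, which illustrates how a probe designed against the known dataset can miss whole groups.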
Primer selection: size of the amplicon
Primer map (primer name, position on the 16S rRNA gene):
GM3 8
616Valt 8
GM5 / GM5-clamp 341
518F 518
518R 518
GM4 1492
907R 907
945F 945
Bac1055F 1055
630R 1529
S 1505
ideally the almost complete gene (~1520 nucleotides) should be sequenced
many amplifications skip sequencing helix 50 (~1490 nucleotides)
many clone libraries are based on just partial amplicons (~900 nucleotides)
The primer pair GM3 (8) – GM4 (1492) is the most widely used
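The relation between primer positions and amplicon size can be illustrated with simple arithmetic. This is only a sketch under stated assumptions: the numbers next to the primer names above are treated as coordinates in one shared numbering of the gene, the span is taken as the distance between the two coordinates, and primer lengths are ignored. The helper name `approx_span` is hypothetical.

```python
# Primer positions as listed above; treating them as coordinates in one
# shared numbering of the gene is an assumption of this sketch.
PRIMER_POSITION = {"GM3": 8, "518F": 518, "907R": 907, "GM4": 1492}


def approx_span(forward: str, reverse: str) -> int:
    """Approximate span (nucleotides) between a forward and a reverse primer."""
    start, end = PRIMER_POSITION[forward], PRIMER_POSITION[reverse]
    if end <= start:
        raise ValueError("reverse primer must lie downstream of the forward primer")
    return end - start + 1


print(approx_span("GM3", "GM4"))    # 1485
print(approx_span("GM3", "907R"))   # 900
print(approx_span("518F", "907R"))  # 390
```

The real amplicon length also depends on primer length and on which end of the primer the coordinate refers to, so these values are approximations only.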
16S rRNA sequencing has grown exponentially in parallel to the development of sequencing techniques
Yarza et al., Nature Revs. 2014. 12: 635-645
Tamames & Rosselló-Móra 2012 TIM 20:514-516
Figure labels (sequencing methods): rRNA cataloguing; radioactive Sanger sequencing; non-radioactive Sanger sequencing; reverse transcription sequencing; NGS
The database is increasing exponentially
99% environmental sequences
1% cultured organisms
3.8 × 10^6 sequences
700,000 / year (last three years)
Sources of sequences and quality
rRNA cataloguing (up to the late 80s), bad quality
reverse transcription sequencing (up to the late 90s), bad quality
Sanger methods (radioactive, biotin-labelled, terminal-dye… still in use)
cloning DNA, good quality
direct amplification, good quality
DGGE/TGGE, short sequences, bad quality
NGS, short sequences
454 technology (now up to 800 nucleotides, mean of 500 nucleotides), moderate quality
Illumina (now 2 × 250 nucleotides), too short
16S rRNA sequencing has grown exponentially in parallel to the development of sequencing techniques
Quast et al., 2013, Nuc Acid Res. 41: D590-D596
www.arb-silva.de
SILVA release 119 (July 2014)
about 30% of the existing sequences are rejected
short sequences are generally worse than long stretches
We divided the 16S rRNA gene into 6 regions of 250 nucleotides
- Calculated taxa recovery in each stretch
- Compared it with that of the full sequence
Regions V1 & V2
Regions V3 & V4
Regions V5 & V6
Category Minimum identity
Species 98.7%
Genus 94.5%
Family 86.5%
Order 82.0%
Class 78.5%
Phylum 75.0%
Yarza et al., Nature Revs. 2014. 12: 635-645
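The category minima above can be applied as a simple lookup. The sketch below assumes the percentages are read as the minimum pairwise 16S rRNA sequence identity for two sequences to fall in the same taxon of that category; that reading, and the function name `lowest_shared_category`, are assumptions of this example.

```python
# Category minima (%) from the table above, ordered from the lowest
# taxonomic category (Species) to the highest (Phylum).
THRESHOLDS = [
    ("Species", 98.7),
    ("Genus", 94.5),
    ("Family", 86.5),
    ("Order", 82.0),
    ("Class", 78.5),
    ("Phylum", 75.0),
]


def lowest_shared_category(identity_percent: float):
    """Return the lowest category whose minimum is met, or None if none is."""
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    for category, minimum in THRESHOLDS:
        if identity_percent >= minimum:
            return category
    return None


for identity in (99.1, 95.2, 80.0, 70.0):
    print(identity, lowest_shared_category(identity))
```

With these inputs the sketch returns Species, Genus, Class and None, in that order. A threshold lookup of this kind is only as reliable as the identity value fed into it, which is why the length of the compared sequences matters.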
- 77% of the 16S rRNA gene sequences are < 900 bp
- The 5' region (V1-V2) overestimates species
- The remaining regions tend to underestimate all taxa
- Longer stretches tend to mirror the taxa recovery of the full sequence
Yarza et al., Nature Revs. 2014. 12: 635-645
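The division of the gene into six stretches of 250 nucleotides described above can be sketched as plain string slicing. This is an illustration only, not the procedure used in the cited study: the function name `split_into_stretches` and the toy sequence are hypothetical.

```python
def split_into_stretches(sequence: str, size: int = 250, count: int = 6):
    """Cut a sequence into at most `count` consecutive stretches of `size` nucleotides."""
    limit = min(len(sequence), size * count)
    return [sequence[start:start + size] for start in range(0, limit, size)]


# Toy sequence of 1500 nucleotides standing in for a 16S rRNA gene.
toy_gene = "ACGT" * 375
stretches = split_into_stretches(toy_gene)
print([len(s) for s in stretches])
```

Anything beyond the first 1500 nucleotides is ignored by this sketch, and a sequence shorter than 1500 nucleotides yields fewer stretches, the last of which may be shorter than 250.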
Size & information content
complete sequences give complete information
partial sequences lose phylogenetic signal
short sequences lose resolution
1500 nucleotides
900 nucleotides
300 nucleotides
Primer selection & size of amplicons
selection of primers is important for representative results
the length of the amplified/sequenced gene determines whether the phylogenetic signal is adequate
short sequences may lose resolution