

Intragenomic Heterogeneity of 16S rRNA Genes Causes Overestimation of Prokaryotic Diversity

Dong-Lei Sun,^a Xuan Jiang,^a Qinglong L. Wu,^b Ning-Yi Zhou^a

Key Laboratory of Agricultural and Environmental Microbiology, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, China^a; State Key Laboratory of Lake Science and Environment, Nanjing Institute of Geography & Limnology, Chinese Academy of Sciences, Nanjing, China^b

Ever since Carl Woese introduced the use of 16S rRNA genes for determining the phylogenetic relationships of prokaryotes, this method has been regarded as the “gold standard” in both microbial phylogeny and ecology studies. However, intragenomic heterogeneity within 16S rRNA genes has been reported in many investigations and is believed to bias the estimation of prokaryotic diversity. In the current study, 2,013 completely sequenced genomes of bacteria and archaea were analyzed and intragenomic heterogeneity was found in 952 genomes (585 species), with 87.5% of the divergence detected being below the 1% level. In particular, some extremophiles (thermophiles and halophiles) were found to harbor highly divergent 16S rRNA genes. Overestimation caused by 16S rRNA gene intragenomic heterogeneity was evaluated at different levels using the full-length and partial 16S rRNA genes usually chosen as targets for pyrosequencing. The result indicates that, at the unique level, full-length 16S rRNA genes can produce an overestimation of as much as 123.7%, while at the 3% level, an overestimation of 12.9% for the V6 region may be introduced. Further analysis showed that intragenomic heterogeneity tends to concentrate in specific positions, with the V1 and V6 regions suffering the most intragenomic heterogeneity and the V4 and V5 regions suffering the least intragenomic heterogeneity in bacteria. This is the most up-to-date overview of the diversity of 16S rRNA genes within prokaryotic genomes. It not only provides general guidance on how much overestimation can be introduced when applying 16S rRNA gene-based methods, due to its intragenomic heterogeneity, but also recommends that, for bacteria, this overestimation be minimized using primers targeting the V4 and V5 regions.

For decades, 16S rRNA genes, which encode the small subunit of rRNA in prokaryotes, have been widely used in taxonomic assignment and phylogenetic relationship determination (1, 2). The specific properties of the 16S rRNA gene, including its ubiquitous distribution, mosaic structure (3), and relative stability (4), qualify it as an optimal choice to fulfill these applications. Although some argue that 16S rRNA genes alone may not be sufficient to identify closely related species (5, 6) and the use of monocopy genes like rpoB to perform similar studies has been proposed (7), 16S rRNA genes are undoubtedly the most widely used molecular markers in microbial ecological studies due to well-maintained databases (8) and their easy accessibility.

For many years, researchers have been trying to estimate the microbial diversity of complex environments, such as soil (9), marine systems (10), and animal gut systems (11, 12). Various techniques have been developed, from culture-dependent methods to 16S rRNA gene-based methods such as clone libraries (13, 14), denaturing gradient gel electrophoresis (DGGE) (15), terminal restriction fragment length polymorphism (T-RFLP) (16), and the recently developed next-generation sequencing (17). However, the question of how diverse an environment can be has still not been precisely answered due to biases introduced from DNA extraction (18), PCR (19), or pyrosequencing noise (20). In addition, the bias caused by intragenomic heterogeneity in 16S rRNA genes has also been gradually realized as more strains of the archaea and bacteria have been reported to harbor multiple and different 16S rRNA gene copies (21, 22). Although general investigations of intragenomic heterogeneity have been conducted in several cases (6, 7, 23, 24), those studies were mainly based on limited databases and none of them provided the degree of overestimation of microbial diversity, although Acinas et al. reported an upper bound of roughly 2.5 times (24). With more and more complete prokaryotic genomes becoming available, it is time to comprehensively study this phenomenon and to predict the overestimation of prokaryotic diversity introduced when using 16S rRNA gene-based methods.

In this study, as many as 2,013 complete genomes from 1,212 unique species were retrieved from up-to-date databases. The pervasiveness of the intragenomic heterogeneity of 16S rRNA genes and the different types of variations were analyzed. Overestimation for prokaryotes was evaluated by focusing on full-length and different regions of 16S rRNA genes under different dissimilarity levels. The per base inter- and intragenomic variation profile was calculated for bacteria and archaea to demonstrate the hot spots that mainly contribute to this overestimation. Finally, the V4 and V5 regions, which showed sufficient intergenomic variation and the least intragenomic heterogeneity, were proposed to be ideal targets for bacteria in 16S rRNA-based analyses.

MATERIALS AND METHODS

Genomes and 16S rRNA gene retrieval. Completely sequenced prokaryotic genomes were downloaded from Complete Microbial Genomes Database of the National Center for Biotechnology Information (NCBI)

Received 22 April 2013 Accepted 17 July 2013

Published ahead of print 19 July 2013

Address correspondence to Ning-Yi Zhou, [email protected].

This article is in memory of Carl R. Woese (1928–2012).

Supplemental material for this article may be found at http://dx.doi.org/10.1128/AEM.01282-13.

Copyright © 2013, American Society for Microbiology. All Rights Reserved.

doi:10.1128/AEM.01282-13

5962 aem.asm.org Applied and Environmental Microbiology p. 5962–5969 October 2013 Volume 79 Number 19

Downloaded from http://aem.asm.org/ on March 21, 2020 by guest


website (http://ftp.ncbi.nih.gov/genomes/Bacteria/) in June 2012. All 16S rRNA genes of each genome (including plasmids) were searched and retrieved by the stand-alone server RNAmmer (25) running in Linux. Linux scripts were written to batch process multiple genomes. The retrieval of rRNA genes from the genomes of bacteria and archaea was performed using different options embedded in RNAmmer, which has previously displayed an accurate and rapid ability to predict 5S, 16S, and 23S rRNA genes (25). However, flanking regions of 16S rRNA genes that were mistakenly recognized, which occasionally occurs, were manually corrected and are marked in Table S4 in the supplemental material.

Analysis of 16S rRNA intragenomic heterogeneity. After retrieval, the number of copies of 16S rRNA genes in each genome was recorded. Genomes with continuous ambiguous bases were excluded from this study. Multiple 16S rRNA genes from single genomes were aligned using the MUSCLE program (26). The level of dissimilarity of sequences (distance) was calculated using the MOTHUR program (27), with continuous gaps penalized once (onegap option) and end gaps counted. The logic behind this type of penalty is that a gap represents an insertion and it is likely that a gap of any length represents a single insertion (27). Distances were also calculated with every gap position penalized (eachgap option), since the onegap option can miss large continuous insertions and deletions. Genes showing distances of greater than 1% were further analyzed for types of sequence diversity according to the previous definition (6), with minor changes. A BLAST search was performed for sequences that were inserted into 16S rRNA genes to find their origins.
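The difference between the two gap penalties can be illustrated with a minimal sketch. This is not the MOTHUR implementation: the function name `aligned_distance`, the treatment of shared gap columns (skipped), and the choice to count a collapsed gap run as a single compared position are all assumptions made for illustration.

```python
GAP_CHARS = "-."


def aligned_distance(seq_a: str, seq_b: str, gap_mode: str = "onegap") -> float:
    """Fraction of differing positions between two aligned sequences.

    gap_mode="onegap":  a run of consecutive gaps counts as one difference.
    gap_mode="eachgap": every gap column counts as its own difference.
    Hypothetical simplification, not MOTHUR's exact calculation.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    if gap_mode not in ("onegap", "eachgap"):
        raise ValueError("gap_mode must be 'onegap' or 'eachgap'")

    differences = 0
    compared = 0
    gap_owner = None  # which sequence holds the gap run currently being read

    for a, b in zip(seq_a.upper(), seq_b.upper()):
        a_gap, b_gap = a in GAP_CHARS, b in GAP_CHARS
        if a_gap and b_gap:
            continue  # shared gap column carries no information
        if a_gap or b_gap:
            owner = "a" if a_gap else "b"
            # Under onegap only the first column of a run is penalized.
            if gap_mode == "eachgap" or owner != gap_owner:
                differences += 1
                compared += 1
            gap_owner = owner
        else:
            gap_owner = None
            compared += 1
            if a != b:
                differences += 1
    return differences / compared if compared else 0.0


# Two copies that differ only by a 3-base deletion in the second copy.
copy_1 = "ACGTACGTAC"
copy_2 = "ACG---GTAC"
print(aligned_distance(copy_1, copy_2, "onegap"))   # 0.125
print(aligned_distance(copy_1, copy_2, "eachgap"))  # 0.3
```

In this toy case the same deletion yields a much smaller distance under onegap than under eachgap, which is the reason given above for also computing eachgap distances.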

Calculation of diversity overestimation. Since one species can be represented by multiple entries in the database, data sets were constructed with all 16S rRNA genes from unique species in each data set to avoid overrepresentation of any species. Programs were written and executed to randomly construct 10 data sets, with each data set containing 1,212 different species. Reference data sets were constructed with a sole representative of the 16S rRNA gene sequences from each species. For each data set, 16S rRNA genes were merged into one Fasta file and aligned to a template Greengenes (http://greengenes.lbl.gov/Download/) alignment (including 4,938 bacterial and archaeal sequences) using MOTHUR. Distances were calculated using default settings. Operational taxonomic units (OTUs) were clustered under unique, 3%, 5%, and 10% dissimilarity levels using the furthest-neighbor algorithm. The degree of overestimation was calculated by comparing OTUs clustered with or without consideration of intragenomic heterogeneity. Given that in microbial ecological studies the recently developed pyrosequencing usually focuses on partial regions of 16S rRNA genes, such as the V1-V3 (28), V3 (29), V4-V5 (30), V5-V6 (31), and V6 regions (32), 16S rRNA gene sequences were chopped according to published universal primers, as shown in Table 1, and the degree of overestimation was calculated as described above. A dissimilarity level of 3% was used in further investigation of the overestimation for bacteria, archaea, and dominant phyla.
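The comparison described above can be sketched on a toy example. The code below is a plain furthest-neighbor (complete-linkage) clustering written for illustration, not the MOTHUR routine; the sequence names, the distances, and the rule that clusters merge while their largest pairwise distance is at or below the cutoff are assumptions.

```python
from itertools import combinations


def furthest_neighbor_otus(names, dist, cutoff):
    """Greedy complete-linkage clustering of sequences into OTUs.

    names:  sequence identifiers
    dist:   dict mapping frozenset({a, b}) to a pairwise distance
    cutoff: two clusters merge only if their most distant members
            are no further apart than this value
    """
    clusters = [{name} for name in names]

    def linkage(c1, c2):
        return max(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest complete-linkage distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[i], clusters[j]) > cutoff:
            break
        clusters[i] |= clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


# Hypothetical data: genome 1 carries two divergent copies, genome 2 one copy.
dist = {
    frozenset(("g1_copyA", "g1_copyB")): 0.05,
    frozenset(("g1_copyA", "g2_copyA")): 0.20,
    frozenset(("g1_copyB", "g2_copyA")): 0.22,
}
all_copies = ["g1_copyA", "g1_copyB", "g2_copyA"]
one_per_genome = ["g1_copyA", "g2_copyA"]

with_het = len(furthest_neighbor_otus(all_copies, dist, 0.03))
without_het = len(furthest_neighbor_otus(one_per_genome, dist, 0.03))
print(with_het, without_het)                             # 3 2
print((with_het - without_het) / without_het * 100)      # 50.0
```

At a 3% cutoff the two copies of genome 1 fall into separate OTUs, so two genomes are counted as three OTUs; at a 10% cutoff they merge and the toy overestimation disappears.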

Per base intragenomic heterogeneity. As intragenomic heterogeneity is genome specific, per base intragenomic heterogeneity was calculated by averaging the Shannon information entropy at each nucleotide position of 1,882 bacterial genomes and 129 archaeal genomes. Shannon entropy was calculated as $-\sum_i p(x_i) \log_2 p(x_i)$, where $p(x_i)$ is the frequency of nucleotide $i$ (A, T, C, G), or the gap character (33). Because of the significant divergence present between bacteria and archaea 16S rRNA genes, the nucleotide position was determined with reference to the Escherichia coli sequence for bacteria and to the Sulfolobus solfataricus sequence for archaea (34). The intergenomic per base variation, measured as the information entropy, was calculated from the prealigned Greengenes reference alignment of 4,938 bacteria and archaea species. In-house Linux scripts were executed to perform this calculation.
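The entropy calculation can be sketched as follows. This is an illustrative reimplementation, not the authors' scripts; the function names are hypothetical, and the sketch assumes that the copies of each genome are already aligned and mapped to the same reference coordinates.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column, gaps included as a symbol."""
    counts = Counter(column.upper())
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def per_position_entropy(aligned_copies):
    """Entropy at each position across the aligned 16S copies of one genome."""
    return [column_entropy("".join(col)) for col in zip(*aligned_copies)]


def mean_intragenomic_entropy(genomes):
    """Average the per-position profiles over genomes (same coordinates assumed)."""
    profiles = [per_position_entropy(copies) for copies in genomes]
    return [sum(values) / len(values) for values in zip(*profiles)]


# Hypothetical genomes: the first has copies differing at position 2,
# the second has identical copies.
genome_1 = ["ACGT", "ACGT", "ATGT", "ATGT"]
genome_2 = ["ACGT", "ACGT"]
print(per_position_entropy(genome_1))                    # position 2 has entropy 1.0
print(mean_intragenomic_entropy([genome_1, genome_2]))   # position 2 averages to 0.5
```

A position that is identical in all copies of a genome contributes zero entropy, so the averaged profile highlights positions where copies within the same genome tend to differ.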

RESULTS

16S rRNA gene data set. In this study, 2,013 complete genomes (130 archaea, 1,883 bacteria) from 1,212 unique species (115 archaea, 1,097 bacteria) were obtained from the NCBI complete genome database. The 16S rRNA genes were successfully retrieved from 2,011 genomes (see Table S1 in the supplemental material), except for the genomes of Chlamydophila psittaci RD1 (NCBI BioProject accession number 162063) in the bacteria domain and Pyrobaculum sp. strain 1860 (NCBI BioProject accession number 82379) in the archaea domain. This may be due to incomplete genome sequencing, although they were claimed to be complete genomes. These 1,212 unique species were from 35 different phyla (see Fig. S1 in the supplemental material), of which the Proteobacteria, with 485 unique species, was the most abundant, followed by the Firmicutes (206 species), Actinobacteria (132 species), Euryarchaeota (78 species), Bacteroidetes (68 species), Crenarchaeota (33 species), Spirochaetes (31 species), and Tenericutes (30 species). The remaining 27 phyla were represented by 149 species.

16S rRNA gene copy number in bacteria and archaea. Copy numbers ranging from 1 to 15 in bacteria and 1 to 4 in archaea were discovered (Fig. 1) and were consistent with previously published statistics obtained using a smaller database (35). Species with only one 16S rRNA gene copy made up 17% of all species, while species with two copies were the most abundant (23.1%). Nevertheless, the number of species with multiple copies showed a general declining trend as the copy number increased (Fig. 1). Only 23 species (1.9%) with copy numbers greater than 10 were detected. The average copy number was 3.82 ± 2.61 for bacteria and evidently less for archaea (1.62 ± 0.84). The average copy number for all prokaryotes was 3.61 ± 2.58. The average copy number for each phylum is shown in Fig. 2. In the bacteria domain, the Firmicutes had an average copy number of 6.01 ± 2.82, which was the highest of all phyla, followed by the Fusobacteria at 5.40 ± 1.36. The largest phylum, Proteobacteria, showed an overall average copy number of 3.94 ± 2.62. Further study into this phylum showed that the copy number for the Gammaproteobacteria (5.72 ± 2.86) was significantly higher than that for the other proteobacteria (less than 3 copies). Low copy numbers were generally observed in the Tenericutes (1.6 ± 0.5), Chlamydiae (1.6 ± 0.8), and Acidobacteria (1.3 ± 0.4). In the archaea domain, 81.7% of 115 archaeal species contained only a single copy. Notably, all species from the Crenarchaeota harbored only one copy, and those with four copies were all from the Euryarchaeota. Copy number variation within the same species was detected in 46 species, as shown in Table S2 in the supplemental material.

FIG 1 Distribution of different 16S rRNA gene copy numbers among bacteria and archaea.

TABLE 1 Primers used in this study to extract different regions of 16S rRNA genes

| Region | Name | Sequence (5′–3′) | Position^a | Reference |
| --- | --- | --- | --- | --- |
| V1-V3 | 8F | AGAGTTTGATCCTGGCTCAG | 8–27 | 48 |
| V1-V3 | 518R | ATTACCGCGGCTGCTGG | 518–534 | 15 |
| V3 | 341F | CCTACGGGAGGCAGCAG | 341–357 | 15 |
| V3 | 518R | ATTACCGCGGCTGCTGG | 518–534 | 15 |
| V4-V5 | 515F | GTGNCAGCMGCCGCGGTAA | 515–533 | 30 |
| V4-V5 | R926 | CCGYCAATTYMTTTRAGTTT | 907–926 | 30 |
| V5-V6 | U789F | TAGATACCCSSGTAGTCC | 789–806 | 31 |
| V5-V6 | U1068R | CTGACGRCRGCCATGC | 1053–1068 | 31 |
| V6 | 967F | CAACGCGAAGAACCTTACC | 967–985 | 32 |
| V6 | 1046R | CGACAGCCATGCANCACCT | 1046–1064 | 32 |

^a Positions are according to the numbering for E. coli.
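Several primers in Table 1 contain degenerate bases (N, M, Y, R, S), so extracting a region from a full-length gene requires matching under the IUPAC nucleotide code. The sketch below shows one way this could be done with regular expressions. It is not the procedure used in the study: the function names are hypothetical, exact matching without mismatches is assumed, and the returned region excludes the primer sites themselves, which is also an assumption.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Reverse complement that also handles degenerate bases."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer: str) -> str:
    """Translate a (possibly degenerate) primer into a regex pattern."""
    return "".join(IUPAC[base] for base in primer.upper())


def extract_region(gene: str, forward: str, reverse: str):
    """Return the sequence between the two primer sites, or None if either is absent."""
    gene = gene.upper().replace("U", "T")
    fwd_hit = re.search(primer_regex(forward), gene)
    if fwd_hit is None:
        return None
    downstream = gene[fwd_hit.end():]
    # The reverse primer anneals to the opposite strand, so search for its
    # reverse complement on the gene strand.
    rev_hit = re.search(primer_regex(reverse_complement(reverse)), downstream)
    if rev_hit is None:
        return None
    return downstream[:rev_hit.start()]


# V6 primers from Table 1 applied to a short synthetic sequence (not a real gene).
FWD_967F = "CAACGCGAAGAACCTTACC"
REV_1046R = "CGACAGCCATGCANCACCT"
toy_gene = "GGG" + FWD_967F + "TTGACATCC" + "AGGTGCTGCATGGCTGTCG" + "AAA"
print(extract_region(toy_gene, FWD_967F, REV_1046R))  # TTGACATCC
```

The degenerate N in 1046R matches the C in the synthetic binding site, which is the behaviour a literal string search would miss.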

Heterogeneity of 16S rRNA genes within genomes. Heterogeneity of 16S rRNA genes was detected in 952 out of 2,011 genomes (585 out of 1,212 species) from which 16S rRNA genes were successfully retrieved. A majority of the heterogeneity detected was below 1% (833 out of 952 genomes), but for the remaining 119 genomes with intragenomic heterogeneity greater than 1%, classification to a different species using 16S rRNA-based methods may occur. Table S3 in the supplemental material shows detailed information for 952 strains in which intragenomic heterogeneity was detected. From Table S3, as many as 14 haplotypes were found in some genomes and 16S rRNA gene heterogeneity varied from 0.06% to 9.73% (mean ± standard deviation, 0.51% ± 0.91%). The pseudogene previously reported in Borrelia afzelii, with a high difference of 20.38% (6), was not included in this study, as it is not predicted to be a 16S rRNA gene by RNAmmer. A difference of 11.6% for Thermoanaerobacter tengcongensis was calculated using a different calculation method (24) (eachgap option). The greatest distance calculated was found in Streptococcus suis JS14 (9.7%), and several extremophiles demonstrated high intragenomic variation rates, as shown in Table S4 in the supplemental material, including Halomicrobium mukohataei DSM 12286 (9.29%), Thermoanaerobacter tengcongensis MB4 (6.67%), Haloarcula marismortui ATCC 43049 (5.63%), Haloarcula hispanica ATCC 33960 (5.23%), and Thermoanaerobacter pseudethanolicus ATCC 33223 (2.94%). It is also worth noting that although archaea generally harbor fewer 16S rRNA gene copies, intragenomic heterogeneity was detected in 26 of the 52 archaeal strains with multiple copies.

Since variation in the full-length 16S rRNA sequence of less than 1% or 1.3% is generally accepted to be the threshold for determination of species (36), genomes with divergence in 16S rRNA genes of greater than 1% (calculated using the eachgap option to avoid missing large deletions or insertions) were further analyzed. The types of mutations were classified into the following five categories: intervening sequence (IVS; inserts larger than 10 nucleotides [6]), deletion, truncation, regional diversity, or random. Among 143 genomes analyzed, as shown in Table S4 in the supplemental material, 19 with IVSs (see Table S5 in the supplemental material), 12 genomes with 16S rRNA gene truncations (see Table S6 in the supplemental material), 16 genomes with regional diversity, and 5 genomes with deletions (see Table S7 in the supplemental material) were detected. The remaining 91 were assigned to the random category, since no regular patterns were discovered.

Although most of the IVSs in the 19 genomes with IVSs were shorter than 300 bp, insertions of greater than 1,000 bp were also detected in 5 different genomes (see Table S5 in the supplemental material). Further investigation indicated that all these large IVSs were the sequences of transposase genes with an imperfect inverted repeat sequence. Intriguingly, 16S rRNA genes with such huge insertions exist only in a single copy, regardless of how many 16S rRNA gene copies that the genome harbors. This is different from the finding for shorter IVSs, which can coexist in multiple copies within a single genome. In addition to IVSs, 16S rRNA genes were also found to be truncated by sequences encoding a protein (6 strains), 23S rRNA (4 strains), or a 16S-23S rRNA intergenic spacer (2 strains). All truncations were detected in only one copy of all 16S rRNA gene copies in each genome. Either end of 16S rRNA genes can be truncated, as shown in Table S6 in the supplemental material. On the other hand, deletions were detected in 5 genomes (see Table S7 in the supplemental material), each within a length of 200 bp. Like truncations, all deletions were found to exist in only one copy of all 16S rRNA gene copies.

Overestimation of OTUs under different dissimilarity levels. It has been reported that the intragenomic heterogeneity of 16S rRNA genes can lead to overestimation of the number of OTUs clustered in 16S rRNA gene-based microbial diversity studies (6, 7, 24, 37). The results of a deeper and more comprehensive analysis of this issue are shown in Table 2. At the unique level, an overestimation of 123.7% was introduced when focusing on full-length 16S rRNA genes (usually generated by cloning and sequencing) and an overestimation of 82.6% was introduced when focusing on the V1-V3 region. The lowest overestimation was seen for the V3 (23.0%) and V6 (20.0%) regions at the unique level. In most studies using a partial 16S rRNA gene sequence, OTUs are usually defined at the 3% dissimilarity level (9, 12, 31). At this level, the degree of overestimation was the lowest for the V4-V5 region (3.0%) but the greatest for the V6 (12.9%) and V1-V3 (8.9%) regions. A rarefaction curve displaying the degree of overestimation at the unique and 3% levels is shown in Fig. 3. When dissimilarity levels increased from 3% to 5% and 10%, the lowest degree of overestimation was consistently shown for the V4-V5 region (see Table 2 for detailed information).

FIG 2 Average copy number for each phylum. Numbers in parentheses indicate the total number of species in the phylum.

When the overestimations for bacteria and archaea were separately investigated at the 3% dissimilarity level, the V4-V5 region undoubtedly exhibited the lowest overestimation in bacteria, but all regions showed similar degrees of overestimation in the archaea (Fig. 4). It is worth noting that only 129 genomes of archaea were used in this study. Moreover, the Firmicutes and Proteobacteria, which generally have more 16S rRNA gene copies, also had greater degrees of overestimations than phyla with fewer copies, such as the Tenericutes, as shown in Fig. 5. In contrast, no overestimation was seen in the Crenarchaeota, since all genomes in this phylum contain only one copy of the 16S rRNA gene.

To further study the nucleotide positions that contribute the most to the observed overestimation, the per base intragenomic variation rate and the intergenomic variation rate were calculated for bacteria (Fig. 6) and for archaea (Fig. 7). Interestingly, the per base intragenomic heterogeneity of the 16S rRNA gene tended to concentrate in specific positions (Fig. 6C and 7C), and these positions overlap corresponding intergenomic hypervariable regions in Fig. 6B and Fig. 7B. Unlike per base intergenomic variation rates, which varied little among nine hypervariable regions (i.e., they provided equivalent phylogenetic information), intragenomic variation varied more significantly. In bacteria, the average intragenomic variation rate between different regions differed by more than 10-fold (Table 3). It is obvious that the V4 and V5 regions suffered the least intragenomic heterogeneity, whereas the V1 and V6 regions suffered the most (Fig. 6B).

FIG 3 Rarefaction curve indicating the overestimation caused by 16S rRNA gene intragenomic heterogeneity at the unique (A) and 0.03 (B) dissimilarity levels. Results for levels of dissimilarity of 0.05 and 0.10 are not shown because the relatively small amount of overestimation was hard to distinguish on a rarefaction curve. Solid lines, intragenomic heterogeneity was considered; dashed lines, intragenomic heterogeneity was ruled out. To better display the results, error bars were not plotted.

FIG 4 Degree of overestimation at the 3% level for archaea and bacteria, focusing on full-length and different hypervariable regions. Some error bars are too small to be seen.

TABLE 2 Overestimation at different dissimilarity levels for different 16S rRNA gene regions

| Level | Region | No. of OTUs (considering intragenomic heterogeneity) | E value | No. of OTUs (ruling out intragenomic heterogeneity) | E value | Overestimation (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Unique | Full length | 2,655 | 6.3 | 1,187 | 1.5 | 123.7 |
| Unique | V1-V3 | 2,127 | 16.8 | 1,165 | 2.0 | 82.6 |
| Unique | V3 | 1,278 | 9.0 | 1,039 | 2.1 | 23.0 |
| Unique | V4-V5 | 1,374 | 10.3 | 1,078 | 2.2 | 27.5 |
| Unique | V5-V6 | 1,396 | 11.3 | 1,069 | 1.5 | 30.6 |
| Unique | V6 | 1,162 | 5.6 | 968 | 2.1 | 20.0 |
| 0.03 | Full length | 941 | 0.9 | 889 | 1.9 | 5.8 |
| 0.03 | V1-V3 | 1,049 | 2.1 | 963 | 1.5 | 8.9 |
| 0.03 | V3 | 900 | 3.5 | 855 | 2.4 | 5.2 |
| 0.03 | V4-V5 | 805 | 1.8 | 782 | 0.8 | 3.0 |
| 0.03 | V5-V6 | 917 | 3.3 | 862 | 1.5 | 6.4 |
| 0.03 | V6 | 1,039 | 3.8 | 920 | 2.7 | 12.9 |
| 0.05 | Full length | 779 | 1.1 | 748 | 0.4 | 4.1 |
| 0.05 | V1-V3 | 907 | 2.7 | 856 | 1.1 | 6.0 |
| 0.05 | V3 | 764 | 1.3 | 744 | 2.8 | 2.7 |
| 0.05 | V4-V5 | 665 | 1.5 | 655 | 1.1 | 1.6 |
| 0.05 | V5-V6 | 769 | 1.1 | 731 | 1.3 | 5.1 |
| 0.05 | V6 | 935 | 3.2 | 851 | 2.1 | 9.8 |
| 0.1 | Full length | 513 | 1.9 | 498 | 1.6 | 3.0 |
| 0.1 | V1-V3 | 672 | 1.5 | 644 | 1.7 | 4.4 |
| 0.1 | V3 | 524 | 2.3 | 511 | 0.8 | 2.5 |
| 0.1 | V4-V5 | 435 | 1.5 | 429 | 0.5 | 1.4 |
| 0.1 | V5-V6 | 541 | 1.7 | 520 | 1.7 | 4.0 |
| 0.1 | V6 | 788 | 2.7 | 736 | 1.3 | 7.1 |
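The overestimation percentages in Table 2 appear to follow from the two OTU counts as (OTUs with heterogeneity minus OTUs without) divided by OTUs without. That formula is inferred here from the table values rather than stated in the text, so the sketch below should be read as a consistency check under that assumption. Recomputed values agree with the published column to within about 0.1 percentage point, presumably because the published counts are rounded averages over the 10 data sets.

```python
# OTU counts copied from Table 2: (considering, ruling out) intragenomic heterogeneity.
TABLE2_OTUS = {
    "unique": {
        "Full length": (2655, 1187), "V1-V3": (2127, 1165), "V3": (1278, 1039),
        "V4-V5": (1374, 1078), "V5-V6": (1396, 1069), "V6": (1162, 968),
    },
    "0.03": {
        "Full length": (941, 889), "V1-V3": (1049, 963), "V3": (900, 855),
        "V4-V5": (805, 782), "V5-V6": (917, 862), "V6": (1039, 920),
    },
}


def overestimation_percent(with_het: int, without_het: int) -> float:
    """Relative excess of OTUs caused by intragenomic heterogeneity (assumed formula)."""
    return (with_het - without_het) / without_het * 100.0


def rank_regions(level: str):
    """Regions at one dissimilarity level, sorted from lowest to highest overestimation."""
    rows = TABLE2_OTUS[level]
    scored = [(region, overestimation_percent(*counts)) for region, counts in rows.items()]
    return sorted(scored, key=lambda item: item[1])


for level in TABLE2_OTUS:
    print(level)
    for region, pct in rank_regions(level):
        print(f"  {region:12s} {pct:6.1f}%")
```

Under this reading, the V4-V5 region ranks lowest and the V6 region highest at the 0.03 level, matching the ordering reported in the text.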

DISCUSSION

The intragenomic heterogeneity of 16S rRNA genes has been recognized in several studies (21, 38, 39) and is believed to cause overestimation of microbial diversity when using 16S rRNA gene-based methods (7, 37). Although many 16S rRNA gene-based methods have been used in microbial ecological studies, few studies have considered the influence of intragenomic heterogeneity. There are limited studies on the general profile of 16S rRNA gene intragenomic heterogeneity. These were conducted on 55 genomes (23), 81 genomes (24), and 883 genomes (6). Nevertheless, none of the above-mentioned studies provided a quantitative evaluation for the overestimation, except the study using 81 genomes, in which an upper bound of roughly 2.5-fold was proposed (24). Our current research with more than 2,000 complete genomes, which is 25 times more than the number used in the above-mentioned study conducted 10 years ago, has provided an in-depth and up-to-date analysis of 16S rRNA gene intragenomic heterogeneity and the overestimation of prokaryotic diversity that it has introduced.

The copy number of 16S rRNA genes has been curated in the rrnDB database (http://rrndb.mmg.msu.edu/) since 2001 by Klappenbach et al. (35); however, the database does not seem to have been well updated recently. The current rrnDB database contains records for only 1,322 bacteria and 89 archaea, an amount which is 30% less than the amount of our data. Although copy numbers ranging from 1 to 15 in bacteria and 1 to 4 in archaea were reported by Acinas et al. (24), a clear trend toward a decline in the number of species as 16S rRNA gene copy numbers increase can be seen from the current study (Fig. 1). There had been a debate over whether the average rRNA operon copy number should be revised up or down as databases become more complete (7). The reason to revise the number down was the overrepresentation of generalist bacteria among bacteria whose genomes had been sequenced in early studies (24) (generalists usually contain more copies than specialists [40]). The opposite opinion to revise the number up suggested that earlier studies were biased toward bacteria with specialist lifestyles (such as pathogens and symbionts) (7). In this study, after analyzing the up-to-date database, we suggest that the average 16S rRNA gene copy number for prokaryotes be revised down from 4.2 (40) or 4.1 (7) to 3.61. This result indicates that generalists were indeed overrepresented in previous databases and newly sequenced genomes are more likely from species of specialists.

FIG 5 Degree of overestimation at the 3% level for dominant phyla, focusing on full-length and different hypervariable regions. Some error bars are too small to be seen.

FIG 6 Intergenomic and intragenomic variation of 16S rRNA genes for bacteria. The variation rate was measured as the Shannon information entropy (with positions referring to the numbering for E. coli) (A) and subsequently averaged by a 25-bp window (B). The mean intragenomic variation was calculated by averaging the information entropy at each position of all 1,882 bacterial genomes studied (C).

FIG 7 Intergenomic and intragenomic variation of 16S rRNA genes for archaea. The variation rate was measured as the Shannon information entropy (with positions referring to the numbering for Sulfolobus solfataricus) (A) and subsequently averaged by a 25-bp window (B). The mean intragenomic variation was calculated by averaging the information entropy at each position of all 129 archaeal genomes studied (C).

In accordance with previous observations (6), extensive heterogeneity between 16S rRNA gene sequences within a genome also exists (48.2% of all species), although 87.5% of the divergence detected was below the 1% level. Notably, some extremophiles demonstrated significantly high intragenomic heterogeneity, as shown in Table S4 in the supplemental material. This was previously noticed in individual strains like Thermomonospora chromogena (39), Haloarcula marismortui (22), and Thermobispora bispora (21). The variations in 16S rRNA genes in these extremophiles may be a strategy to adapt to the environment, with different copies being functional under different environmental conditions. For example, divergent 16S rRNA genes from a halophilic archaeon, Haloarcula marismortui, were shown to be preferentially expressed under different temperature conditions (41). Horizontal transfer of the 16S rRNA gene has also been inferred from the close relationship of the 16S rRNA genes between Thermomonospora chromogena and Thermobispora bispora (39). The detection in the current study of larger numbers of halophiles and thermophiles with great intragenomic variation further suggests that adverse environments may lead to adaptive changes in 16S rRNA genes in these extremophiles.

Among the mutation types detected, the insertion of transposase genes into 16S rRNA genes was found in five genomes, rather than the two genomes previously reported (42). It is not possible for the 16S rRNA gene to remain functional with such a long insertion. Considering that Micrococcus luteus NCTC 2665, Thermus scotoductus SA 01, and Fervidobacterium pennivorans DSM 9078 each have only two 16S rRNA gene copies, including the one with a transposase gene insertion, we assume that some strains with multiple gene copies can still survive when only one of their 16S rRNA genes is functional. However, the relative fitness of these strains may decline when any of the 16S rRNA gene copies is destroyed, as has been demonstrated experimentally in Escherichia coli (43). Although it is not the theme of this study, why strains harbor multiple rRNA operons within a single genome and how the different operons behave remain interesting questions (41).

Given the widespread existence of 16S rRNA gene intragenomic heterogeneity, the overestimation of microbial diversity is inevitable when using 16S rRNA gene-based methods. The general probability that a species with significant (≥1%) 16S rRNA gene intragenomic diversity will be encountered is 7.8% (94 in 1,212 unique species), which is greater than the 4.2% identified in a previous report (6). In microbial ecological studies, the 3% level is most commonly used for defining OTUs and for further analysis of pyrosequencing data (29, 44). At this dissimilarity level, overestimation was the greatest for the V6 region (12.9%) and the least for the V4-V5 region (3.0%) for all prokaryotes. This suggests that if 1,000 OTUs are clustered at the 3% dissimilarity level using primers targeting the V6 region, approximately 114 of them are artifacts of intragenomic heterogeneity rather than distinct taxa.
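The figure of roughly 114 excess OTUs can be reproduced with a short calculation. The sketch below is illustrative only: it assumes that the reported overestimation is expressed relative to the true number of OTUs (observed = true × (1 + overestimation)), and the function name `excess_otus` is hypothetical and not taken from the study.

```python
def excess_otus(observed_otus: float, overestimation: float) -> float:
    """Return the number of observed OTUs attributable to overestimation.

    Assumption (not stated explicitly in the text): the overestimation is a
    fraction of the true OTU count, so observed = true * (1 + overestimation).
    """
    true_otus = observed_otus / (1.0 + overestimation)
    return observed_otus - true_otus


# Overestimation values at the 3% dissimilarity level quoted in the text.
v6_excess = excess_otus(1000, 0.129)    # V6 region, 12.9%
v4v5_excess = excess_otus(1000, 0.030)  # V4-V5 region, 3.0%

print(f"V6: about {v6_excess:.0f} of 1,000 observed OTUs are excess")
print(f"V4-V5: about {v4v5_excess:.0f} of 1,000 observed OTUs are excess")
```

Under the same assumption, the V4-V5 region would yield about 29 excess OTUs per 1,000 observed, which illustrates why the choice of region matters.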

The hot spots plotted in Fig. 6C and 7C indicate increased rates of intragenomic heterogeneity at specific positions. Although Case et al. mentioned that the intragenomic heterogeneity of the 16S rRNA gene was concentrated in some helices of 16S rRNA (7), our results clearly show where and to what degree these hot spots exist. Much previous ecological research has targeted the V6 region because it is relatively short, is rich in phylogenetic information (32, 45), and is more suitable for use with the Illumina technology (46). Youssef et al. previously recognized the overestimation of richness when focusing on the V6 region or the V1-V2 region and recommended that the V4, V5-V6, and V6-V7 regions be used in pyrosequencing studies (47). However, the intragenomic heterogeneity of the 16S rRNA gene was not considered to be one of the causes of the overestimation. Considering the extremely high intragenomic variation rate in the V6 region revealed in bacteria (Fig. 6C), we suggest that the V4-V5 region, which has the least intragenomic heterogeneity, be used in future studies, particularly for samples in which Firmicutes and Proteobacteria are dominant, such as soil and fecal samples. We also suggest that the V1 and V6 regions not be included in comparative studies, to avoid possible overestimation. For the archaea, however, the currently limited data surprisingly suggested that the V4-V5 region is slightly more intragenomically variable (Fig. 7C), and different regions showed similar average intragenomic variation rates (Table 2). This feature, together with other differences from bacteria (e.g., a more conserved V6 region in archaea; compare Fig. 6B and 7B), may imply structural and functional differences between bacterial and archaeal 16S rRNA. Future studies will be needed to further reveal the intragenomic variation profile of the archaeal domain.

It has to be admitted that all the calculations and conclusions presented above are based on published genomes, mainly from culturable microbes and species of ecological or medical significance (7). The actual overestimation found in future studies may vary from sample to sample. Nevertheless, the current work is the most up-to-date overview of the diversity of 16S rRNA genes within prokaryotic genomes. It not only provides general guidance on how much overestimation due to intragenomic heterogeneity is introduced when applying 16S rRNA gene-based methods but also recommends that, for bacteria, this overestimation be minimized using primers targeting the V4-V5 region.

TABLE 3 Accumulative and average Shannon information entropy of different regions

                Shannon information entropy
                Bacteria                                    Archaea
                Intergenomic          Intragenomic          Intergenomic          Intragenomic
Region          Accumulative  Avg(a)  Accumulative  Avg     Accumulative  Avg     Accumulative  Avg
V1-V3           392.307       0.801   2.12          0.004   353.86        0.773   2.743         0.006
V3              129.764       0.811   0.411         0.003   94.521        0.829   0.733         0.006
V4-V5           247.249       0.663   0.371         0.001   240.499       0.638   1.864         0.005
V5-V6           184.065       0.748   0.954         0.004   136.211       0.558   1.056         0.004
V6              89.69         1.495   0.709         0.012   40.35         0.761   0.313         0.006
Full length     1,005.36      0.655   6.334         0.004   961.547       0.641   7.454         0.005

(a) Average Shannon information entropy is the accumulative Shannon information entropy divided by the region length.
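The two quantities reported in Table 3 can be illustrated with a minimal sketch. It is not the pipeline used in the study: the logarithm base (2 here), the treatment of gap characters as ordinary symbols, and the names `column_entropy` and `region_entropy` are all assumptions made for illustration, and the toy alignment is invented.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy of one alignment column (log base 2 is an assumption).

    Every character, including a gap such as '-', is counted as a symbol.
    """
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def region_entropy(aligned_seqs, start, end):
    """Return (accumulative, average) entropy over columns [start, end).

    The average is the accumulative entropy divided by the region length,
    as defined in the footnote to Table 3.
    """
    per_position = [
        column_entropy("".join(seq[i] for seq in aligned_seqs))
        for i in range(start, end)
    ]
    accumulative = sum(per_position)
    return accumulative, accumulative / (end - start)


# Invented toy alignment of four 16S rRNA gene copies from one genome.
copies = ["ACGTAC", "ACGTAC", "ACGAAC", "ACGAAT"]
acc, avg = region_entropy(copies, 0, len(copies[0]))
print(f"accumulative = {acc:.3f}, average = {avg:.3f}")
```

Applied per genome and then averaged over genomes, the same per-position values would give an intragenomic profile of the kind plotted in Fig. 6C and 7C; smoothing them over a 25-bp window corresponds to panel B of those figures.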


ACKNOWLEDGMENTS

This study was supported by the National Natural Science Foundation of China (grants 81290341 and 21077130).

We thank Yanbo Ye and Haizhou Liu for their help with Linux programming.

REFERENCES

1. Woese CR. 1987. Bacterial evolution. Microbiol. Rev. 51:221–271.

2. Woese CR, Kandler O, Wheelis ML. 1990. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc. Natl. Acad. Sci. U. S. A. 87:4576–4579.

3. Van de Peer Y, Chapelle S, De Wachter R. 1996. A quantitative map of nucleotide substitution rates in bacterial rRNA. Nucleic Acids Res. 24:3381–3391.

4. Liao DQ. 2000. Gene conversion drives within genic sequences: concerted evolution of ribosomal RNA genes in bacteria and archaea. J. Mol. Evol. 51:305–317.

5. Fox GE, Wisotzkey JD, Jurtshuk P. 1992. How close is close: 16S rRNA sequence identity may not be sufficient to guarantee species identity. Int. J. Syst. Bacteriol. 42:166–170.

6. Pei AY, Oberdorf WE, Nossa CW, Agarwal A, Chokshi P, Gerz EA, Jin ZD, Lee P, Yang LY, Poles M, Brown SM, Sotero S, DeSantis T, Brodie E, Nelson K, Pei ZH. 2010. Diversity of 16S rRNA genes within individual prokaryotic genomes. Appl. Environ. Microbiol. 76:3886–3897.

7. Case RJ, Boucher Y, Dahllof I, Holmstrom C, Doolittle WF, Kjelleberg S. 2007. Use of 16S rRNA and rpoB genes as molecular markers for microbial ecology studies. Appl. Environ. Microbiol. 73:278–288.

8. Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glockner FO. 2013. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41:D590–D596. doi:10.1093/nar/gks1219.

9. Roesch LF, Fulthorpe RR, Riva A, Casella G, Hadwin AKM, Kent AD, Daroub SH, Camargo FAO, Farmerie WG, Triplett EW. 2007. Pyrosequencing enumerates and contrasts soil microbial diversity. ISME J. 1:283–290.

10. Fuhrman JA, Mccallum K, Davis AA. 1993. Phylogenetic diversity of subsurface marine microbial communities from the Atlantic and Pacific Oceans. Appl. Environ. Microbiol. 59:1294–1302.

11. Eckburg PB, Bik EM, Bernstein CN, Purdom E, Dethlefsen L, Sargent M, Gill SR, Nelson KE, Relman DA. 2005. Diversity of the human intestinal microbial flora. Science 308:1635–1638.

12. Hong PY, Wheeler E, Cann IKO, Mackie RI. 2011. Phylogenetic analysis of the fecal microbial community in herbivorous land and marine iguanas of the Galapagos Islands using 16S rRNA-based pyrosequencing. ISME J. 5:1461–1470.

13. Giovannoni SJ, Britschgi TB, Moyer CL, Field KG. 1990. Genetic diversity in Sargasso Sea bacterioplankton. Nature 345:60–63.

14. Li M, Wang BH, Zhang MH, Rantalainen M, Wang SY, Zhou HK, Zhang Y, Shen J, Pang XY, Zhang ML, Wei H, Chen Y, Lu HF, Zuo J, Su MM, Qiu YP, Jia W, Xiao CN, Smith LM, Yang SL, Holmes E, Tang HR, Zhao GP, Nicholson JK, Li LJ, Zhao LP. 2008. Symbiotic gut microbes modulate human metabolic phenotypes. Proc. Natl. Acad. Sci. U. S. A. 105:2117–2122.

15. Muyzer G, Dewaal EC, Uitterlinden AG. 1993. Profiling of complex microbial populations by denaturing gradient gel electrophoresis analysis of polymerase chain reaction-amplified genes coding for 16S rRNA. Appl. Environ. Microbiol. 59:695–700.

16. Osborn AM, Moore ERB, Timmis KN. 2000. An evaluation of terminal-restriction fragment length polymorphism (T-RFLP) analysis for the study of microbial community structure and dynamics. Environ. Microbiol. 2:39–50.

17. Wei H, Dong L, Wang TT, Zhang MH, Hua WY, Zhang CH, Pang XY, Chen MJ, Su MM, Qiu YP, Zhou MM, Yang SL, Chen Z, Rantalainen M, Nicholson JK, Jia W, Wu DZ, Zhao LP. 2010. Structural shifts of gut microbiota as surrogate endpoints for monitoring host health changes induced by carcinogen exposure. FEMS Microbiol. Ecol. 73:577–586.

18. Martin-Laurent F, Philippot L, Hallet S, Chaussod R, Germon JC, Soulas G, Catroux G. 2001. DNA extraction from soils: old bias for new microbial diversity analysis methods. Appl. Environ. Microbiol. 67:2354–2359.

19. Polz MF, Cavanaugh CM. 1998. Bias in template-to-product ratios in multitemplate PCR. Appl. Environ. Microbiol. 64:3724–3730.

20. Quince C, Lanzen A, Curtis TP, Davenport RJ, Hall N, Head IM, Read LF, Sloan WT. 2009. Accurate determination of microbial diversity from 454 pyrosequencing data. Nat. Methods 6:639–641.

21. Wang Y, Zhang ZS, Ramanan N. 1997. The actinomycete Thermobispora bispora contains two distinct types of transcriptionally active 16S rRNA genes. J. Bacteriol. 179:3270–3276.

22. Mylvaganam S, Dennis PP. 1992. Sequence heterogeneity between the two genes encoding 16S rRNA from the halophilic archaebacterium Haloarcula marismortui. Genetics 130:399–410.

23. Coenye T, Vandamme P. 2003. Intragenomic heterogeneity between multiple 16S ribosomal RNA operons in sequenced bacterial genomes. FEMS Microbiol. Lett. 228:45–49.

24. Acinas SG, Marcelino LA, Klepac-Ceraj V, Polz MF. 2004. Divergence and redundancy of 16S rRNA sequences in genomes with multiple rrn operons. J. Bacteriol. 186:2629–2635.

25. Lagesen K, Hallin P, Rodland EA, Staerfeldt HH, Rognes T, Ussery DW. 2007. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35:3100–3108.

26. Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32:1792–1797.

27. Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA, Oakley BB, Parks DH, Robinson CJ, Sahl JW, Stres B, Thallinger GG, Van Horn DJ, Weber CF. 2009. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 75:7537–7541.

28. Martinez I, Wallace G, Zhang CM, Legge R, Benson AK, Carr TP, Moriyama EN, Walter J. 2009. Diet-induced metabolic improvements in a hamster model of hypercholesterolemia are strongly linked to alterations of the gut microbiota. Appl. Environ. Microbiol. 75:4175–4184.

29. Bowman JS, Rasmussen S, Blom N, Deming JW, Rysgaard S, Sicheritz-Ponten T. 2012. Microbial community structure of Arctic multiyear sea ice and surface seawater by 454 sequencing of the 16S RNA gene. ISME J. 6:11–20.

30. Quince C, Lanzen A, Davenport RJ, Turnbaugh PJ. 2011. Removing noise from pyrosequenced amplicons. BMC Bioinformatics 12:38. doi:10.1186/1471-2105-12-38.

31. Lee OO, Wang Y, Yang JK, Lafi FF, Al-Suwailem A, Qian PY. 2011. Pyrosequencing reveals highly diverse and species-specific microbial communities in sponges from the Red Sea. ISME J. 5:650–664.

32. Huse SM, Dethlefsen L, Huber JA, Welch DM, Relman DA, Sogin ML. 2008. Exploring microbial diversity and taxonomy using SSU rRNA hypervariable tag sequencing. PLoS Genet. 4:e1000255. doi:10.1371/journal.pgen.1000255.

33. Andersson AF, Lindberg M, Jakobsson H, Backhed F, Nyren P, Engstrand L. 2008. Comparative analysis of human gut microbiota by barcoded pyrosequencing. PLoS One 3:e2836. doi:10.1371/journal.pone.0002836.

34. Gantner S, Andersson AF, Alonso-Saez L, Bertilsson S. 2011. Novel primers for 16S rRNA-based archaeal community analyses in environmental samples. J. Microbiol. Methods 84:12–18.

35. Klappenbach JA, Saxman PR, Cole JR, Schmidt TM. 2001. rrndb: the ribosomal RNA operon copy number database. Nucleic Acids Res. 29:181–184.

36. Stackebrandt E, Ebers J. 2006. Taxonomic parameters revisited: tarnished gold standards. Microbiol. Today 33:153–155.

37. Crosby LD, Criddle CS. 2003. Understanding bias in microbial community analysis techniques due to rrn operon copy number heterogeneity. Biotechniques 34:790–802.

38. Nubel U, Engelen B, Felske A, Snaidr J, Wieshuber A, Amann RI, Ludwig W, Backhaus H. 1996. Sequence heterogeneities of genes encoding 16S rRNAs in Paenibacillus polymyxa detected by temperature gradient gel electrophoresis. J. Bacteriol. 178:5636–5643.

39. Yap WH, Zhang ZS, Wang Y. 1999. Distinct types of rRNA operons exist in the genome of the actinomycete Thermomonospora chromogena and evidence for horizontal transfer of an entire rRNA operon. J. Bacteriol. 181:5201–5209.

40. Klappenbach JA, Dunbar JM, Schmidt TM. 2000. rRNA operon copy number reflects ecological strategies of bacteria. Appl. Environ. Microbiol. 66:1328–1333.

41. Lopez-Lopez A, Benlloch S, Bonfa M, Rodriguez-Valera F, Mira A. 2007. Intragenomic 16S rDNA divergence in Haloarcula marismortui is an adaptation to different temperatures. J. Mol. Evol. 65:687–696.


42. Lim K, Furuta Y, Kobayashi I. 2012. Large variations in bacterial ribosomal RNA genes. Mol. Biol. Evol. 29:2937–2948.

43. Stevenson BS, Schmidt TM. 2004. Life history implications of rRNA gene copy number in Escherichia coli. Appl. Environ. Microbiol. 70:6670–6677.

44. Ling ZX, Kong JM, Jia P, Wei CC, Wang YZ, Pan ZW, Huang WJ, Li LJ, Chen H, Xiang C. 2010. Analysis of oral microbiota in children with dental caries by PCR-DGGE and barcoded pyrosequencing. Microb. Ecol. 60:677–690.

45. Sogin ML, Morrison HG, Huber JA, Mark Welch D, Huse SM, Neal PR, Arrieta JM, Herndl GJ. 2006. Microbial diversity in the deep sea and the underexplored “rare biosphere.” Proc. Natl. Acad. Sci. U. S. A. 103:12115–12120.

46. Bennett S. 2004. Solexa Ltd. Pharmacogenomics 5:433–438.

47. Youssef N, Sheik CS, Krumholz LR, Najar FZ, Roe BA, Elshahed MS. 2009. Comparison of species richness estimates obtained using nearly complete fragments and simulated pyrosequencing-generated fragments in 16S rRNA gene-based environmental surveys. Appl. Environ. Microbiol. 75:5227–5236.

48. Turner S, Pryer KM, Miao VPW, Palmer JD. 1999. Investigating deep phylogenetic relationships among cyanobacteria and plastids by small subunit rRNA sequence analysis. J. Eukaryot. Microbiol. 46:327–338.
