Evaluation of GenBank, EzTaxon, and BIBI Services for the Molecular Identification of Clinical Blood Culture Isolates that were Unidentifiable or Misidentified by Conventional Methods

Kyung Sun Park1*, Chang-Seok Ki1*, Cheol-In Kang2, Yae-Jean Kim3, Doo Ryeon Chung2, Kyong-Ran Peck2, Jae-Hoon Song2, Nam Yong Lee1

Department of Laboratory Medicine & Genetics1, Division of Infectious Diseases2, Department of Pediatrics3, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea

Running title: Evaluation of GenBank, EzTaxon, and BIBI: 16S rRNA

*The first two authors contributed equally to this work.

Corresponding authors:
Nam Yong Lee, M.D., Ph.D.
Department of Laboratory Medicine & Genetics, Samsung Medical Center, Sungkyunkwan University School of Medicine, 50 Irwon-Dong, Gangnam-Gu, Seoul, South Korea, 135-710
Tel: +82-2-3410-2706, Fax: +82-2-3410-2719, E-mail: [email protected]

Copyright © 2012, American Society for Microbiology. All Rights Reserved.
J. Clin. Microbiol. doi:10.1128/JCM.00081-12
JCM Accepts, published online ahead of print on 7 March 2012
Downloaded from http://jcm.asm.org/ on April 13, 2018 by guest

Abstract
We compared the 16S rRNA gene sequencing results analyzed with the GenBank, EzTaxon, and BIBI databases for blood culture isolates that gave incomplete or conflicting identifications or were unidentifiable using conventional methods. Analyses performed using GenBank combined with EzTaxon (kappa=0.79) were more discriminative than those using any database alone or any other combination of two databases.
The 16S rRNA gene is increasingly used to confirm the molecular identification of microbes, but a major problem with 16S rRNA gene sequencing is the difficulty of interpreting the results. A number of public and commercial DNA sequence databases are available for microbes. Public databases such as GenBank, which may be searched using the National Center for Biotechnology Information Basic Local Alignment Search Tool (NCBI BLAST), contain sequences of both type strains and non-type strains that have not been peer reviewed (3). Commercial databases, on the other hand, potentially contain high-quality filtered sequence data but offer only a limited number of reference sequences (5, 9, 11).
Recently, several freely available, quality-controlled, web-based public databases, such as the EzTaxon database (http://www.eztaxon.org/) (1) and the BIBI database (http://pbil.univ-lyon1.fr/bibi/) (8), have been developed for bacterial identification based on 16S rRNA gene sequences. Despite the advances in 16S rRNA gene databases, these databases are rarely evaluated or compared.
The aim of this study was to compare the 16S rRNA gene sequencing results analyzed with GenBank using BLAST (GenBank), EzTaxon, and BIBI databases for blood culture isolates that gave incomplete or conflicting identifications or were unidentifiable using conventional methods.
In our laboratory, sequencing of the 16S rRNA gene and/or alternative DNA target genes such as gyrB, tuf, secA1, or recA is used as an adjunct to established conventional methods for the identification of difficult-to-identify or rarely encountered bacteria. From January 2010 to April 2011, we encountered 41 consecutive blood culture isolates for which conventional methods gave conflicting, incomplete, or no identification (37 cases of cultured colonies from blood culture bottles and four cases tested directly from blood culture bottles).
First, the 16S rRNA gene sequences were analyzed with GenBank. The 16S rRNA gene sequence analysis complied with Clinical and Laboratory Standards Institute (CLSI) guideline MM18-A (4). The same sequences were then compared with the EzTaxon and BIBI database servers. A comparison of the characteristics of the GenBank, EzTaxon, and BIBI databases is shown in Table 1.
The identification of a microorganism was considered final when two or more of the three 16S rRNA databases gave concordant results, when the biochemical characteristics of the identified strain were concordant with the known biochemical profile of the reference strain, or when the strain was identified using additional alternative target genes according to CLSI guidelines.
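The finality criteria described above can be sketched in code. The following Python sketch is illustrative only: the function name, the argument names, and the example labels are assumptions introduced here and are not part of the study's workflow. It also assumes that each database result has already been reduced to a single identification label, or to None when the database gave no identification.

```python
from collections import Counter
from typing import Optional, Sequence


def is_final_identification(
    database_results: Sequence[Optional[str]],
    biochemically_concordant: bool = False,
    confirmed_by_alternative_gene: bool = False,
) -> bool:
    """Return True if any one of the three finality criteria is met."""
    # Criterion 1: two or more of the 16S rRNA databases give the same result.
    counts = Counter(r for r in database_results if r is not None)
    two_or_more_concordant = any(n >= 2 for n in counts.values())
    # Criteria 2 and 3 are supplied by the caller as yes/no judgments.
    return (
        two_or_more_concordant
        or biochemically_concordant
        or confirmed_by_alternative_gene
    )


# Hypothetical inputs, ordered as GenBank, EzTaxon, BIBI.
print(is_final_identification(["Species A", None, "Species A"]))  # True
print(is_final_identification(["Species A", None, "Species B"]))  # False
print(is_final_identification([None, None, "Species B"],
                              biochemically_concordant=True))     # True
```

In this sketch, a result reported by only one database with no supporting biochemical or alternative-gene evidence is not treated as final.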
Using these strategies, we correctly identified 30 (73.2%) of 41 strains as single species (Table 2). However, three cases (strains 19, 27, and 36) were identified as a single genus with multiple species, and four cases (strains 11, 18, 21, and 34) were identified only at the genus level. We were unable to identify two cases because of unsatisfactory sequence quality. One further case (strain 22) was identified using only the BIBI database and was therefore interpreted as not being fully identified.
We used inter-rater agreement statistics (kappa calculation) to evaluate how well the 16S rRNA analyses using each database, and each combination of two databases, correlated with comprehensive identification, which considered the 16S rRNA gene together with biochemical characteristics or alternative target genes (Table 2). No database, alone or in combination, had a kappa value greater than 0.80, the level that would indicate a very good correlation with comprehensive identification. These results imply that 16S rRNA gene analysis alone has some limitations for cases that are unidentifiable or misidentified by conventional methods. The 16S rRNA analysis by GenBank (kappa=0.66) had a higher correlation with comprehensive identification than analyses by the other individual databases. Furthermore, the analysis by GenBank combined with EzTaxon (kappa=0.79) proved to be more discriminative than analysis by GenBank alone, by another database alone, or by either of the other two-database combinations.
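For readers unfamiliar with the kappa statistic, the following Python sketch shows one standard way to compute Cohen's kappa between two sets of categorical calls. It is a minimal illustration under stated assumptions: the source does not describe how the per-strain results were coded for the kappa calculation, so the category labels and the six example calls below are hypothetical and are not the study data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(calls_a: Sequence[Hashable], calls_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    if len(calls_a) != len(calls_b) or len(calls_a) == 0:
        raise ValueError("Both inputs must be non-empty and of equal length.")
    n = len(calls_a)
    # Observed agreement: fraction of items given the same category by both.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Expected chance agreement from the marginal category frequencies.
    freq_a = Counter(calls_a)
    freq_b = Counter(calls_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both raters used one identical category throughout.
    return (observed - expected) / (1.0 - expected)


# Hypothetical calls for six strains (not the study data).
database = ["species", "species", "genus", "species", "unidentified", "species"]
comprehensive = ["species", "species", "genus", "genus", "unidentified", "species"]
print(round(cohen_kappa(database, comprehensive), 3))  # 0.714
```

Here the two call sets agree on five of six strains (observed agreement 0.833), chance agreement is 15/36, and kappa is therefore 5/7, which falls below the 0.80 level discussed above.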
Of 39 isolates, excluding the two cases with illegible sequences, 29 (74.4%) gave concordant results for the best-matched strains among the three databases (Table 3). The remaining ten (25.6%) gave discordant results for the best-matched strains among the three databases (Table 4).
The discrepancies among databases, and the differences in their correlations with comprehensive identification, indicate that there is no consensus on a definite standard for the requirements that sequence databases must meet and that the various databases have not been adequately evaluated.
First, to some degree, these differences might result from the use of different software with the various databases. A previous study (2) reported that, when the taxa being compared are less closely related, the dendrogram relationships are more strongly affected by the program used. We observed several cases (strains 11, 16, 19, 22, and 28) where the best-matched strain in one database was not used for analysis in another database because of its low similarity to the query, even though it had the same GenBank accession number. It is difficult to apply defined threshold values for determining genus and species according to CLSI guidelines, because different similarity results are obtained when the same sequence is compared using different programs. Therefore, guidelines or a consensus are needed regarding the programs and parameters used with 16S rRNA gene sequence databases.
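One reason the same pair of sequences can yield different similarity values is that programs differ in how they treat gaps when computing percent identity. The Python sketch below illustrates this with two simple conventions. It is an assumption-laden toy: the function, the `count_gaps` flag, and the ten-base sequences are hypothetical, and the code does not reproduce the scoring actually used by BLAST, EzTaxon, or BIBI.

```python
def percent_identity(aligned_query: str, aligned_ref: str, count_gaps: bool = True) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    count_gaps=True  : a column with a gap in one sequence counts as a difference.
    count_gaps=False : columns containing a gap are left out of the calculation.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("Aligned sequences must have the same length.")
    matches = 0
    compared = 0
    for q, r in zip(aligned_query, aligned_ref):
        if q == "-" and r == "-":
            continue  # Column carries no information for this pair.
        if q == "-" or r == "-":
            if count_gaps:
                compared += 1  # Counted as a mismatch.
            continue
        compared += 1
        if q == r:
            matches += 1
    if compared == 0:
        raise ValueError("No comparable columns.")
    return 100.0 * matches / compared


# Hypothetical ten-column alignment with a single gap in the reference.
query = "ACGTACGTAC"
ref = "ACGT-CGTAC"
print(percent_identity(query, ref, count_gaps=True))   # 90.0
print(percent_identity(query, ref, count_gaps=False))  # 100.0
```

With a fixed cutoff applied to the result, the two conventions could place the same query on opposite sides of the threshold, which is the practical difficulty described above.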
Furthermore, since we did not compare 16S rRNA analyses using other popular databases such as Greengenes (7), the Ribosomal Database Project (RDP) (6), or Ribosomal Differentiation of Medical Microorganisms (RIDOM) (10), more comparative studies using some of these other databases are necessary. In addition, because our study is biased by its limited and selected population of isolates, those comparative studies will need to analyze more 16S rRNA sequences from a variety of organisms recovered from clinical specimens.
Second, the databases differ in the total number of 16S rRNA gene sequences they contain. In the present study, the EzTaxon database yielded more unidentified results than the other databases because it contains only 16S rRNA gene sequences of type strains.
In conclusion, analysis of the 16S rRNA gene alone is not sufficient for molecular identification in the rare cases where conventional methods do not correctly identify the strain. Based on our experience, we propose that 16S rRNA gene sequencing results should be analyzed with two or more databases including GenBank, preferably starting with GenBank and then confirming the result with other peer-reviewed databases, because the interpretation of 16S rRNA gene sequences depends on the program used by the database.
References
1. Chun, J., J. H. Lee, Y. Jung, M. Kim, S. Kim, B. K. Kim, and Y. W. Lim. 2007. EzTaxon: a web-based tool for the identification of prokaryotes based on 16S ribosomal RNA gene sequences. Int J Syst Evol Microbiol 57:2259-2261.
2. Clarridge, J. E., 3rd. 2004. Impact of 16S rRNA gene sequence analysis for identification of bacteria on clinical microbiology and infectious diseases. Clin Microbiol Rev 17:840-862, table of contents.
3. Clayton, R. A., G. Sutton, P. S. Hinkle, Jr., C. Bult, and C. Fields. 1995. Intraspecific variation in small-subunit rRNA sequences in GenBank: why single sequences may not adequately represent prokaryotic taxa. Int J Syst Bacteriol 45:595-599.
4. Clinical and Laboratory Standards Institute. 2008. Interpretive Criteria for Identification of Bacteria and Fungi by DNA Target Sequencing; Approved Guideline. MM18-A. Clinical and Laboratory Standards Institute.
5. Cloud, J. L., P. S. Conville, A. Croft, D. Harmsen, F. G. Witebsky, and K. C. Carroll. 2004. Evaluation of partial 16S ribosomal DNA sequencing for identification of Nocardia species by using the MicroSeq 500 system with an expanded database. J Clin Microbiol 42:578-584.
6. Cole, J. R., Q. Wang, E. Cardenas, J. Fish, B. Chai, R. J. Farris, A. S. Kulam-Syed-Mohideen, D. M. McGarrell, T. Marsh, G. M. Garrity, and J. M. Tiedje. 2009. The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res 37:D141-145.
7. DeSantis, T. Z., P. Hugenholtz, N. Larsen, M. Rojas, E. L. Brodie, K. Keller, T. Huber, D. Dalevi, P. Hu, and G. L. Andersen. 2006. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol 72:5069-5072.
8. Devulder, G., G. Perriere, F. Baty, and J. P. Flandrois. 2003. BIBI, a bioinformatics bacterial identification tool. J Clin Microbiol 41:1785-1787.
9. Mellmann, A., J. L. Cloud, S. Andrees, K. Blackwood, K. C. Carroll, A. Kabani, A. Roth, and D. Harmsen. 2003. Evaluation of RIDOM, MicroSeq, and Genbank services in the molecular identification of Nocardia species. Int J Med Microbiol 293:359-370.
10. Turenne, C. Y., L. Tschetter, J. Wolfe, and A. Kabani. 2001. Necessity of quality-controlled 16S rRNA gene sequence databases: identifying nontuberculous Mycobacterium species. J Clin Microbiol 39:3637-3648.
11. Woo, P. C. Y., K. H. L. Ng, S. K. P. Lau, K. t. Yip, A. M. Y. Fung, K. w. Leung, D. M. W. Tam, T. l. Que, and K. y. Yuen. 2003. Usefulness of the MicroSeq 500 16S ribosomal DNA-based bacterial identification system for identification of clinically significant bacterial isolates with ambiguous biochemical profiles. J Clin Microbiol 41:1996-2001.
Table 1. Comparison of characteristics of GenBank, EzTaxon, and BIBI databases

| Characteristic | GenBank | EzTaxon | BIBI |
| Resources of reference sequences | DNA DataBank of Japan (DDBJ) + the European Molecular Biology Laboratory (EMBL) + GenBank at NCBI | GenBank + sequences of strains provided by authors | GenBank |
| Target genes | All | 16S rRNA | 16S rRNA, gyrB, recA, sodA, rpoB, tmRNA, tuf, groel2-hsp65 |
| Curated sequences | No | Yes | Yes |
| Updated nomenclature | No | Yes, from DSMZ^a | Yes, from DSMZ^a |
| Origin of sequences | All | Only type strains | All, but categorized analysis (type strains, BacteriaArchaea_TS_SSU-rDNA-16S_stringent; type strains + strains with validly published name, BacteriaArchaea_SSU-rDNA-16S_stringent; type strains + strains with/without validly published name, BacteriaArchaea_SSU-rDNA-16S_lax) |
| Search engines | BLAST^b | BLAST^b and FASTA, then using pairwise global sequence alignment (algorithm of Myers & Miller) | BLAST^b |
| Multiple sequence alignment | No | Yes, using Clustal W | Yes, using Clustal W |
| Phylogenetic inference | Neighbor-joining, fast minimum evolution (using BLAST^b pairwise alignment) | Neighbor-joining, maximum-parsimony, maximum-likelihood (using Clustal W) | Not described (using Clustal W) |

a DSMZ, Deutsche Sammlung von Mikroorganismen und Zellkulturen (German Collection of Microorganisms and Cell Cultures)
b BLAST, basic local alignment search tool
Table 2. 16S rRNA gene analysis using GenBank, EzTaxon, and BIBI databases: correlation with the comprehensive identification considering 16S rRNA gene analysis and biochemical characteristics or alternative DNA target gene sequences

| Identification level | GenBank | EzTaxon | BIBI | GenBank+EzTaxon^a | GenBank+BIBI^a | EzTaxon+BIBI^a | Comprehensive identification |
| Single species level | 25 | 23 | 30 | 27 | 30 | 31 | 31 |
| Single genus with multiple species | 10 | 9 | 3 | 7 | 4 | 3 | 3 |
| Genus level, only | 3 | 3 | 2 | 3 | 4 | 4 | 4 |
| Unidentifiable | 3 | 5 | 4 | 3 | 2 | 2 | 3 |
| Misidentification | 0 | 1 | 2 | 1 | 1 | 1 | 0 |
| Kappa (95% CI) | 0.66 (0.45-0.88) | 0.63 (0.41-0.85) | 0.43 (0.18-0.68) | 0.79 (0.60-0.98) | 0.54 (0.28-0.81) | 0.57 (0.31-0.82) | |

a When the two databases in a combination gave discrepant results, the more specific outcome was taken as the provisional identification. When one database gave no identification and the other did, the identified result was regarded as provisional. When one database identified a single species and the other a single genus with multiple species, the single-species result was presumed to be provisional. When the two databases agreed on the genus but differed on the species, the provisional result was the identification at the genus level.
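The rules in footnote a for combining the results of two databases can be sketched as follows. This Python sketch is an interpretation, not the authors' procedure: the `Result` type, the specificity ranking, and the tie-breaking behavior for situations the footnote does not cover (for example, two databases reporting different genera) are assumptions made for illustration, and the organism names are used only as example inputs.

```python
from typing import NamedTuple, Optional


class Result(NamedTuple):
    genus: str
    species: frozenset = frozenset()  # Empty set means genus level only.


def specificity(r: Optional[Result]) -> int:
    """Rank outcomes from least to most specific (an assumed ordering)."""
    if r is None:
        return 0  # Unidentifiable
    if not r.species:
        return 1  # Genus level only
    if len(r.species) > 1:
        return 2  # Single genus with multiple species
    return 3      # Single species


def combine(a: Optional[Result], b: Optional[Result]) -> Optional[Result]:
    """Provisional identification from two database results."""
    # One database unidentifiable: take the identified result.
    if a is None:
        return b
    if b is None:
        return a
    # Same genus, different single species: fall back to the genus level.
    if (a.genus == b.genus and specificity(a) == 3 and specificity(b) == 3
            and a.species != b.species):
        return Result(a.genus)
    # Otherwise take the more specific outcome. Ties, including cases the
    # footnote does not cover, keep the first argument (an assumption).
    return a if specificity(a) >= specificity(b) else b


# Example inputs for illustration only.
mitis = Result("Streptococcus", frozenset({"mitis"}))
mitis_group = Result("Streptococcus", frozenset({"mitis", "pneumoniae"}))
faecium = Result("Enterococcus", frozenset({"faecium"}))
durans = Result("Enterococcus", frozenset({"durans"}))

print(combine(None, mitis) == mitis)                       # True
print(combine(mitis_group, mitis) == mitis)                # True
print(combine(faecium, durans) == Result("Enterococcus"))  # True
```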
Table 3. 16S rRNA gene identification using GenBank, EzTaxon, and BIBI databases: 29 concordant results for the best-matched strains

| Bacteria | 16S rRNA-based identification | Strain no. (N) |
| Gram-positive cocci | Staphylococcus aureus | 14, 40 (2) |
| | S. lugdunensis | 5 (1) |
| | Granulicatella adiacens | 4, 13 (2) |
| | Gemella morbillorum | 35 (1) |
| | Streptococcus pneumoniae | 24 (1) |
| | S. pneumoniae/S. pseudopneumoniae/S. mitis^a | 27, 36 (2) |
| | S. mutans | 33 (1) |
| | Enterococcus faecalis | 25, 30 (2) |
| Gram-negative cocci | Neisseria species | 18 (1) |
| Gram-negative bacilli | Klebsiella pneumoniae | 1 (1) |
| | Burkholderia pseudomallei | 7 (1) |
| | Achromobacter xylosoxidans | 16 (1) |
| | Capnocytophaga sputigena | 6 (1) |
| | Eikenella corrodens | 8 (1) |
| | Leptotrichia trevisanii | 3 (1) |
| | Odoribacter splanchnicus | 38 (1) |
| Gram-positive bacilli | Mycobacterium chelonae/M. abscessus/M. massiliense/M. bolletii | 41 (1) |
| | Microbacterium paraoxydans | 9 (1) |
| | Lactobacillus salivarius | 29, 32 (2) |
| | L. paracasei | 15 (1) |
| | Bacillus circulans | 39 (1) |
| | Clostridium tertium | 23, 31 (2) |
| | C. symbiosum | 12 (1) |

a The blood culture bottles presented positive signs, but these isolates were not cultured because of their autolytic tendencies. Failure to distinguish among S. mitis, S. oralis, and S. pneumoniae is a well-known problem when performing gene sequencing. Even after several attempts to differentiate the species using tuf, rpoB and recA genes, we could not obtain the correct sequencing results, presumably because of low DNA concentrations.

Table 4. 16S rRNA gene identification using GenBank, EzTaxon, and BIBI databases: Ten discordant results for the best-matched strains

Columns: Strain No.; Identification by conventional methods; GenBank best match, second best match, and third best match (% similarities, base differences); EzTaxon best match, second best match, and third best match (% similarities, base differences); BIBI^a best match and second best match; Comprehensive identification.

19; Unidentified; Streptococcus mitis (99.44, 2/535); Streptococcus pseudopneumoniae (99.43, 1/530), Streptococcus pneumoniae (99.43, 1/530); Streptococcus oralis (99.25, 1/530); Streptococcus pseudopneumoniae (99.61, 2/510); Streptococcus mitis (99.43, 3/530); Streptococcus pneumoniae (99.24, 4/526); Streptococcus mitis; Streptococcus mitis/S. pseudopneumoniae/S. pneumoniae^b
28; Colonies 1: E. faecium/colonies 2: unidentified; Enterococcus faecium (99.86, 0/698); Enterococcus durans (99.86, 1/699); Enterococcus faecium (99.86, 1/697); Enterococcus durans (99.71, 2/698); Enterococcus hirae (99.57, 3/698); Enterococcus durans; Enterococcus faecium; Enterococcus faecium
10; Sphingomonas paucimobilis (VITEK2-GN)/Pasteurella species (API-NE); Aggregatibacter aphrophilus (99.31, 0/722); Haemophilus paraphrophilus (99.03, 0/719); Unidentified; Aggregatibacter aphrophilus; Aggregatibacter aphrophilus
11; Pasteurella canis (VITEK2-GN)/Acinetobacter lwoffii (MicroScan-O/N combo 44); Acinetobacter species (99.05, 3/739); Acinetobacter parvus (99.18, 6/729); Unidentified (stringent), Acinetobacter species (lax); Acinetobacter species
37; Unidentified; Moraxella nonliquefaciens (99.58, 2/715); Moraxella nonliquefaciens (99.72, 2/712); Unidentified (stringent), Moraxella nonliquefaciens (lax); Moraxella nonliquefaciens
20; Unidentified; Microbacterium aurum (99.82, 0/544); Unidentified; Microbacterium aurum; Microbacterium aurum
21; Unidentified; Microbacterium oxydans (99.60, 0/497); Microbacterium paraoxydans (99.20, 1/498); Microbacterium paraoxydans (98.19, 9/497); Microbacterium luteolum (98.05, 9/461); Microbacterium thalassium (97.96, 10/498); Microbacterium phyllosphaerae; Microbacterium species
26; Clostridium bifermentans (VITEK2-ANC); Catabacter hongkongensis (99.12, 6/678); Unidentified; Catabacter hongkongensis; Catabacter hongkongensis
34; Unidentified; Fontibacillus aquaticus (97.11, 6/691); Fontibacillus aquaticus (97.23, 19/687); Unidentified (stringent, lax); Fontibacillus species
22; Unidentified; Unidentified; Unidentified; Oscillibacter valericigenes; Unidentified

a The % similarities and base differences are not reported for the BIBI database because they are not listed in the BIBI database.
b Described in Table 3.