
Supplementary material

A metagenome of a full-scale microbial community carrying out Enhanced Biological

Phosphorus Removal

Mads Albertsen, Lea Benedicte Skov Hansen, Aaron Marc Saunders, Per Halkjær Nielsen,

and Kåre Lehmann Nielsen

Department of Biotechnology, Chemistry and Environmental Engineering, Aalborg

University, Sohngaardsholmsvej 49, DK-9000 Aalborg, Denmark

Supplementary Text 1 of 1.

An investigation of the reads mapping to the ppk1 gene of Accumulibacter was conducted to

evaluate the sensitivity and specificity of the reference assembly against the Accumulibacter

genome. 87 ppk1 sequences were obtained from NCBI and five ppk1 genes of closely related

species were included.

All ppk1 sequences were trimmed to the length of the shortest ppk1 fragment (1073 bp) and

clustered using cd-hit-est v.4.2.1 (Li and Godzik, 2006) with the following parameters: -c 0.99

-r 1. A BLAST database was created from the resulting 68 non-redundant sequences. The

ppk1 sequences were assigned to different nodes in the phylogenetic tree using MEGAN. As

MEGAN assigns reads to nodes based on the species information in the BLAST hits, the

headers of the individual ppk1 sequences were changed to reflect the topology of the

phylogenetic tree.
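
The trimming and redundancy-removal step can be illustrated with a minimal Python sketch. It is not the cd-hit-est algorithm; it assumes, purely for illustration, that all sequences start at the same position, so that position-wise identity is meaningful after trimming. The function names and the toy sequences are hypothetical.

```python
def trim_to_shortest(seqs):
    """Trim every sequence to the length of the shortest one."""
    shortest = min(len(s) for s in seqs.values())
    return {name: s[:shortest] for name, s in seqs.items()}


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold=0.99):
    """Return the names of cluster representatives.

    A sequence becomes a new representative unless it is at least
    `threshold` identical to an existing representative.
    """
    representatives = []
    for name in sorted(seqs):  # deterministic order (an assumption)
        if not any(identity(seqs[name], seqs[r]) >= threshold
                   for r in representatives):
            representatives.append(name)
    return representatives


# Toy data (hypothetical): 200 bp sequences instead of 1073 bp
base = "ACGT" * 50
toy = {
    "seq_a": base + "GG",           # longer, gets trimmed to 200 bp
    "seq_b": "T" + base[1:],        # 1 mismatch to seq_a, 99.5% identical
    "seq_c": "T" * 10 + base[10:],  # 8 mismatches to seq_a, 96% identical
}
trimmed = trim_to_shortest(toy)
print(greedy_cluster(trimmed))  # ['seq_a', 'seq_c']
```

With a 99% threshold, seq_b collapses into seq_a while seq_c remains a separate representative, which is the kind of reduction that took the ppk1 set down to 68 non-redundant sequences.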

The metagenomic reads that matched the extracted region of the ppk1 gene in the

Accumulibacter genome in the original reference assembly were extracted to investigate the

specificity of the reference mapping (inclusion of other bacteria in the mapping). These

sequences were matched to the ppk1 database using BLASTn with default parameters except

-word_size 7, -outfmt 5 and -evalue 1e-5. The output was analysed in MEGAN.
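
As a sketch of how the search settings above translate into a BLAST+ call, the following Python snippet assembles the argument list without running it. The file and database names are hypothetical placeholders; only the three non-default options come from the text.

```python
import shlex


def blastn_command(query_fasta, database, output_xml):
    """Build a blastn argument list with the non-default settings from the text."""
    return [
        "blastn",
        "-query", query_fasta,   # hypothetical file name
        "-db", database,         # hypothetical database name
        "-word_size", "7",
        "-outfmt", "5",          # XML output
        "-evalue", "1e-5",
        "-out", output_xml,      # hypothetical file name
    ]


cmd = blastn_command("ppk1_reads.fasta", "ppk1_db", "ppk1_reads.xml")
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run` on a machine where BLAST+ is installed.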

In order to investigate the sensitivity (inclusion of most Accumulibacter clades in the

mapping), a reference assembly was conducted against the 68 Accumulibacter ppk1 genes

using CLC's reference mapping function, requiring min. 85% identity over 70% of the read

length. Only reads with a minimum length of 60 bp were used. Otherwise, the analysis was

conducted as for the specificity analysis.
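
The mapping criteria can be expressed as a simple filter. The sketch below is an interpretation, not CLC's implementation: it assumes that identity is the number of matching bases divided by the aligned length, and that the 70% requirement refers to the aligned fraction of the read. The function and parameter names are hypothetical.

```python
def passes_mapping_filter(read_length, aligned_length, matches,
                          min_identity=0.85, min_fraction=0.70,
                          min_read_length=60):
    """Return True if a read alignment meets the stated thresholds."""
    if read_length < min_read_length or aligned_length == 0:
        return False
    identity = matches / aligned_length      # assumed definition of identity
    fraction = aligned_length / read_length  # assumed definition of coverage
    return identity >= min_identity and fraction >= min_fraction


print(passes_mapping_filter(100, 80, 70))  # True: identity 0.875, fraction 0.80
print(passes_mapping_filter(100, 60, 60))  # False: fraction 0.60 is too low
print(passes_mapping_filter(50, 50, 50))   # False: read shorter than 60 bp
```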


The high resolution of diversity within the genus provided by the ppk1 gene was used as a test

case to validate the specificity (false positive matches) and the sensitivity (ability to recruit

reads from other Accumulibacter species) of the reference mapping. A total of 138 ppk1

genes were used to construct a phylogenetic tree (Figure S5) and the phylogenetic position of

each sequence was mimicked in MEGAN for the assignment of individual reads to different

nodes on the tree. The specificity analysis showed that only 10 of the 268 ppk1 reads assigned

to the Accumulibacter IIA str. UW-1 ppk1 gene had a better match to non-Accumulibacter

ppk1 sequences (Figure S7A). However, the sensitivity analysis showed that although we

were able to recruit most clade IIA ppk1 reads using the clade IIA str. UW-1 ppk1 gene, we

were not able to recruit more than approximately 30-50% of the reads from other

Accumulibacter species (Figure S7C).
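
For reference, the specificity figure quoted above corresponds to the following arithmetic, written as a short Python calculation. Both counts are taken from the text; the variable names are illustrative.

```python
assigned_reads = 268     # reads assigned to the UW-1 ppk1 gene (from the text)
better_elsewhere = 10    # reads with a better non-Accumulibacter match

false_positive_rate = better_elsewhere / assigned_reads
print(f"{false_positive_rate:.1%}")      # 3.7%
print(f"{1 - false_positive_rate:.1%}")  # 96.3%
```

In other words, roughly 96% of the reads recruited by the clade IIA str. UW-1 ppk1 gene had their best match within Accumulibacter.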


Supplementary Figure 1 of 7.

Supplementary Figure 1. Histogram of length distribution of the de novo assembled contigs.

Contigs ≥ 300 bp were used for further analysis (blue bars).
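
A minimal sketch of the length cut-off and the binning behind such a histogram is given below. The 300 bp threshold is from the caption; the bin width of 100 bp and the toy lengths are assumptions made for illustration only.

```python
from collections import Counter


def filter_contigs(lengths, min_length=300):
    """Keep contigs at or above the minimum length."""
    return [n for n in lengths if n >= min_length]


def length_histogram(lengths, bin_width=100):
    """Count contigs per bin; keys are the lower bin edges."""
    return Counter((n // bin_width) * bin_width for n in lengths)


toy_lengths = [120, 250, 300, 340, 410, 32884]  # hypothetical contig lengths
kept = filter_contigs(toy_lengths)
print(kept)                                   # [300, 340, 410, 32884]
print(sorted(length_histogram(kept).items()))  # [(300, 2), (400, 1), (32800, 1)]
```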


Supplementary Figure 2 of 7.

Supplementary Figure 2. Histogram of the length distribution of ORFs with a significant

BLAST hit (e-value ≤ 1e-5) compared to ORFs where no significant hit could be found. The

“double curved” plots are due to the minimum contig size of 300 bp (100 amino acids).


Supplementary Figure 3 of 7.

Supplementary Figure 3. A) Species abundance curve. “Best hit” represents species

assignment based on the best BLASTP hit. “10% Bitscore filter” represents species assignment only if

the best BLASTP hit had a bitscore >10% higher than that of the second-best BLASTP hit.

The graph only shows species with more than 100 ORFs assigned (100 ORFs ≈ 0.05% of all

ORFs). B) Species abundance chart. The 20 most abundant species are shown in the legend in

decreasing abundance. ORFs were assigned based on best BLASTP hit. C) Rarefaction

curves. The rarefaction function of MEGAN was used to create rarefaction curves at different

phylogenetic levels. The assignment is based on a 10% bitscore filter and minimum 5 ORFs

assigned.
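
The 10% bitscore filter can be sketched as follows. This is a literal reading of the caption rather than MEGAN's own code: the best hit is accepted only if its bitscore exceeds the second-best bitscore by more than 10%. Accepting an ORF with a single hit is an additional assumption, and the function name and example scores are hypothetical.

```python
def assign_species(hits, min_margin=0.10):
    """Assign a species from (species, bitscore) hits, or return None.

    The best hit wins only if its bitscore is more than `min_margin`
    higher than the second-best bitscore.
    """
    if not hits:
        return None
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    if len(ranked) == 1:
        return ranked[0][0]  # assumption: a single hit is accepted
    best, second = ranked[0], ranked[1]
    if best[1] > (1 + min_margin) * second[1]:
        return best[0]
    return None


print(assign_species([("species_A", 220), ("species_B", 190)]))  # species_A
print(assign_species([("species_A", 200), ("species_B", 190)]))  # None
```

In the second call the best score is only about 5% above the second best, so the ORF is left unassigned at species level.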


Supplementary Figure 4 of 7.

Supplementary Figure 4. Annotation of ORFs in the largest contig (32884 bp). Yellow

ORFs denote a significant BLAST hit (e-value ≤ 1e-5), whereas brown ORFs denote no significant hit.


Supplementary Figure 5 of 7.

Supplementary Figure 5. Phylogenetic tree of ppk1 sequences. Sequences from Aalborg

East have been marked in red and the ppk1 sequence from “Candidatus Accumulibacter

phosphatis” clade IIA str. UW-1 has been marked in blue. In addition, clade assignments have

been added. A putative new clade has been marked as IIx. The tree was first created on the

basis of 87 general ppk1 genes and only selected representative sequences are shown in the

final tree. The outgroup sequences (not shown on the tree) were Ralstonia eutropha

YP_300029, Ralstonia eutropha YP_729175 and Stenotrophomonas maltophilia K279a

CAQ44540.


Supplementary Figure 6 of 7.

Supplementary Figure 6. Comparison of genes prevalent in the different read pools based on

a reference mapping to the Accumulibacter genome. The percent read length covered was

used to compare presence or absence of genes. High-identity reads (>95% identical at nt level,

x-axis) were compared with the rest of the read pool (≤95% identical at nt level, y-axis). Each

dot represents one gene. To compare which genes differed between the high

(>95%) and low (≤95%) identity read pools, the read pool size of the low-identity group was

normalized (by subsampling) to the same size (179 741 reads) as the high-identity read pool,

thereby effectively comparing the prevalent genes in both read pools.
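
The normalization by subsampling can be sketched as drawing reads at random, without replacement, from the larger pool. The target size of 179 741 reads is from the caption; the size of the low-identity pool, the read identifiers and the fixed seed are hypothetical.

```python
import random


def subsample(reads, target_size, seed=0):
    """Draw `target_size` reads without replacement (seeded for repeatability)."""
    if target_size > len(reads):
        raise ValueError("target size exceeds the number of available reads")
    return random.Random(seed).sample(reads, target_size)


HIGH_IDENTITY_POOL_SIZE = 179_741  # from the caption
low_identity_reads = [f"read_{i}" for i in range(500_000)]  # hypothetical pool

normalised = subsample(low_identity_reads, HIGH_IDENTITY_POOL_SIZE)
print(len(normalised))  # 179741
```

With both pools at the same size, differences in per-gene coverage reflect which genes are prevalent in each pool rather than how many reads each pool contains.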


Supplementary Figure 7 of 7.

Supplementary Figure 7. Investigation of the specificity and sensitivity of the mapping of

metagenome reads to the genome of Accumulibacter clade IIA (NC_013194). MEGAN was

used to visualize the BLASTn results. A 10% bitscore difference was used to assign reads to

nodes. A) Investigation of the specificity of the mapping of the metagenome reads to the

Accumulibacter clade IIA ppk1 gene. The metagenome reads mapping to the clade IIA ppk1

gene were extracted and mapped to 68 non-redundant Accumulibacter ppk1 genes and 5 ppk1

genes from closely related species. Few reads had their best match to species other than

Accumulibacter. B) Investigation of the ability to include other Accumulibacter clades by the

use of the Accumulibacter clade IIA genome. The metagenome reads were mapped to 68

non-redundant Accumulibacter ppk1 genes and the extracted read pool was searched (BLASTn)

against all 68+5 ppk1 genes and visualised using MEGAN. C) The combination of panels A

and B reveals that most clade IIA reads are extractable using the clade IIA genome; however,

only approximately 30% of reads matching other clades are extracted.
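
One way to read panel C is as a per-clade ratio: reads recruited by the clade IIA genome (panel A) divided by reads recruited by the full ppk1 set (panel B). The sketch below assumes that interpretation. The clade labels and all counts are invented placeholders, not values from the study.

```python
def recruitment_fraction(genome_mapped, ppk1_mapped):
    """Per-clade fraction of ppk1-mapped reads also recruited by the genome."""
    fractions = {}
    for clade, total in ppk1_mapped.items():
        if total > 0:
            fractions[clade] = genome_mapped.get(clade, 0) / total
    return fractions


# Hypothetical counts, for illustration only
panel_a = {"IIA": 95, "other_1": 12, "other_2": 9}     # via clade IIA genome
panel_b = {"IIA": 100, "other_1": 40, "other_2": 30}   # via 68 ppk1 genes

for clade, frac in recruitment_fraction(panel_a, panel_b).items():
    print(f"{clade}: {frac:.0%}")
```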


Supplementary Table 1 of 2.


Supplementary references for Table S1.

Crocetti GR, Hugenholtz P, Bond PL, Schuler A, Keller J, Jenkins D, Blackall LL (2000). Identification of polyphosphate-accumulating organisms and design of 16S rRNA-directed probes for their detection and quantitation. Appl Environ Microbiol 66:1175-1182.

Daims H, Nielsen JL, Nielsen PH, Schleifer KH, Wagner M (2001). In situ characterization of Nitrospira-like nitrite-oxidizing bacteria active in wastewater treatment plants. Appl Environ Microbiol 67:5273-5284.

Daims H, Bruhl A, Amann R, Schleifer K-H, Wagner M (1999). The domain-specific probe EUB338 is insufficient for the detection of all bacteria: development and evaluation of a more comprehensive probe set. Syst Appl Microbiol 22:434–444.

Erhart R, Bradford D, Seviour RJ, Amann R, Blackall LL (1997). Development and use of fluorescent in situ hybridization probes for the detection and identification of ‘Microthrix parvicella’ in activated sludge. Systematic Appl Microbiol 20:310-318.

Flowers J, He S, Carvalho G, Peterson SB, Lopez C, Yilmaz S, Zilles JL, Morgenroth E, Lemos PC, Reis MAM, Crespo MTB, Noguera DR, McMahon KD (2008). Ecological differentiation of Accumulibacter in EBPR reactors. In: Proceedings of the Water Environment Federation, WEFTEC 2008 (12):31-42.

Gieseke A, Purkhold U, Wagner M, Amann R, Schramm A (2001). Community structure and activity dynamics of nitrifying bacteria in a phosphate-removing biofilm. Appl Environ Microbiol 67:1351-1362.

Giuliano L, De Domenico M, De Domenico E, Hofle MG, Yakimov MM (1999). Identification of culturable oligotrophic bacteria within naturally occurring bacterioplankton communities of the Ligurian sea by 16S rRNA sequencing and probing. Micro Ecol 37:77-85.

Hess A, Zarda B, Hahn D, Haner A, Stax D, Hohener P, Zeyer J (1997). In situ analysis of denitrifying toluene- and m-xylene-degrading bacteria in a diesel fuel-contaminated laboratory aquifer column. Appl Environ Microbiol 63:2136-2141.

Hugenholtz P, Tyson GW, Webb RI, Wagner AM, Blackall LL (2001). Investigation of Candidate division TM7, a recently recognized major lineage of the domain Bacteria with no known pure-culture representatives. Appl Environ Microbiol 67:411-419.

Kanagawa T, Kamagata Y, Aruga S, Kohno T, Horn M, Wagner M (2000). Phylogenetic analysis of and oligonucleotide probe development for Eikelboom type 021N filamentous bacteria isolated from bulking activated sludge. Appl Environ Microbiol 66:5043-5052.

Kong YH, Beer M, Rees GN, Seviour RJ (2002). Functional analysis of microbial communities in aerobic-anaerobic sequencing batch reactors fed with different phosphorus/carbon (P/C) ratios. Microbiology-Sgm 148:2299-2307.


Kong Y, Nielsen JL, Nielsen PH (2005). Identity and ecophysiology of uncultured actinobacterial polyphosphate-accumulating organisms in full-scale enhanced biological phosphorus removal plants. Appl Environ Microbiol 71:4076-4085.

Kragelund C, Levantesi C, Borger A, Thelen K, Eikelboom D, Tandoi V, Kong Y, Krooneman J, Larsen P, Thomsen TR, Nielsen PH (2008). Identity, abundance and ecophysiology of filamentous bacteria belonging to the Bacteroidetes present in activated sludge plants. Microbiology 154:886-894.

Lajoie CA, Layton AC, Gregory IR, Sayler GS, Taylor DE, Meyers AJ (2000). Zoogloeal clusters and sludge dewatering potential in an industrial activated-sludge wastewater treatment plant. Water Environ Research 72:56-64.

Levantesi C, Rossetti S, Thelen K, Kragelund C, Krooneman J, Eikelboom D, Nielsen PH, Tandoi V (2006). Phylogeny, physiology and distribution of ‘Candidatus Microthrix calida’, a new Microthrix species isolated from industrial activated sludge wastewater treatment plants. Environ Microbiol 8:1552-1563.

Maixner F, Noguera DR, Anneser B, Stoecker K, Wegl G, Wagner M, Daims H (2006). Nitrite concentration influences the population structure of Nitrospira-like bacteria. Environ Microbiol 8:1487-1495.

Mobarry BK, Wagner M, Urbain V, Rittmann BE, Stahl DA (1996). Phylogenetic probes for analyzing abundance and spatial organization of nitrifying bacteria. Appl Environ Microbiol 62:2156-2162.

Nguyen HTT, Le VQ, Hansen AA, Nielsen JL, Nielsen PH (2011). High diversity and abundance of putative polyphosphate-accumulating Tetrasphaera-related bacteria in activated sludge systems. FEMS Microbiol Ecol 76:256-267.

Rossello-Mora RA, Wagner M, Amann R, Schleifer KH (1995). The abundance of Zoogloea ramigera in sewage treatment plants. Appl Environ Microbiol 61:702-707.

Schauer M, Hahn MW (2005). Diversity and phylogenetic affiliations of morphologically conspicuous large filamentous bacteria occurring in the pelagic zones of a broad spectrum of freshwater habitats. Appl Environ Microbiol 71:1931-1940.

Thomsen TR, Nielsen JL, Ramsing NB, Nielsen PH (2004). Micromanipulation and further identification of FISH-labelled microcolonies of a dominant denitrifying bacterium in activated sludge. Environ Microbiol 6:470-479.

Trebesius K, Leitritz L, Adler K, Schubert S, Autenrieth IB, Heesemann J (2000). Culture independent and rapid identification of bacterial pathogens in necrotising fasciitis and streptococcal toxic shock syndrome by fluorescence in situ hybridisation. Medical Microbiol Immun 188:169-175.


Supplementary Table 2 of 2

Table S2. Selected reference genomes from Dinsdale et al. (2008b) used for comparison with the metagenome obtained in the current study. In addition, a metagenome from a non-EBPR wastewater treatment plant was included (Sanapareddy et al., 2009).

Metagenome Name | Environment | MG-RAST ID | Reference
Soudan Red Stuff | Subterranean | 4440281 | Edwards et al., 2006
Soudan Black Stuff | Subterranean | 4440282 | Edwards et al., 2006
Low Saltern microbes | Hyper-Saline | 4440437 | Rodriguez-Brito et al., 2009
Medium Saltern Microbes (MB1110) | Hyper-Saline | 4440435 | Rodriguez-Brito et al., 2009
Medium saltern microbes (MB1111) | Hyper-Saline | 4440434 | Rodriguez-Brito et al., 2009
Low saltern pond plasmids (TT) | Hyper-Saline | 4440090 | Rodriguez-Brito et al., 2009
High saltern microbial (HB1128) | Hyper-Saline | 4440419 | Rodriguez-Brito et al., 2009
Salton Sea Bacteria 1 | Hyper-Saline | 4440329 | Swan et al., 2010
Medium salinity microbial (MB1116) | Hyper-Saline | 4440425 | Rodriguez-Brito et al., 2009
Low salinity microbial (LB1128) | Hyper-Saline | 4440426 | Rodriguez-Brito et al., 2009
Line Islands Kingman Reef B2 bacteria | Marine | 4440037 | Dinsdale et al., 2008a
Line Islands Christmas Reef B3 bacteria | Marine | 4440041 | Dinsdale et al., 2008a
Line Islands Palmyra F8 Bacteria | Marine | 4440039 | Dinsdale et al., 2008a
DMSP 1 (MAM.1) | Marine | 4440364 | Mou et al., 2008
DMSP 2 (MAM.2) | Marine | 4440360 | Mou et al., 2008
VAN 2 (MAM 4) | Marine | 4440363 | Mou et al., 2008
Tilapia pond microbes | Freshwater | 4440440 | Rodriguez-Brito et al., 2009
Healthy Tilapia pond microbes | Freshwater | 4440413 | Rodriguez-Brito et al., 2009
Healthy Prebead tank microbes | Freshwater | 4440411 | Rodriguez-Brito et al., 2009
Tpond microbe 3 | Freshwater | 4440422 | Rodriguez-Brito et al., 2009
Rios Mesquites Stromatolites bacteria | Microbialites | 4440060 | Breitbart et al., 2009
Pozas Azule II stromatolite microbes | Microbialites | 4440067 | Desnues et al., 2008
Healthy slime bacteria | Fish | 4440059 | Angly et al., 2009
Morbid slime bacteria | Fish | 4440066 | Angly et al., 2009
Healthy gut bacteria | Fish | 4440055 | Angly et al., 2009
Morbid gut bacteria | Fish | 4440056 | Angly et al., 2009
Non-EBPR wastewater treatment plant | WWTP | N/A | Sanapareddy et al., 2009

Supplementary references for Table S2.

Angly FE, Willner D, Prieto-Davó A, Edwards RA, Schmieder R, Vega-Thurber R, et al. (2009). The GAAS metagenomic tool and its estimations of viral and microbial average genome size in four major biomes. PLoS Computational Biology 5:e1000593.

Breitbart M, Hoare A, Nitti A, Siefert J, Haynes M, Dinsdale E, et al. (2009). Metagenomic and stable isotopic analyses of modern freshwater microbialites in Cuatro Ciénegas, Mexico. Environ Microbiol 11:16-34.


Desnues C, Rodriguez-Brito B, Rayhawk S, Kelley S, Tran T, Haynes M, et al. (2008). Biodiversity and biogeography of phages in modern stromatolites and thrombolites. Nature 452:340-343.

Dinsdale EA, Pantos O, Smriga S, Edwards RA, Angly F, Wegley L, et al. (2008a). Microbial ecology of four coral atolls in the Northern Line Islands. PLoS ONE 3:e1584.

Dinsdale EA, Edwards RA, Hall D, Angly F, Breitbart M, Brulc JM, et al. (2008b). Functional metagenomic profiling of nine biomes. Nature 452:629–632.

Edwards RA, Rodriguez-Brito B, Wegley L, Haynes M, Breitbart M, Peterson DM, et al. (2006). Using pyrosequencing to shed light on deep mine microbial ecology. BMC Genomics 7:57.

Mou X, Sun S, Edwards RA, Hodson RE, Moran MA. (2008). Bacterial carbon processing by generalist species in the coastal ocean. Nature 451:708-711.

Rodriguez-Brito B, Li L, Wegley L, Furlan M, Angly F, Breitbart M, et al. (2010). Viral and microbial community dynamics in four aquatic environments. The ISME J 4:739-751.

Sanapareddy N, Hamp TJ, Gonzalez LC, Hilger HA, Fodor AA, Clinton SM. (2009). Molecular diversity of a North Carolina wastewater treatment plant as revealed by pyrosequencing. Appl Environ Microbiol 75:1688-1696.

Swan BK, Ehrhardt CJ, Reifel KM, Moreno LI, Valentine DL. (2010). Archaeal and bacterial communities respond differently to environmental gradients in anoxic sediments of a California hypersaline lake, the Salton Sea. Appl Environ Microbiol 76:757-768.
