
BCCDC Watershed Metagenomics Project: Viral Biomarkers 2013


Identification of Viral Biomarkers for Healthy Water

Mitchell Webb1*, Miguel Uyaguari-Diaz1, Matthew Croxen1,2, Natalie Prystajecky1,2, Judy Isaac-Renton1,2 and Patrick Tang1,2

1- University of British Columbia, 2- BCCDC Public Health Microbiology & Reference Laboratory

Abstract

Using a case-control study design, we investigated viral markers for their use in predicting water health and contamination source. Samples were taken from three watersheds: agricultural, urban, and reference. Within each watershed, sub-samples were taken relative to an identified contamination source: upstream, downstream, and at the site of contamination. Each sample was filtered to isolate viral particles, and viral genetic material (DNA and RNA) was shotgun sequenced on the MiSeq benchtop sequencer. Reads were quality filtered and matched to a database to identify the viruses from which they came. Samples were then compared with one another to identify significant differences in viral communities.

Acknowledgements

This work is funded by Genome Canada, Genome British Columbia, Simon Fraser University, and the Public Health Agency of Canada. It is carried out with co-investigators at the University of British Columbia, Simon Fraser University, the University of Saskatchewan, McGill University, and Boreal Genomics. The authors thank the staff at the Environmental Microbiology Water, and Molecular Services laboratories (BCCDC Public Health Microbiology and Reference Laboratory). We also thank Joe Pennimpede (Capital Regional District), James Hibbert (University of South Carolina), and Jan Finke (University of British Columbia) for sample collection, GIS assistance, and flow cytometry analysis, respectively.

Contact Information

[email protected]

[email protected]

http://www.watersheddiscovery.ca/

[Figure panels, images not reproduced: Reference Watershed; Urban Watershed; Rural Watershed]


Future Research

The field of metagenomics is developing rapidly. With continued genome sequencing and collaboration among researchers, databases are maturing and their utility in identifying microorganisms continues to increase. However, matching individual reads to organisms may prove too difficult to be of practical use. Instead, in the context of water sample analysis, it may be more practical to group reads into operational taxonomic units (OTUs). This technique would permit the use of data that were lost during the database match step of this workflow.

Additionally, further sampling and characterization of the microbial fingerprints of healthy water samples and contamination sources will offer researchers better insight into predictive patterns of water health. This research focused only on viral markers. However, bacteria and eukaryotes also have the potential to offer valuable insight into water health and potential contamination sources.

This work aims to integrate metagenomic profiles with physical, chemical and biological indicator data to identify A) novel markers of watershed health and B) novel microbial pollution profiles, suggestive of pollution source.

[Figure, image not reproduced: bar chart with categories Rural, Urban1, Urban2, and Reference; value axis from 0 to 14000; series Upstream, Polluted, and Downstream]

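Column key: Up = upstream, Pol = polluted site, Dwn = downstream, Ref = Reference watershed.

| Genus | Rural Up | Rural Pol | Rural Dwn | Urban Pol 1 | Urban Pol 2 | Urban Dwn 1 | Urban Dwn 2 | Ref Up | Ref Dwn |
|---|---|---|---|---|---|---|---|---|---|
| Betacoronavirus | 560 | 364 | 506 | 351 | 369 | 410 | 616 | 77 | 681 |
| Alphacoronavirus | 467 | 275 | 402 | 389 | 288 | 290 | 448 | 163 | 508 |
| Gammacoronavirus | 282 | 227 | 277 | 102 | 213 | 184 | 281 | 25 | 376 |
| Siphoviridae | 116 | 94 | 77 | 449 | 152 | 356 | 40 | 607 | 40 |
| Viruses_unclassified | 197 | 211 | 223 | 134 | 269 | 185 | 158 | 231 | 114 |
| T4-like viruses | 210 | 141 | 176 | 151 | 197 | 166 | 272 | 71 | 324 |
| Potyvirus | 185 | 140 | 160 | 162 | 144 | 162 | 181 | 44 | 119 |
| Flavivirus | 97 | 105 | 177 | 166 | 109 | 144 | 84 | 192 | 67 |
| Endornavirus | 136 | 98 | 95 | 46 | 89 | 77 | 130 | 85 | 92 |
| Sobemovirus | 7 | 641 | 133 | 0 | 0 | 0 | 0 | 2 | 0 |
| Coronavirinae | 103 | 104 | 111 | 40 | 71 | 65 | 89 | 9 | 136 |
| Podoviridae | 48 | 41 | 50 | 20 | 140 | 41 | 24 | 211 | 15 |
| Torovirus | 69 | 29 | 57 | 21 | 49 | 32 | 89 | 70 | 99 |
| Viruses | 41 | 46 | 35 | 49 | 66 | 33 | 90 | 16 | 99 |
| Bafinivirus | 53 | 30 | 50 | 35 | 55 | 47 | 50 | 45 | 45 |
| Varicellovirus | 29 | 32 | 30 | 25 | 105 | 24 | 22 | 111 | 7 |
| Simplexvirus | 30 | 36 | 31 | 10 | 119 | 13 | 26 | 97 | 9 |
| Coronavirinae_unclassified | 62 | 39 | 60 | 16 | 28 | 27 | 42 | 2 | 42 |
| Betabaculovirus | 1 | 1 | 2 | 205 | 4 | 39 | 3 | 58 | 4 |
| Closterovirus | 16 | 13 | 9 | 26 | 16 | 39 | 6 | 163 | 6 |
| Tospovirus | 27 | 14 | 46 | 19 | 30 | 61 | 38 | 3 | 46 |
| Okavirus | 33 | 36 | 21 | 16 | 31 | 26 | 44 | 13 | 44 |
| Cyprinivirus | 11 | 22 | 18 | 19 | 55 | 13 | 16 | 54 | 14 |
| Pestivirus | 23 | 20 | 17 | 22 | 28 | 26 | 29 | 8 | 19 |
| N4-like viruses | 12 | 18 | 17 | 21 | 14 | 11 | 21 | 50 | 11 |
| Iflavirus | 14 | 21 | 5 | 19 | 14 | 33 | 18 | 10 | 23 |
| Nairovirus | 15 | 11 | 18 | 17 | 13 | 16 | 22 | 8 | 30 |
| Dianthovirus | 4 | 108 | 29 | 0 | 0 | 0 | 0 | 4 | 0 |
| Caudovirales_unclassified | 11 | 16 | 20 | 5 | 31 | 11 | 6 | 27 | 3 |
| Carlavirus | 11 | 5 | 5 | 28 | 4 | 32 | 5 | 31 | 1 |
| Tobamovirus | 1 | 2 | 5 | 9 | 1 | 90 | 0 | 4 | 1 |
| Alphavirus | 17 | 5 | 4 | 16 | 15 | 10 | 5 | 38 | 2 |
| Potexvirus | 2 | 20 | 65 | 3 | 5 | 8 | 4 | 4 | 1 |
| Tombusvirus | 0 | 5 | 2 | 4 | 0 | 98 | 0 | 0 | 0 |
| Ipomovirus | 7 | 6 | 15 | 26 | 15 | 4 | 13 | 5 | 17 |
| Ascovirus | 4 | 0 | 1 | 87 | 6 | 1 | 2 | 3 | 0 |
| Arterivirus | 15 | 12 | 5 | 25 | 11 | 9 | 7 | 11 | 1 |
| Caudovirales | 4 | 5 | 7 | 8 | 26 | 1 | 3 | 34 | 7 |
| Hepacivirus | 10 | 17 | 8 | 19 | 10 | 8 | 4 | 17 | 0 |
| Inovirus | 0 | 0 | 0 | 2 | 1 | 1 | 0 | 88 | 1 |
| Myoviridae | 7 | 8 | 8 | 17 | 15 | 8 | 8 | 9 | 7 |
| Tymovirus | 6 | 4 | 2 | 14 | 13 | 18 | 14 | 11 | 5 |
| Betaherpesvirinae | 4 | 9 | 9 | 10 | 11 | 3 | 7 | 33 | 0 |
| Hypovirus | 10 | 3 | 4 | 7 | 11 | 15 | 5 | 21 | 9 |
| Lymphocryptovirus | 5 | 8 | 8 | 10 | 11 | 16 | 6 | 16 | 4 |
| Alphaherpesvirinae_unclassified | 4 | 7 | 16 | 0 | 24 | 6 | 5 | 19 | 0 |
| Arenavirus | 11 | 8 | 10 | 10 | 12 | 9 | 14 | 3 | 3 |
| Cytomegalovirus | 2 | 2 | 2 | 6 | 9 | 1 | 7 | 50 | 0 |
| Rhadinovirus | 6 | 10 | 8 | 7 | 12 | 6 | 12 | 11 | 6 |
| Nepovirus | 10 | 2 | 10 | 14 | 4 | 25 | 7 | 0 | 1 |
| Herpesviridae_unclassified | 5 | 12 | 18 | 1 | 16 | 1 | 4 | 14 | 0 |
| Marafivirus | 6 | 7 | 5 | 4 | 8 | 26 | 6 | 7 | 2 |
| Muromegalovirus | 2 | 4 | 0 | 30 | 11 | 4 | 3 | 12 | 1 |
| Tritimovirus | 12 | 11 | 6 | 8 | 5 | 7 | 4 | 1 | 12 |
| Crinivirus | 9 | 4 | 7 | 9 | 5 | 11 | 9 | 1 | 9 |
| Iltovirus | 4 | 1 | 5 | 12 | 10 | 8 | 2 | 17 | 3 |
| Aphthovirus | 1 | 0 | 0 | 4 | 3 | 3 | 1 | 49 | 0 |
| Herpesvirales_unclassified | 3 | 5 | 10 | 2 | 24 | 2 | 2 | 12 | 1 |
| Rymovirus | 3 | 4 | 5 | 20 | 6 | 8 | 3 | 7 | 2 |
| Mardivirus | 7 | 2 | 1 | 11 | 8 | 9 | 5 | 7 | 5 |
| Coccolithovirus | 2 | 2 | 2 | 2 | 6 | 0 | 12 | 0 | 28 |
| Batrachovirus | 6 | 5 | 5 | 7 | 11 | 5 | 6 | 5 | 3 |
| Waikavirus | 9 | 5 | 3 | 6 | 4 | 10 | 9 | 1 | 6 |
| Badnavirus | 7 | 6 | 3 | 14 | 3 | 0 | 9 | 1 | 8 |
| Capripoxvirus | 8 | 1 | 1 | 8 | 2 | 26 | 2 | 0 | 2 |
| Lambda-like viruses | 4 | 7 | 0 | 1 | 12 | 2 | 2 | 18 | 4 |
| Coronaviridae_unclassified | 6 | 7 | 3 | 2 | 3 | 4 | 5 | 0 | 18 |
| phiKZ-like viruses | 1 | 2 | 3 | 7 | 6 | 2 | 7 | 20 | 0 |
| Betaretrovirus | 1 | 0 | 0 | 2 | 0 | 1 | 0 | 43 | 0 |
| Alphabaculovirus | 4 | 7 | 0 | 7 | 5 | 10 | 1 | 6 | 6 |
| Avipoxvirus | 2 | 4 | 16 | 3 | 9 | 2 | 6 | 0 | 4 |
| Tenuivirus | 7 | 2 | 6 | 8 | 3 | 6 | 8 | 0 | 6 |
| Ophiovirus | 5 | 3 | 4 | 3 | 7 | 4 | 11 | 1 | 7 |
| Cripavirus | 4 | 2 | 1 | 4 | 4 | 3 | 4 | 17 | 4 |
| Enterovirus | 6 | 2 | 3 | 9 | 6 | 5 | 3 | 0 | 2 |
| Cytorhabdovirus | 2 | 6 | 2 | 2 | 1 | 18 | 1 | 2 | 1 |
| Ampelovirus | 2 | 0 | 0 | 11 | 2 | 12 | 5 | 2 | 0 |
| Kobuvirus | 3 | 1 | 3 | 6 | 2 | 4 | 9 | 4 | 2 |
| Molluscipoxvirus | 9 | 3 | 1 | 4 | 1 | 4 | 0 | 12 | 0 |
| Chlorovirus | 5 | 0 | 0 | 4 | 4 | 13 | 1 | 2 | 1 |
| Orthobunyavirus | 4 | 0 | 12 | 0 | 4 | 0 | 6 | 0 | 4 |
| c2-like viruses | 3 | 1 | 1 | 13 | 0 | 9 | 0 | 1 | 1 |
| Hepatovirus | 0 | 2 | 1 | 5 | 0 | 18 | 1 | 2 | 0 |
| Poacevirus | 2 | 0 | 5 | 4 | 1 | 6 | 4 | 4 | 2 |
| Cardiovirus | 0 | 2 | 2 | 10 | 0 | 5 | 0 | 8 | 1 |
| Brambyvirus | 3 | 3 | 5 | 2 | 3 | 2 | 4 | 3 | 1 |
| Fabavirus | 2 | 3 | 0 | 15 | 2 | 3 | 1 | 0 | 0 |
| I3-like viruses | 2 | 5 | 4 | 0 | 7 | 1 | 0 | 6 | 1 |
| Rubivirus | 1 | 1 | 1 | 7 | 7 | 0 | 0 | 9 | 0 |
| Cosavirus | 4 | 1 | 1 | 5 | 4 | 10 | 0 | 0 | 0 |
| Orthopoxvirus | 4 | 1 | 0 | 2 | 13 | 1 | 1 | 0 | 3 |
| Flaviviridae | 3 | 1 | 2 | 7 | 3 | 1 | 0 | 7 | 0 |
| Iridovirus | 4 | 1 | 1 | 0 | 2 | 0 | 5 | 7 | 4 |
| Percavirus | 3 | 2 | 3 | 3 | 6 | 1 | 1 | 5 | 0 |
| Pneumovirus | 3 | 0 | 8 | 0 | 0 | 1 | 5 | 0 | 7 |
| Paraturdivirus | 6 | 1 | 1 | 4 | 0 | 10 | 1 | 0 | 0 |
| Enterococcus | 2 | 1 | 1 | 3 | 5 | 1 | 2 | 3 | 4 |
| Hantavirus | 4 | 1 | 2 | 2 | 1 | 2 | 6 | 0 | 4 |
| Respirovirus | 0 | 2 | 4 | 2 | 3 | 0 | 6 | 0 | 5 |
| T7-like viruses | 2 | 1 | 1 | 4 | 1 | 2 | 7 | 0 | 3 |
| Phycodnaviridae | 1 | 4 | 1 | 4 | 1 | 4 | 4 | 1 | 0 |
| Aparavirus | 5 | 3 | 7 | 1 | 0 | 0 | 2 | 0 | 1 |
| Leporipoxvirus | 0 | 1 | 0 | 4 | 1 | 0 | 1 | 11 | 1 |
| Bpp-1-like viruses | 1 | 0 | 2 | 2 | 4 | 0 | 2 | 7 | 0 |
| Phlebovirus | 1 | 1 | 1 | 3 | 1 | 5 | 4 | 1 | 0 |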


Introduction

Clean water is a tremendous resource for both the health of Canadians and the Canadian economy. Water quality also plays a particularly important role in the general health of our many coastal ecosystems. Unfortunately, urbanization and agricultural land use threaten the cleanliness of water and thus increase the importance of appropriate treatment and testing. The current culture-based approach to water quality assessment has two main weaknesses. It lacks sensitivity, because a large proportion of pathogens cannot be cultured and are expensive to test for, and it is reactive, testing positive only after contamination has occurred. To explore these microbiomes more thoroughly, researchers have begun to apply high-throughput sequencing in the developing field of metagenomics. Metagenomics is the simultaneous study of all genetic material recovered directly from a sample. In this way, a community of viruses, bacteria, and protists can be analyzed as a microbial fingerprint. In the present research, we use metagenomics to identify novel biomarkers of watershed health and to develop a tool for matching the microbial fingerprint of a contaminated site to a specific source.

Materials and Methods

Sampling

Work Flow

Database Match: USEARCH

Sample site combination: Sample data, containing the organisms identified in all water samples, were compared using a Python script.
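The poster does not describe the comparison script itself, so the following is only a minimal sketch of one way such a comparison could be done. It assumes that each sample is reduced to a vector of per-taxon counts, that counts are normalized to relative abundance, and that samples are compared with the Pearson correlation coefficient; all three choices, and every function name, are illustrative assumptions. The example counts are the first four rows and the Sobemovirus row of the table above for three of the nine samples.

```python
from math import sqrt


def relative_abundance(counts):
    """Scale raw counts so that they sum to 1 (all zeros if the sample is empty)."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # Undefined when either vector is constant.
    return cov / (sd_x * sd_y) if sd_x and sd_y else float("nan")


def correlation_matrix(samples):
    """Return {sample_a: {sample_b: r}} for every pair of samples."""
    names = list(samples)
    profiles = {name: relative_abundance(samples[name]) for name in names}
    return {a: {b: pearson(profiles[a], profiles[b]) for b in names} for a in names}


# Counts for Betacoronavirus, Alphacoronavirus, Gammacoronavirus,
# Siphoviridae, and Sobemovirus, copied from the table above.
samples = {
    "Rural Up": [560, 467, 282, 116, 7],
    "Rural Pol": [364, 275, 227, 94, 641],
    "Ref Up": [77, 163, 25, 607, 2],
}
matrix = correlation_matrix(samples)
```

Normalizing before correlating keeps differences in sequencing depth between samples from dominating the comparison, which is one reason this sketch does not correlate raw counts directly.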

Results

[Figure panels, images not reproduced: Viral Community Heat Map; Workflow Attrition; Viral Population per Watershed; Correlation Coefficient Matrix. Axis labels: Population, Watershed Site.]

Sampling sites within each watershed:

- Upstream
- Contamination
- Downstream

Quality Filter: Raw data produced by Illumina's MiSeq sequencer were filtered using a nucleotide-based quality filtering script written in Python.
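The filtering script is not reproduced on the poster. As a rough illustration of what a nucleotide-based quality filter can look like, the sketch below trims each read at the first base whose Phred score falls below a threshold and discards reads that end up too short. The Phred+33 encoding, the threshold of 20, the minimum length of 50, and all function names are assumptions made for this example, not the authors' settings.

```python
from typing import Iterable, Iterator, Tuple

Read = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def trim_read(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Cut the read at the first base whose Phred score is below min_q."""
    for i, ch in enumerate(qual):
        if ord(ch) - offset < min_q:
            return seq[:i], qual[:i]
    return seq, qual


def quality_filter(reads: Iterable[Read], min_q: int = 20,
                   min_len: int = 50, offset: int = 33) -> Iterator[Read]:
    """Trim every read, then keep only reads with at least min_len bases left."""
    for header, seq, qual in reads:
        seq, qual = trim_read(seq, qual, min_q, offset)
        if len(seq) >= min_len:
            yield header, seq, qual
```

With a file on disk, the pieces could be chained as `quality_filter(parse_fastq(open(path)))`, where `path` is a hypothetical FASTQ file name.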

40 L Sample

Viral Retentate

Algorithm: This algorithm is faster than a simple BLAST search by orders of magnitude. It exploits short shared subsequences, called k-mers, to build a preliminary list of possible matches. Once the list is compiled, a refining match chooses the best result.
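The two-stage idea described above (a fast k-mer screen followed by a refining match) can be sketched in a few lines. This is a toy illustration only and is not how USEARCH is implemented: the k-mer index, the choice of the top three candidates, and the ungapped identity score used for refinement are all simplifying assumptions, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(refs, k):
    """Map each k-mer to the names of the reference sequences that contain it."""
    index = defaultdict(set)
    for name, seq in refs.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def candidates(read, index, k, top_n=3):
    """Stage 1: rank references by the number of k-mers they share with the read."""
    hits = Counter()
    for kmer in kmers(read, k):
        for name in index.get(kmer, ()):
            hits[name] += 1
    return [name for name, _ in hits.most_common(top_n)]


def best_identity(read, ref):
    """Best ungapped identity of the read against any window of the reference."""
    if len(read) > len(ref):
        return 0.0
    best = 0
    for start in range(len(ref) - len(read) + 1):
        window = ref[start:start + len(read)]
        best = max(best, sum(a == b for a, b in zip(read, window)))
    return best / len(read)


def match_read(read, refs, index, k, top_n=3):
    """Stage 2: score only the shortlisted references and return the best one."""
    scored = [(best_identity(read, refs[name]), name)
              for name in candidates(read, index, k, top_n)]
    if not scored:
        return None  # no reference shares a k-mer with the read
    score, name = max(scored)
    return name, score
```

The speed-up comes from the first stage: the comparatively expensive scoring in `best_identity` runs only on the few references that already share k-mers with the read, rather than on the whole database.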

Shotgun Sequencing