Microbial16SrRNATaxonomicAnalysis QUICK DESIGN GUIDE …userweb.eng.gla.ac.uk/umer.ijaz/TAXO.pdf · 2012-11-24 · size of the final poster. All text and graphics will be printed


Microbial 16S rRNA Taxonomic Analysis Pipeline

Umer Zeeshan Ijaz1, Rob Van Son2, and Christopher Quince1

1University of Glasgow, 2University of Amsterdam

Abstract

We are developing a taxonomic analysis pipeline for multivariate analysis of microbial community structure in an environmental context. Microbial diversity is measured by sequencing homologous genes, typically the 16S rRNA gene, on next-generation sequencing platforms. We extract the abundances of the observed taxa by classifying the sequences and then investigate the correlations between diversity patterns and environmental parameters.

Directory Structure

input
    SPE.csv
    ENV.csv
output
    <results_generated_here>
scripts
    _generateColors_.R
    generateCLUSPlot.R
    generateNMDSPlot.R
    generateBivariatePlot.R
    generateDissimilarityPlot.R
    generateOrdistep.R
    generateCCAAdonis.R
    generateDiversityIndices.R
    generateReadsBarPlot.R
    generateCCABIOENV.R
    generateEnvHeapMaps.R
    generateRichnessPlot.R
    generateCCAPlot.R
    generateNMDSEnvPlot.R
    transposeCSV.sh

Code: http://userweb.eng.gla.ac.uk/umer.ijaz/Taxonomic_Scripts.tar.gz
User Manual: http://userweb.eng.gla.ac.uk/umer.ijaz/taxonomic_scripts_manual.pdf

R Scripts

generateCCAAdonis.R This script uses analysis of variance using distance matrices to find the best set of environmental parameters that describe the community structure. We use the adonis() function from the vegan library, which fits linear models to distance matrices and uses a permutation test with pseudo F-ratios. It also draws a CCA plot with only those environmental variables whose P-values fall below a cut-off. Additionally, the most abundant taxa are drawn on top of the CCA plot.
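The idea of a permutation test on a pseudo F-ratio can be illustrated with a minimal Python sketch. This is not the pipeline's R code and not vegan's adonis() implementation: it assumes the simplest case of a single categorical grouping factor, whereas adonis() fits general linear models. The function names `pseudo_f` and `permutation_p` and the toy data are hypothetical.

```python
import random


def pseudo_f(dist, groups):
    """Pseudo F-ratio for one grouping factor, from a square distance matrix."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: the same quantity inside each group.
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        pairs = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pairs / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permutation_p(dist, groups, n_perm=999, seed=0):
    """Observed pseudo-F and its permutation P-value (labels are shuffled)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # The observed labelling counts as one permutation.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): four samples on a line, two per group.
points = [0, 1, 10, 11]
groups = ["A", "A", "B", "B"]
dist = [[abs(p - q) for q in points] for p in points]
print(pseudo_f(dist, groups))  # 200.0
```

With Euclidean distances, as in the toy data, the pseudo F-ratio coincides with the classical one-way ANOVA F statistic; the permutation test is what makes the approach usable with ecological dissimilarities such as Bray-Curtis.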

generateCCABIOENV.R This script is an extension of the vegan library's bioenv() function. It finds the set of environmental parameters with the maximum (rank) correlation with the community dissimilarities and plots them on a CCA plot. It also finds the best subset of species and plots them, along with the environmental parameters, on NMDS plots.

generateNMDSEnvPlot.R This script generates the NMDS plot with environmental parameters drawn on top as contours.

_generateColors_.R This is a general-purpose parser for coloring sites. If the sample names contain underscores, the names are split on these underscores and the colors are assigned automatically based on the unique string literals in a particular column before and after the underscore. For example, given the sample names NW_1_1, NW_1_2, and NW_2_1, if one chooses colorColumn<-2 in the parameter section of the respective scripts, then the color indices will be (1,1,2); if one chooses colorColumn<-3, then the color indices will be (1,2,1).
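The color assignment can be sketched in a few lines of Python. This is an illustration only, not the R code of _generateColors_.R; the function name `color_indices` is hypothetical, and it assumes that indices are handed out in order of first appearance, which the poster does not state.

```python
def color_indices(sample_names, color_column):
    """Return 1-based color indices from one underscore-separated field.

    color_column is 1-based, mirroring the colorColumn parameter in the text.
    """
    seen = {}
    indices = []
    for name in sample_names:
        key = name.split("_")[color_column - 1]
        if key not in seen:
            # Assumption: a new color index for each newly seen string.
            seen[key] = len(seen) + 1
        indices.append(seen[key])
    return indices


names = ["NW_1_1", "NW_1_2", "NW_2_1"]
print(color_indices(names, 2))  # [1, 1, 2]
print(color_indices(names, 3))  # [1, 2, 1]
```

Both outputs match the example given in the text.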

generateReadsBarPlot.R This script generates a bar plot of reads for each sample in the species abundance file.

generateRichnessPlot.R This script generates multiple subplots, one per environmental parameter, plotting each parameter against species richness in a single figure. The species richness is rarefied to the minimum sample size, and a correlation test is performed between the rarefied richness and the environmental parameters. The resulting correlations and their significance are drawn on top of each subplot. Currently, three correlation measures are supported: Pearson, Spearman, and Kendall. Furthermore, this script generates a *_RICHNESS_LOG.txt file that contains the summary statistics of the regression of rarefied richness against the environmental parameters. The last column contains the P-values; a significant value indicates that the richness is affected by that particular environmental parameter.
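Rarefying richness to a common depth can be sketched in Python with the standard hypergeometric expectation (the expected number of taxa in a random subsample of fixed size). This is an illustration under assumptions, not the script's R code: the function names are hypothetical, and "minimum sample size" is taken to mean the smallest total read count across samples.

```python
from math import comb


def rarefied_richness(counts, depth):
    """Expected number of taxa in a random subsample of `depth` reads."""
    total = sum(counts)
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    denom = comb(total, depth)
    # A taxon with c reads is missed with probability C(total - c, depth) / C(total, depth).
    return sum(1 - comb(total - c, depth) / denom for c in counts if c > 0)


def rarefy_to_minimum(samples):
    """Rarefy every sample to the smallest total read count (assumed meaning)."""
    depth = min(sum(s) for s in samples)
    return [rarefied_richness(s, depth) for s in samples]


# Toy abundance table (hypothetical): rows are samples, columns are taxa.
samples = [[4, 4], [1, 1]]
print(rarefy_to_minimum(samples))
```

The rarefied values can then be correlated with each environmental parameter using any of the three measures named in the text.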

generateBivariatePlot.R This script generates bivariate plots with histograms on the diagonal, scatter plots with smooth curves below the diagonal, and correlations with significance levels above the diagonal. Moreover, the variables are reordered in the plots so that any two consecutive variables on the diagonal are the most similar.

generateDissimilarityPlot.R This script generates plots of a given dissimilarity measure between samples. Magenta is high similarity, and cyan is high dissimilarity. In the current version of the program, you can use the following dissimilarity measures: Bray-Curtis dissimilarity matrix on raw species data; Bray-Curtis dissimilarity matrix on log-transformed abundances; chord distance matrix; Hellinger distance matrix; and Chi-square pre-transformation followed by Euclidean distance.
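Three of the listed measures can be written out in a short Python sketch using their textbook definitions. This is not the script's R code; the function names are hypothetical, and the Hellinger and chord distances are implemented as a transformation followed by Euclidean distance, which is an assumption about the convention used.

```python
from math import sqrt


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    return sum(abs(a - b) for a, b in zip(x, y)) / (sum(x) + sum(y))


def euclidean(x, y):
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def hellinger(x, y):
    """Euclidean distance between square-rooted relative abundances."""
    tx, ty = sum(x), sum(y)
    return euclidean([sqrt(a / tx) for a in x], [sqrt(b / ty) for b in y])


def chord(x, y):
    """Euclidean distance between vectors scaled to unit length."""
    nx = sqrt(sum(a * a for a in x))
    ny = sqrt(sum(b * b for b in y))
    return euclidean([a / nx for a in x], [b / ny for b in y])


# Toy samples (hypothetical): abundances of two taxa.
print(bray_curtis([6, 4], [3, 7]))  # 0.3
print(chord([1, 1], [3, 3]))        # 0.0
```

The chord distance of 0.0 for the second pair shows why the choice of measure matters: the two samples differ in total abundance but have identical composition.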

generateNMDSPlot.R This script generates the non-metric multidimensional scaling (NMDS) plot for the species abundance file. It finds a non-parametric monotonic relationship between the dissimilarities in the samples matrix and the location of each item in a low-dimensional space.

generateCCAPlot.R This script performs canonical correspondence analysis (CCA) to find the relationship between species and their environment. The method extracts environmental gradients and then uses them to describe and visualize the preferences of taxa/samples on an ordination diagram.

generateCLUSPlot.R This script generates the hierarchical clustering plot by using Bray-Curtis as a dissimilarity index between samples.

generateDiversityIndices.R This script generates the ecological diversity indices and the rarefaction species richness. The following indices are supported: Shannon index; Simpson index; inverse Simpson index; Fisher's logarithmic series alpha parameter; and Pielou's evenness. Furthermore, it generates a CSV file, *_div.csv, for these indices.
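Most of these indices have closed forms, sketched below in Python. This is an illustration, not the script's R code. The function name `diversity_indices` is hypothetical, the Simpson index is assumed to follow the 1 - D convention (with the inverse Simpson index as 1/D), and Fisher's alpha is omitted because it requires a numerical root search.

```python
from math import log


def diversity_indices(counts):
    """Shannon, Simpson (1 - D), inverse Simpson (1 / D) and Pielou's evenness."""
    present = [c for c in counts if c > 0]
    total = sum(present)
    props = [c / total for c in present]
    shannon = -sum(p * log(p) for p in props)
    dominance = sum(p * p for p in props)  # D, the sum of squared proportions
    richness = len(present)
    # Pielou's evenness is undefined for a single taxon.
    pielou = shannon / log(richness) if richness > 1 else float("nan")
    return {
        "richness": richness,
        "shannon": shannon,
        "simpson": 1 - dominance,
        "inverse_simpson": 1 / dominance,
        "pielou": pielou,
    }


# Toy sample (hypothetical): four equally abundant taxa and one absent taxon.
print(diversity_indices([10, 10, 10, 10, 0]))
```

For a perfectly even community of four taxa, the Shannon index equals ln 4, the inverse Simpson index equals 4, and Pielou's evenness equals 1.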

generateEnvHeapMaps.R This script generates the heat maps for the species abundance file with OTU/taxa names on the x-axis and samples on the y-axis. Furthermore, each generated image is ordered by an environmental parameter (its value increases as you move down). As you move from left to right, the abundance of taxa decreases. The script automatically splits the images on the 100 most abundant taxa.

generateOrdiStep.R This script is useful for identifying the environmental parameters that describe the community composition. It produces three text files:
- *_STEP_automatic_permuation_LOG.txt: Automatic model building based on the Akaike information criterion, but with a permutation test, using the step function
- *_ORDISTEP_automatic_pvalue_LOG.txt: Automatic model building based on the Akaike information criterion, but based on permutation P-values
- *_ORDISTEP_manual_LOG.txt: Manual modeling

Input to the pipeline

Figure 1: An example of a species abundance file for multiple samples, generated by denoising 16S rRNA sequences using AmpliconNoise (Quince et al. 2011) and classifying them using the RDP classifier

Figure 2: An example of environmental parameters for multiple samples

Some results

Figure 3: Richness plot

Figure 4: Dissimilarity plot

Figure 5: Bivariate plot

Figure 6: Hierarchical clustering plot

Figure 7: Heat map

Figure 8: Diversity indices

Figure 9: NMDS plot with best subset of taxa and environmental parameters

Figure 10: CCA plot with all environmental parameters

Figure 11: NMDS plot with environmental parameter contours

Further Developments

The final aim of this work is to provide a web-based front end by pre-packaging the scripts with a Perl CGI servlet that can run the scripts in the background and display the results in a web browser. A preliminary version of the pipeline is hosted at http://quince-srv1.eng.gla.ac.uk:8080 with the interface as follows:

Figure 12: Microbial Taxonomic Analysis Pipeline v0.2

Acknowledgments

This work is supported by a research grant funded by the Technology Strategy Board (TSB) and Unilever, "Development of instrumental and bioinformatic pipelines to accelerate commercial applications of metagenomics approaches".
