


Microbial 16S rRNA Taxonomic Analysis Pipeline

Umer Zeeshan Ijaz (1), Rob Van Son (2), and Christopher Quince (1)

(1) University of Glasgow, (2) University of Amsterdam

Abstract

We are developing a taxonomic analysis pipeline for multivariate analysis of microbial community structure in an environmental context. Microbial diversity is measured by sequencing homologous genes, typically the 16S rRNA gene, on next-generation sequencing platforms. We extract the abundances of the observed taxa by classifying the sequences and then investigate the correlations between diversity patterns and environmental parameters.

Directory Structure

input
    SPE.csv
    ENV.csv
output
    <results_generated_here>
scripts
    _generateColors_.R
    generateCLUSPlot.R
    generateNMDSPlot.R
    generateBivariatePlot.R
    generateDissimilarityPlot.R
    generateOrdistep.R
    generateCCAAdonis.R
    generateDiversityIndices.R
    generateReadsBarPlot.R
    generateCCABIOENV.R
    generateEnvHeapMaps.R
    generateRichnessPlot.R
    generateCCAPlot.R
    generateNMDSEnvPlot.R
    transposeCSV.sh

Code: http://userweb.eng.gla.ac.uk/umer.ijaz/Taxonomic_Scripts.tar.gz
User Manual: http://userweb.eng.gla.ac.uk/umer.ijaz/taxonomic_scripts_manual.pdf

R Scripts

generateCCAAdonis.R

This script uses analysis of variance using distance matrices to find the best set of environmental parameters that describe the community structure. We have used the adonis() function from the vegan library, which fits linear models to distance matrices and uses a permutation test with pseudo-F ratios. It also draws a CCA plot with only those environmental variables whose P-values fall below a cutoff. Additionally, the most abundant taxa are drawn on top of the CCA plot.
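
A minimal sketch of the permutational analysis of variance step, not the script itself. It assumes a recent vegan release, where adonis2() is the successor of adonis(), and uses vegan's bundled dune data in place of SPE.csv and ENV.csv. The variable names and the 0.05 cutoff are illustrative assumptions.

```r
library(vegan)

data(dune)      # example community matrix shipped with vegan (sites x species)
data(dune.env)  # matching environmental table

set.seed(42)    # permutation tests are random; fix the seed for repeatability

# Sequential test of each term on Bray-Curtis dissimilarities
fit <- adonis2(dune ~ Management + A1 + Moisture, data = dune.env,
               method = "bray", permutations = 999, by = "terms")
print(fit)

# Keep only the terms whose permutation P-value is below a cutoff
pCutoff <- 0.05                 # hypothetical cutoff, not taken from the script
pvals <- fit[["Pr(>F)"]]
names(pvals) <- rownames(fit)
keep <- names(pvals)[!is.na(pvals) & pvals < pCutoff]
print(keep)
```

The retained names in `keep` are the ones that would then be passed on to a constrained ordination for plotting.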

generateCCABIOENV.R

This script is an extension of the vegan library's bioenv() function. It finds the best set of environmental parameters with maximum (rank) correlation with the community dissimilarities and plots them on a CCA plot. It also finds the best subset of species and plots them, along with the environmental parameters, on NMDS plots.
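
A minimal sketch of the underlying bioenv() search, assuming vegan's bundled varespec and varechem data as stand-ins for the input files. The chosen variables are illustrative, and the extensions described above (CCA and NMDS plotting, species subsets) are not reproduced here.

```r
library(vegan)

data(varespec)  # lichen pasture vegetation (sites x species)
data(varechem)  # soil chemistry for the same sites

# Search all subsets of the listed variables for the one whose Euclidean
# distances have the highest Spearman rank correlation with the
# Bray-Curtis community dissimilarities
sol <- bioenv(varespec ~ N + P + K + Ca + Al + pH, data = varechem,
              method = "spearman", index = "bray")
print(sol)
summary(sol)   # best subset of each size with its correlation

# Names of the variables in the overall best subset
best <- sol$models[[sol$whichbest]]
print(sol$names[best$best])
```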

generateNMDSEnvPlot.R

This script generates the NMDS plot with environmental parameters drawn on top as contours.
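
One way to draw such contours is vegan's ordisurf(), which fits a smooth surface of an environmental variable over the ordination. Whether the script uses ordisurf() is an assumption; the sketch below only illustrates the idea on vegan's bundled varespec and varechem data.

```r
library(vegan)

data(varespec)
data(varechem)

set.seed(1)  # metaMDS uses random starts
ord <- metaMDS(varespec, distance = "bray", k = 2, trace = FALSE)

# Plot the sites, then overlay fitted contours of one environmental variable
plot(ord, display = "sites", type = "t")
surf <- ordisurf(ord ~ Baresoil, data = varechem, add = TRUE, col = "blue")
```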

_generateColors_.R

This is a general-purpose parser for coloring sites. If the sample names contain underscores, the names are split on these underscores and colors are assigned automatically based on the unique strings found in a particular column of the split name. For example, given the following sample names

    NW_1_1
    NW_1_2
    NW_2_1

if one chooses colorColumn <- 2 in the parameter section of the respective script, then the color indices will be (1,1,2); if one chooses colorColumn <- 3, then the color indices will be (1,2,1).
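
The rule above can be sketched in a few lines of base R. The function name colorIndices is hypothetical and not taken from the script; only colorColumn comes from the description.

```r
# Split each sample name on "_" and number the distinct strings found in the
# chosen column in order of first appearance.
colorIndices <- function(sampleNames, colorColumn) {
  parts <- strsplit(sampleNames, "_", fixed = TRUE)
  key <- vapply(parts, function(p) p[colorColumn], character(1))
  as.integer(factor(key, levels = unique(key)))
}

samples <- c("NW_1_1", "NW_1_2", "NW_2_1")
colorIndices(samples, 2)  # 1 1 2
colorIndices(samples, 3)  # 1 2 1
```

The resulting integers can be used directly as the col argument of base plotting functions.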

generateReadsBarPlot.R

This script generates a bar plot of reads for each sample in the species abundance file.

generateRichnessPlot.R

This script generates multiple subplots, one for each environmental parameter plotted against species richness, in a single plot. The species richness is rarefied to the minimum sample size, and a correlation test is performed between the rarefied richness and each environmental parameter. The resulting correlation and its significance are drawn on top of each subplot. Currently, three correlation measures are supported: Pearson, Spearman, and Kendall. Furthermore, this script generates a *_RICHNESS_LOG.txt file that contains the summary statistics of the regression of rarefied richness against the environmental parameters. The last column contains the P-values; a significant value indicates that the richness is affected by that particular environmental parameter.
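
A minimal sketch of one such subplot, assuming vegan's bundled dune data and the A1 variable (soil horizon thickness) as the environmental parameter. The plotting layout and the log file format of the script are not reproduced.

```r
library(vegan)

data(dune)
data(dune.env)

# Rarefy every sample to the size of the smallest sample
depth <- min(rowSums(dune))
rich <- rarefy(dune, sample = depth)

# Correlation test; method can be "pearson", "spearman" or "kendall"
ct <- cor.test(rich, dune.env$A1, method = "spearman", exact = FALSE)

plot(dune.env$A1, rich, xlab = "A1", ylab = "Rarefied richness",
     main = sprintf("rho = %.2f, P = %.3f", ct$estimate, ct$p.value))

# Regression summary; the last column of the coefficient table holds P-values
reg <- summary(lm(rich ~ dune.env$A1))
print(coef(reg))
```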

generateBivariatePlot.R

This script generates bivariate plots with histograms on the diagonal, scatter plots with smooth curves below the diagonal, and correlations with significance levels above the diagonal. Moreover, the variables are reordered in the plots so that any two consecutive variables on the diagonal are the most similar.
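
A base R sketch of this layout using pairs(), assuming the built-in airquality data as a stand-in for the environmental table. The panel function names are hypothetical, and the reordering of variables by similarity is not implemented here.

```r
# Stand-in environmental table (built-in data set, rows with NAs dropped)
env <- na.omit(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")])

# Diagonal panel: histogram scaled to fit under the variable label
panelHist <- function(x, ...) {
  usr <- par("usr"); on.exit(par(usr = usr))
  par(usr = c(usr[1:2], 0, 1.5))
  h <- hist(x, plot = FALSE)
  nB <- length(h$breaks)
  rect(h$breaks[-nB], 0, h$breaks[-1], h$counts / max(h$counts), col = "grey")
}

# Upper panel: Pearson correlation with significance stars
panelCor <- function(x, y, ...) {
  usr <- par("usr"); on.exit(par(usr = usr))
  par(usr = c(0, 1, 0, 1))
  ct <- cor.test(x, y)
  stars <- symnum(ct$p.value, corr = FALSE, na = FALSE,
                  cutpoints = c(0, 0.001, 0.01, 0.05, 1),
                  symbols = c("***", "**", "*", " "))
  text(0.5, 0.5, paste0(sprintf("%.2f", ct$estimate), stars))
}

# Lower panel: scatter plot with a smooth curve (panel.smooth is in base R)
pairs(env, diag.panel = panelHist, lower.panel = panel.smooth,
      upper.panel = panelCor)
```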

generateDissimilarityPlot.R

This script generates plots of a given dissimilarity measure between samples. Magenta is high similarity, and cyan is high dissimilarity. In the current version of the program, you can use the following dissimilarity measures: Bray-Curtis dissimilarity matrix on raw species data; Bray-Curtis dissimilarity matrix on log-transformed abundances; chord distance matrix; Hellinger distance matrix; and Chi-square pre-transformation followed by Euclidean distance.
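
The five measures listed above can be computed with vegan's vegdist() and decostand(), as in the sketch below. It assumes vegan's bundled dune data, log1p() as the log transformation, and a plain image() call as a stand-in for the script's own plotting.

```r
library(vegan)

data(dune)

d <- list(
  bray_raw  = vegdist(dune, method = "bray"),                # raw abundances
  bray_log  = vegdist(log1p(dune), method = "bray"),         # log(1 + x)
  chord     = dist(decostand(dune, method = "normalize")),   # chord distance
  hellinger = dist(decostand(dune, method = "hellinger")),   # Hellinger distance
  chisq     = dist(decostand(dune, method = "chi.square"))   # Chi-square, then Euclidean
)

# cm.colors() runs from cyan (low) to magenta (high), so plotting the
# similarity 1 - d shows similar pairs in magenta and dissimilar pairs in cyan
sim <- 1 - as.matrix(d$bray_raw)
image(sim, col = cm.colors(20), zlim = c(0, 1), axes = FALSE,
      main = "Bray-Curtis similarity (raw data)")
```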

generateNMDSPlot.R

This script generates the non-metric multidimensional scaling (NMDS) plot for the species abundance file. It finds a non-parametric monotonic relationship between the dissimilarities in the samples matrix and the location of each item in a low-dimensional space.
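
A minimal NMDS sketch with vegan's metaMDS(), assuming the bundled dune data and a two-dimensional solution with Bray-Curtis dissimilarities. The script's own settings may differ.

```r
library(vegan)

data(dune)

set.seed(1)  # metaMDS uses random starting configurations
nmds <- metaMDS(dune, distance = "bray", k = 2, trace = FALSE)

print(nmds$stress)   # lower stress means a better rank-order fit
stressplot(nmds)     # Shepard diagram: dissimilarity vs. ordination distance
plot(nmds, display = "sites", type = "t")
```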

generateCCAPlot.R

This script performs canonical correspondence analysis (CCA) to find the relationship between species and their environment. The method extracts environmental gradients and then uses them to describe and visualize the preferences of taxa and samples on an ordination diagram.
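
A minimal CCA sketch with vegan's cca(), assuming the bundled varespec and varechem data. The three constraining variables are chosen only for illustration.

```r
library(vegan)

data(varespec)
data(varechem)

# Constrain the species ordination by three soil variables
mod <- cca(varespec ~ Al + P + K, data = varechem)
print(mod)

set.seed(1)
print(anova(mod, permutations = 999))   # permutation test of the whole model

# Species, site scores and biplot arrows of the environmental variables
plot(mod, display = c("sp", "wa", "bp"))
```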

generateCLUSPlot.R

This script generates the hierarchical clustering plot by using Bray-Curtis as a dissimilarity index between samples.
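
A short sketch of this step, assuming vegan's bundled dune data. The linkage method is not stated above, so average linkage is an assumption.

```r
library(vegan)

data(dune)

# Bray-Curtis dissimilarities between samples, then agglomerative clustering
hc <- hclust(vegdist(dune, method = "bray"), method = "average")
plot(hc, main = "Bray-Curtis, average linkage", xlab = "Sample", sub = "")
```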

generateDiversityIndices.R

This script generates the ecological diversity indices and rarefaction species richness. The following indices are supported: Shannon index; Simpson index; inverse Simpson index; Fisher's logarithmic series alpha parameter; and Pielou's evenness. Furthermore, it generates a CSV file *_div.csv for these indices.
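
These indices are all available through vegan, as the sketch below shows. It assumes the bundled BCI tree counts as input, and the output file name is hypothetical.

```r
library(vegan)

data(BCI)  # tree counts in 50 plots (sites x species)

H <- diversity(BCI, index = "shannon")

indices <- data.frame(
  shannon           = H,
  simpson           = diversity(BCI, index = "simpson"),
  invsimpson        = diversity(BCI, index = "invsimpson"),
  fisher_alpha      = fisher.alpha(BCI),
  pielou            = H / log(specnumber(BCI)),        # evenness J = H / ln(S)
  rarefied_richness = rarefy(BCI, sample = min(rowSums(BCI)))
)

# Hypothetical output location; the script writes *_div.csv
write.csv(indices, file.path(tempdir(), "example_div.csv"))
head(indices)
```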

generateEnvHeapMaps.R

This script generates heat maps for the species abundance file with OTU/taxa names on the x-axis and samples on the y-axis. Each generated image is ordered by an environmental parameter, whose value increases as you go down. As you move from left to right, the abundance of taxa decreases. The script automatically splits the images into blocks of the 100 most abundant taxa.
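
The ordering described above can be sketched in base R graphics. The example assumes vegan's bundled varespec and varechem data, pH as the ordering parameter, and 20 taxa per image instead of 100 because the example data set is small.

```r
library(vegan)

data(varespec)
data(varechem)

topN <- 20   # illustrative; the script uses 100 taxa per image

# Columns: most abundant taxa first; rows: samples by increasing pH
taxa <- head(order(colSums(varespec), decreasing = TRUE), topN)
bySample <- order(varechem$pH)
m <- as.matrix(varespec[bySample, taxa])

# image() maps matrix rows to x and columns to y (bottom to top), so
# transpose and reverse the samples to make pH increase downward
z <- t(m[nrow(m):1, ])
image(x = seq_len(ncol(m)), y = seq_len(nrow(m)), z = log1p(z),
      col = heat.colors(50), axes = FALSE, xlab = "", ylab = "")
axis(1, at = seq_len(ncol(m)), labels = colnames(m), las = 2, cex.axis = 0.6)
axis(2, at = seq_len(nrow(m)), labels = rev(rownames(m)), las = 1, cex.axis = 0.6)
```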

generateOrdiStep.R

This script is useful for identifying the environmental parameters that describe the community composition. It produces three text files:

- *_STEP_automatic_permuation_LOG.txt: Automatic model building based on the Akaike information criterion, with permutation tests, using the step function
- *_ORDISTEP_automatic_pvalue_LOG.txt: Automatic model building based on permutation P-values
- *_ORDISTEP_manual_LOG.txt: Manual modeling
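
The two automatic selection modes can be sketched with step() and vegan's ordistep(). The example assumes the bundled varespec and varechem data with a small, illustrative scope of five variables. It uses plain AIC selection in step() and does not reproduce the log files or the manual mode.

```r
library(vegan)

data(varespec)
data(varechem)

mod0 <- cca(varespec ~ 1, data = varechem)                      # empty model
mod1 <- cca(varespec ~ Al + P + K + N + pH, data = varechem)    # full scope

set.seed(1)

# AIC-based selection with the generic step() function
byAIC <- step(mod0, scope = formula(mod1), trace = 0)

# Selection driven by permutation P-values
byP <- ordistep(mod0, scope = formula(mod1), direction = "both", trace = FALSE)

print(formula(byAIC))
print(formula(byP))
```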

Input to the pipeline

Figure 1: An example of a species abundance file for multiple samples, generated by denoising 16S rRNA sequences using AmpliconNoise (Quince et al. 2011) and classifying them with the RDP classifier

Figure 2: An example of environmental parameters for multiple samples

Some results

Figure 3: Richness plot

Figure 4: Dissimilarity plot

Figure 5: Bivariate plot

Figure 6: Hierarchical clustering plot

Figure 7: Heat map

Figure 8: Diversity indices

Figure 9: NMDS plot with best subset of taxa and environmental parameters

Figure 10: CCA plot with all environmental parameters

Figure 11: NMDS plot with environmental parameter contours

Further Developments

The final aim of this work is to provide a web-based front-end by pre-packaging the scripts with a Perl CGI servlet that can run the scripts in the background and display the results in a web browser. A preliminary version of the pipeline is hosted at http://quince-srv1.eng.gla.ac.uk:8080 with the interface as follows:

[email protected]  

Acknowledgments

This work is supported by a Technology Strategy Board (TSB) and Unilever funded research grant “Development of instrumental and bioinformatic pipelines to accelerate commercial applications of metagenomics approaches”.

Figure 12: Microbial Taxonomic Analysis Pipeline v0.2