Workshop 11: Metagenomics Analysis
Shi, Baochen, Department of Pharmacology, UCLA
Flowchart
1. SFF (raw 454 data, optional)
2. fasta/qual files
3. demultiplexing/quality filtering
4. OTU picking
5. representative sequences
6. taxonomic assignments/tree building
7. OTU table and downstream processing

(b) Sequence data preparation
(c) Operational Taxonomic Units (OTU) picking, taxonomic assignment & inferring phylogeny
(d) Microbiome diversity analyses
c) OTU analysis
This workflow consists of the following steps:

OTU picking, taxonomic assignment
c1) Pick OTUs based on sequence similarity within the reads (pick_otus.py)
c2) Pick a representative sequence for each OTU (pick_rep_set.py)
c3) Assign taxonomy to OTU representative sequences (assign_taxonomy.py)
Inferring phylogeny
c4) Align OTU representative sequences (align_seqs.py)
c5) Filter the alignment (filter_alignment.py)
c6) Build a phylogenetic tree (make_phylogeny.py)
Summarize the OTU table
c7) Make the OTU table (make_otu_table.py)
c1) Pick OTUs based on sequence similarity within the reads
The OTU picking step assigns similar sequences to operational taxonomic units (OTUs) by clustering sequences based on a user-defined similarity threshold. Sequences that are similar at or above the threshold level are taken to represent the presence of a taxonomic unit in the sequence collection: e.g., a genus, when the similarity threshold is set at 0.94; or a species, when the similarity threshold is set at 0.97.

De novo OTU picking based on sequence similarity within the reads (pick_otus.py)
-s 0.97: 97% sequence similarity; -z: enable reverse strand matching
pick_otus.py -i split_library_output/seqs.fna -o picked_otus_97_percent_rev/ -s 0.97 -z
The output file: picked_otus_97_percent_rev/seqs_otus.txt
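To make the thresholding idea concrete, here is a toy sketch in Python. It is not how pick_otus.py works internally; the function names `identity` and `greedy_otus`, the position-by-position identity measure, the `denovoN` naming and the example reads are all assumptions made for illustration, and reverse strand matching (-z) is not modelled.

```python
def identity(a, b):
    """Fraction of matching positions; assumes equal-length reads (toy measure)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads, threshold=0.97):
    """Put each read into the first OTU whose seed it matches at or above threshold.

    reads: dict of read id -> sequence. Returns dict of OTU id -> list of read ids.
    """
    seeds = {}  # OTU id -> seed sequence (the read that opened the OTU)
    otus = {}   # OTU id -> member read ids
    for read_id, seq in reads.items():
        for otu_id, seed in seeds.items():
            if identity(seq, seed) >= threshold:
                otus[otu_id].append(read_id)
                break
        else:
            # No existing seed is similar enough, so this read starts a new OTU.
            otu_id = f"denovo{len(seeds)}"
            seeds[otu_id] = seq
            otus[otu_id] = [read_id]
    return otus


# Hypothetical 100-base reads: r2 differs from r1 at 2 positions (98% identity),
# r3 differs from r1 at 10 positions (90% identity).
r1 = "ACGT" * 25
r2 = "NN" + r1[2:]
r3 = "N" * 10 + r1[10:]
reads = {"r1": r1, "r2": r2, "r3": r3}

otus_97 = greedy_otus(reads, threshold=0.97)  # r3 ends up in its own OTU
otus_89 = greedy_otus(reads, threshold=0.89)  # a looser threshold merges all three
```

Lowering the threshold merges more reads into fewer OTUs, which is why the choice of 0.94 versus 0.97 changes the taxonomic level an OTU roughly corresponds to.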
OTU picking strategies in QIIME
De novo OTU picking vs closed-reference OTU picking

De novo OTU picking
You must use de novo OTU picking if: you do not have a reference sequence collection to cluster against, for example because you're working with an infrequently used marker gene.
You cannot use de novo OTU picking if:
1) comparing non-overlapping amplicons, such as the V2 and V4 regions of 16S rRNA.
2) working with very large data sets, such as HiSeq runs. (Technically, you can, but you might wait a month.)
Pros: No reference needed. All reads are clustered.
Cons: Speed. Does not run in parallel.
Closed-reference OTU picking
Reads are clustered against a reference sequence collection, and any reads which do not hit a sequence in the reference sequence collection are excluded from downstream analyses.
You must use closed-reference OTU picking if: comparing non-overlapping amplicons, such as the V2 and V4 regions of the 16S rRNA. Your reference sequences must span both of the regions being sequenced.
You cannot use closed-reference OTU picking if: you do not have a reference sequence collection.
Pros: Speed, useful for extremely large data sets. Better trees and taxonomy: OTUs are defined in your reference, so you may have a tree and a taxonomy that you trust.
Cons: Inability to detect novel diversity with respect to your reference sequence collection.
Closed-reference OTU picking
If the user provides taxonomic assignments for sequences in the reference database, those are assigned to OTUs.
-s, --assign_taxonomy  Assign taxonomy to each sequence
-a, --parallel  Run in parallel where available
pick_closed_reference_otus.py -i seqs.fna -r ref/refseqs.fna -o dir-out/ -t ref/taxa.txt
Open-reference OTU picking
Reads are clustered against a reference sequence collection, and any reads which do not hit the reference sequence collection are subsequently clustered de novo.
You cannot use open-reference OTU picking if: comparing non-overlapping amplicons, or you do not have a reference sequence collection to cluster against.
Pros: All reads are clustered. Speed: faster than de novo OTU picking.
Cons: Speed. Some steps of this workflow still run serially. For data sets with a lot of novel diversity with respect to the reference sequence collection, this can still take days to run.
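The difference between the closed-reference and open-reference strategies can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the helper names, the position-by-position `identity` measure, the `newN` OTU naming and the example sequences are hypothetical and do not reflect how the QIIME scripts are implemented.

```python
def identity(a, b):
    """Fraction of matching positions; assumes equal-length reads (toy measure)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closed_reference_otus(reads, reference, threshold=0.97):
    """Group reads by the first reference sequence they hit; collect reads with no hit."""
    otus, failures = {}, []
    for read_id, seq in reads.items():
        hit = next((ref_id for ref_id, ref_seq in reference.items()
                    if identity(seq, ref_seq) >= threshold), None)
        if hit is None:
            failures.append(read_id)  # closed-reference: dropped from later analyses
        else:
            otus.setdefault(hit, []).append(read_id)
    return otus, failures


def open_reference_otus(reads, reference, threshold=0.97):
    """Closed-reference step first, then cluster the failures de novo."""
    otus, failures = closed_reference_otus(reads, reference, threshold)
    seeds = {}  # new OTU id -> seed sequence
    for read_id in failures:
        seq = reads[read_id]
        for otu_id, seed in seeds.items():
            if identity(seq, seed) >= threshold:
                otus[otu_id].append(read_id)
                break
        else:
            otu_id = f"new{len(seeds)}"
            seeds[otu_id] = seq
            otus[otu_id] = [read_id]
    return otus


# Hypothetical data: reads a and b resemble the reference, c and d do not.
ref_seq = "ACGT" * 25
reference = {"ref1": ref_seq}
novel = "N" * 50 + ref_seq[50:]
reads = {"a": ref_seq, "b": "NN" + ref_seq[2:], "c": novel, "d": novel}

closed_otus, failures = closed_reference_otus(reads, reference)
open_otus = open_reference_otus(reads, reference)
```

In the closed-reference result the novel reads c and d are lost, while the open-reference result keeps them in a new OTU, which mirrors the pros and cons listed above.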
Open-reference OTU picking
The preferred strategy for OTU picking.
-a, --parallel  Run in parallel where available
pick_open_reference_otus.py -i seqs.fna -r refseqs.fna -o dir/
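The "must use" and "cannot use" rules for the three strategies can be condensed into a small decision helper. The sketch below is one possible reading of those rules; the function name `choose_strategy` and its two boolean parameters are hypothetical, and data set size is deliberately left out (de novo picking on very large data sets is possible but slow).

```python
# Script names as they appear in the workshop commands.
SCRIPTS = {
    "de novo": "pick_otus.py",
    "closed-reference": "pick_closed_reference_otus.py",
    "open-reference": "pick_open_reference_otus.py",
}


def choose_strategy(has_reference, non_overlapping_amplicons):
    """Return the OTU picking strategy suggested by the rules above."""
    if non_overlapping_amplicons:
        # Only closed-reference picking can compare non-overlapping amplicons,
        # and it needs a reference spanning both regions.
        if not has_reference:
            raise ValueError("non-overlapping amplicons require a reference collection")
        return "closed-reference"
    if not has_reference:
        # Without a reference, de novo picking is the only option.
        return "de novo"
    # With a reference and overlapping amplicons, open-reference is preferred.
    return "open-reference"


choice = choose_strategy(has_reference=True, non_overlapping_amplicons=False)
script = SCRIPTS[choice]
```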
c2) Pick a representative sequence for each OTU (pick_rep_set.py)
After picking OTUs, you can pick a representative set of sequences. For each OTU, you will end up with one sequence that can be used in subsequent analyses.
pick_rep_set.py -i picked_otus_97_percent_rev/seqs_otus.txt -f split_library_output/seqs.fna -o rep_set.fna
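As an illustration of what picking a representative set means, the Python sketch below takes the first listed read of each OTU as its representative. Both the selection rule and the assumed OTU map layout (one OTU per line: OTU id, then its read ids, tab-separated) are assumptions for this example, as are the function names and the sample data.

```python
def parse_otu_map(lines):
    """Parse lines of 'otu_id<TAB>read_id<TAB>read_id...' into a dict."""
    otu_map = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0]:
            otu_map[fields[0]] = fields[1:]
    return otu_map


def pick_first_representatives(otu_map, seqs):
    """Map each OTU id to the sequence of its first listed read."""
    return {otu_id: seqs[members[0]]
            for otu_id, members in otu_map.items() if members}


# Hypothetical OTU map and reads.
otu_map_lines = ["denovo0\tPC.634_1\tPC.355_7\n", "denovo1\tPC.481_3\n"]
seqs = {"PC.634_1": "ACGTAC", "PC.355_7": "ACGTAA", "PC.481_3": "TTGACA"}

otu_map = parse_otu_map(otu_map_lines)
rep_set = pick_first_representatives(otu_map, seqs)
```

Other selection rules (for example the longest or the most abundant read) would only change the body of `pick_first_representatives`.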
c3) Assign taxonomy to OTU representative sequences (assign_taxonomy.py)

Assignment with the RDP Classifier: the RDP Classifier assigns taxonomies using naive Bayes classification.
-c, --confidence  Minimum confidence to record an assignment
assign_taxonomy.py -i rep_set.fna -m rdp -c 0.80

Assignment with the consensus taxonomy assigner (default, uclust)
Reference and id-to-taxonomy files for 16S rRNA sequences from Greengenes (gg_13_5.fasta & gg_13_5_taxonomy.txt) (http://greengenes.secondgenome.com/downloads/database/13_5)
assign_taxonomy.py -i rep_set.fna -r gg_13_5.fasta -t gg_13_5_taxonomy.txt

The output: sequence id (1st column), taxonomy (2nd column) and quality score (3rd column)
rdp_assigned_taxonomy/rep_set_tax_assignments.txt
denovo367 k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__S24-7; g__; s__ 1.00
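A line of the assignment output can be read back into its three columns with a few lines of Python. The sketch assumes the columns are tab-separated (the taxonomy string itself contains spaces); the function names and the 0.80 cutoff check are illustrative, not part of QIIME.

```python
def parse_assignment(line):
    """Split one assignment line into (sequence id, list of ranks, score)."""
    seq_id, taxonomy, score = line.rstrip("\n").split("\t")[:3]
    ranks = [rank.strip() for rank in taxonomy.split(";")]
    return seq_id, ranks, float(score)


def deepest_named_rank(ranks):
    """Return the last rank that carries a name; 'g__' and 's__' are empty placeholders."""
    named = [rank for rank in ranks if rank.split("__", 1)[-1]]
    return named[-1] if named else None


# The example line from the slide, written with explicit tabs (an assumption).
line = ("denovo367\tk__Bacteria; p__Bacteroidetes; c__Bacteroidia; "
        "o__Bacteroidales; f__S24-7; g__; s__\t1.00")

seq_id, ranks, score = parse_assignment(line)
deepest = deepest_named_rank(ranks)
passes_cutoff = score >= 0.80
```

Here the OTU is resolved to family level only: the genus and species fields are present but unnamed.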
Inferring phylogeny
Alignment of the OTU representative sequences and phylogeny inference are necessary if phylogenetic metrics such as UniFrac will be used in microbiome diversity analyses.
c4) Align OTU representative sequences (align_seqs.py)
PyNAST: the default alignment method, an implementation of the NAST alignment algorithm. NAST aligns each candidate sequence to the best-matching sequence in a pre-aligned database of sequences (the "template" sequence). Candidate sequences are not permitted to introduce new gaps into the template database, so the algorithm introduces local misalignments to preserve the existing template sequence.
align_seqs.py -i rep_set.fna -t /u/local/apps/qiime_data/core_set_aligned.fasta.imputed
The output: pynast_aligned/rep_set_aligned.fasta
Chimera checking sequences with QIIME
Formation of chimeric sequences during PCR: an aborted extension product from an earlier cycle of PCR can function as a primer in a subsequent PCR cycle. If this aborted extension product anneals to and primes DNA synthesis from an improper template, a chimeric molecule is formed.
(Jumpstart Consortium Human Microbiome Project Data Generation Working Group. PLoS ONE (2012))

Applying ChimeraSlayer on hoffman2: /u/local/apps/microbiomeutil/
It would take 15 minutes.
identify_chimeric_seqs.py -m ChimeraSlayer -i pynast_aligned/rep_set_aligned.fasta -a /u/local/apps/microbiomeutil/2010-04-29/RESOURCES/rRNA16S.gold.NAST_ALIGNED.fasta -o chimeric_seqs.txt

Remove the chimeric sequences (1st column) from your alignment using your chimeric sequence list:
filter_fasta.py -f pynast_aligned/rep_set_aligned.fasta -o non_chimeric_rep_set_aligned.fasta -s chimeric_seqs.txt -n
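The removal step amounts to dropping every FASTA record whose id appears in the first column of the chimera list. The Python sketch below illustrates that idea only; it is not the filter_fasta.py implementation, and the function names, the whitespace-separated list layout and the sample records are assumptions.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_listed(fasta_lines, id_list_lines):
    """Keep only records whose id (first word of the header) is not in the list."""
    listed = {line.split()[0] for line in id_list_lines if line.strip()}
    return [(header, seq) for header, seq in read_fasta(fasta_lines)
            if (header.split() or [""])[0] not in listed]


# Hypothetical alignment and chimera list; only the first column of the list is used.
aligned = [">denovo0", "AC-GT", ">denovo1 some description", "A--GT",
           ">denovo2", "ACCGT"]
chimeras = ["denovo1\tparentA\tparentB"]

kept = drop_listed(aligned, chimeras)
```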
c5) Filter the alignment (filter_alignment.py)
This script will remove positions which are gaps in every sequence (not covered by the amplicon). Additionally, it will remove non-conserved positions, which are uninformative for tree building.
filter_alignment.py -i non_chimeric_rep_set_aligned.fasta -o filtered_alignment/
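The first of the two filtering steps, removing columns that are gaps in every sequence, is easy to sketch in Python. The function name `drop_all_gap_columns`, the choice of gap characters and the toy alignment are assumptions for illustration; the removal of non-conserved positions is not covered here.

```python
def drop_all_gap_columns(alignment, gap_chars="-."):
    """Remove alignment columns in which every sequence has a gap character.

    alignment: dict of sequence id -> aligned sequence (all the same length).
    """
    if not alignment:
        return {}
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("aligned sequences must all have the same length")
    # Keep a column as soon as one sequence has a non-gap character in it.
    keep = [i for i in range(length)
            if any(seq[i] not in gap_chars for seq in seqs)]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


# Hypothetical alignment: columns 1 and 5 (0-based) are gaps in both sequences.
alignment = {"denovo0": "A--CG-", "denovo1": "A-TC--"}
filtered = drop_all_gap_columns(alignment)
```

Columns that contain a gap in only some sequences are kept, because they still carry alignment information for the others.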
c6) Build a phylogenetic tree (make_phylogeny.py)
make_phylogeny.py -i filtered_alignment/non_chimeric_rep_set_aligned_pfiltered.fasta -o rep_phylo.tre
Summarize the OTU table
c7) Make the OTU table (make_otu_table.py)
The script tabulates the number of times an OTU is found in each sample, and adds the taxonomic predictions for each OTU.
-e: remove chimeric OTUs
make_otu_table.py -i picked_otus_97_percent_rev/seqs_otus.txt -t rdp_assigned_taxonomy/rep_set_tax_assignments.txt -o otu_table.biom -e chimeric_seqs.txt
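What the tabulation does can be shown with a small Python sketch. It assumes that each read id has the form SampleID_number, so the sample is everything before the last underscore; that convention, the function names, the returned dict layout and the sample data are assumptions for illustration, and the real script writes a BIOM file instead.

```python
from collections import Counter


def sample_of(read_id):
    """Sample id is taken to be everything before the last underscore (assumption)."""
    return read_id.rsplit("_", 1)[0]


def make_otu_table(otu_map, taxonomy=None, exclude=()):
    """Count reads per sample for each OTU, attach taxonomy, skip excluded OTUs."""
    taxonomy = taxonomy or {}
    excluded = set(exclude)
    table = {}
    for otu_id, read_ids in otu_map.items():
        if otu_id in excluded:
            continue  # plays the role of the -e option: listed OTUs are left out
        table[otu_id] = {
            "counts": dict(Counter(sample_of(r) for r in read_ids)),
            "taxonomy": taxonomy.get(otu_id),
        }
    return table


# Hypothetical OTU map, taxonomy and chimera list.
otu_map = {
    "denovo0": ["PC.634_1", "PC.634_2", "PC.355_7"],
    "denovo1": ["PC.481_3"],
    "denovo2": ["PC.355_9"],
}
taxonomy = {"denovo0": "k__Bacteria; p__Bacteroidetes"}
table = make_otu_table(otu_map, taxonomy, exclude=["denovo2"])
```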
Summarize the OTU table
biom summarize-table -i otu_table.biom -o table_summary.txt

Num samples: 9
Total count: 1337
Counts/sample summary:
 Min: 146.0
 Max: 150.0
 Median: 149.000
 Mean: 148.556
 Std. dev.: 1.257
Counts/sample detail:
 PC.481: 146.0
 PC.355: 147.0
 PC.636: 148.0
 ...

Convert table from BIOM to tab-separated text format
biom convert -i otu_table.biom -o otu_table.txt -b
Summarize communities by taxonomic composition
summarize_taxa_through_plots.py -i otu_table.biom -o taxa_summary -m Fasting_Map.txt

#OTU ID PC.636 PC.635 PC.356 PC.481 PC.354 PC.593 PC.355 PC.607 PC.634
k__Bacteria;Other;Other 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00671140939597 0.0
k__Bacteria;p__Actinobacteria;c__Coriobacteriia 0.00675675675676 0.0 0.0 0.00684931506849 0.0 0.0 0.0 0.0134228187919 0.0133333333333
k__Bacteria;p__Bacteroidetes;c__Bacteroidia 0.675675675676 0.530201342282 0.2 0.143835616438 0.0805369 ...

The output gives the relative abundances of taxa (at the different levels) within each sample.
L2 to L6: Phylum, Class, Order, Family, Genus
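The relative abundances in such a table are per-sample fractions: the counts of all OTUs sharing a taxonomy down to the chosen level are summed and divided by the sample total. The Python sketch below shows this for level 3 (Class); the function names and the OTU counts are hypothetical, chosen for a single made-up sample with 148 reads.

```python
from collections import defaultdict


def collapse(taxonomy, level):
    """Keep the first `level` ranks of a semicolon-separated taxonomy string."""
    ranks = [rank.strip() for rank in taxonomy.split(";")]
    return ";".join(ranks[:level])


def summarize_taxa(otu_counts, otu_taxonomy, level):
    """Return taxon -> {sample: fraction of that sample's reads}."""
    taxon_counts = defaultdict(lambda: defaultdict(int))
    sample_totals = defaultdict(int)
    for otu_id, per_sample in otu_counts.items():
        taxon = collapse(otu_taxonomy[otu_id], level)
        for sample, n in per_sample.items():
            taxon_counts[taxon][sample] += n
            sample_totals[sample] += n
    return {taxon: {s: n / sample_totals[s]
                    for s, n in per_sample.items() if sample_totals[s]}
            for taxon, per_sample in taxon_counts.items()}


# Hypothetical counts for one sample "S1" (148 reads in total).
otu_counts = {"denovo0": {"S1": 60}, "denovo1": {"S1": 40},
              "denovo2": {"S1": 1}, "denovo3": {"S1": 47}}
otu_taxonomy = {
    "denovo0": "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__S24-7",
    "denovo1": "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales",
    "denovo2": "k__Bacteria; p__Actinobacteria; c__Coriobacteriia",
    "denovo3": "k__Bacteria; p__Firmicutes; c__Clostridia",
}

class_level = summarize_taxa(otu_counts, otu_taxonomy, level=3)
```

With these counts a single read out of 148 gives a fraction of 1/148, about 0.00676, and the fractions within a sample sum to 1.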
Make a taxonomy heatmap
make_otu_heatmap_html.py -i otu_table.biom -o otu_table_heatmap.pdf
Make an OTU network
make_otu_network.py -m Fasting_Map.txt -i otu_table.biom -o network
A red circle represents a sample and a white square represents an OTU. Each line links a sample to an OTU present in that sample (Cytoscape).