Workshop 11: Metagenomics Analysis Shi, Baochen Department of Pharmacology, UCLA

Workshop 11: Metagenomics Analysis · 2017-12-05 · c2) Pick a representative sequence for each OTU (pick_rep_set.py) c3) Assign taxonomy to OTU representative sequences (assign_taxonomy.py)




Flowchart

1. SFF (raw 454 data, optional)
2. fasta/qual files
3. Demultiplexing / quality filtering
4. OTU picking
5. Representative sequences
6. Taxonomic assignments / tree building
7. OTU table and downstream processing

(b) Sequence data preparation
(c) Operational Taxonomic Units (OTU) picking, taxonomic assignment & inferring phylogeny
(d) Microbiome diversity analyses


c) OTU analysis

This workflow consists of the following steps:

OTU picking, taxonomic assignment
c1) Pick OTUs based on sequence similarity within the reads (pick_otus.py)
c2) Pick a representative sequence for each OTU (pick_rep_set.py)
c3) Assign taxonomy to OTU representative sequences (assign_taxonomy.py)

Inferring phylogeny
c4) Align OTU representative sequences (align_seqs.py)
c5) Filter the alignment (filter_alignment.py)
c6) Build a phylogenetic tree (make_phylogeny.py)

Summarize the OTU table
c7) Make the OTU table (make_otu_table.py)


c1) Pick OTUs based on sequence similarity within the reads

The OTU picking step assigns similar sequences to operational taxonomic units (OTUs) by clustering sequences based on a user-defined similarity threshold. Sequences that are similar at or above the threshold level are taken to represent the presence of a taxonomic unit in the sequence collection, e.g., a genus when the similarity threshold is set at 0.94, or a species when the similarity threshold is set at 0.97.

De novo OTU picking based on sequence similarity within the reads (pick_otus.py):
-s 0.97: 97% sequence similarity
-z: enable reverse strand matching

pick_otus.py -i split_library_output/seqs.fna -o picked_otus_97_percent_rev/ -s 0.97 -z

The output file: picked_otus_97_percent_rev/seqs_otus.txt
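To make the role of the similarity threshold concrete, here is a minimal sketch of greedy threshold clustering. It is not the algorithm pick_otus.py runs internally: the `identity` measure (fraction of matching positions in equal-length reads), the function names and the toy reads are all assumptions made for illustration, and reverse strand matching is left out.

```python
def identity(a, b):
    """Toy similarity: fraction of matching positions (equal lengths only)."""
    if len(a) != len(b):
        raise ValueError("this toy measure needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pick_otus_greedy(reads, threshold=0.97):
    """Put each read into the first OTU whose seed it matches at >= threshold.

    reads: dict of read id -> sequence.
    Returns a dict of OTU id -> list of member read ids.
    """
    seeds = {}  # OTU id -> seed sequence (first read that opened the OTU)
    otus = {}   # OTU id -> member read ids
    for read_id, seq in reads.items():
        for otu_id, seed in seeds.items():
            if identity(seq, seed) >= threshold:
                otus[otu_id].append(read_id)
                break
        else:
            # No existing seed is similar enough: this read opens a new OTU.
            otu_id = "denovo%d" % len(seeds)
            seeds[otu_id] = seq
            otus[otu_id] = [read_id]
    return otus


# Hypothetical reads; the second differs from the first at 1 of 20 positions.
reads = {
    "PC.481_1": "ACGTACGTACGTACGTACGT",
    "PC.481_2": "ACGTACGTACGTACGTACGA",
    "PC.355_1": "TTTTACGTACGTACGTAAAA",
}
otus_94 = pick_otus_greedy(reads, threshold=0.94)
otus_97 = pick_otus_greedy(reads, threshold=0.97)
```

With these toy reads the two PC.481 reads share an OTU at 0.94 but are split at 0.97, which mirrors the genus-level versus species-level reading of the two thresholds.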


OTU picking strategies in QIIME

De novo OTU picking vs closed-reference OTU picking


De novo OTU picking

You must use de novo OTU picking if:
You do not have a reference sequence collection to cluster against, for example because you're working with an infrequently used marker gene.

You cannot use de novo OTU picking if:
1) comparing non-overlapping amplicons, such as the V2 and V4 regions of 16S rRNA.
2) working with very large data sets, such as HiSeq runs (technically you can, but you might wait a month).

Pros: No reference needed. All reads are clustered.
Cons: Speed. Does not run in parallel.


Closed-reference OTU picking

Reads are clustered against a reference sequence collection, and any reads which do not hit a sequence in the reference sequence collection are excluded from downstream analyses.

You must use closed-reference OTU picking if:
comparing non-overlapping amplicons, such as the V2 and V4 regions of the 16S rRNA. Your reference sequences must span both of the regions being sequenced.

You cannot use closed-reference OTU picking if:
you do not have a reference sequence collection.

Pros: Speed, useful for extremely large data sets. Better trees and taxonomy: OTUs are defined in your reference, so you may have a tree and a taxonomy that you trust.
Cons: Inability to detect novel diversity with respect to your reference sequence collection.


Closed-reference OTU picking

If the user provides taxonomic assignments for sequences in the reference database, those are assigned to OTUs.
-s, --assign_taxonomy: assign taxonomy to each sequence
-a, --parallel: run in parallel where available

pick_closed_reference_otus.py -i seqs.fna -r ref/refseqs.fna -o dir-out/ -t ref/taxa.txt


Open-reference OTU picking

Reads are clustered against a reference sequence collection, and any reads which do not hit the reference sequence collection are subsequently clustered de novo.

You cannot use open-reference OTU picking if:
comparing non-overlapping amplicons, or you do not have a reference sequence collection to cluster against.

Pros: All reads are clustered. Speed: faster than de novo OTU picking.
Cons: Speed. Some steps of this workflow do still run serially. For data sets with a lot of novel diversity with respect to the reference sequence collection, this can still take days to run.
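The difference between the closed-reference and open-reference strategies can be sketched in a few lines. This is an illustration only, not QIIME's implementation: the `identity` measure, the function name `pick_open_reference`, the `New.N` OTU labels and the toy sequences are assumptions.

```python
def identity(a, b):
    """Toy similarity: fraction of matching positions (equal lengths only)."""
    if len(a) != len(b):
        raise ValueError("this toy measure needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pick_open_reference(reads, reference, threshold=0.97):
    """Cluster reads against a reference, then cluster the failures de novo.

    Returns (reference OTUs, new OTUs). A closed-reference run would keep
    only the first of the two and discard every read that failed to hit.
    """
    ref_otus = {}
    failures = {}
    for read_id, seq in reads.items():
        hit = next(
            (ref_id for ref_id, ref_seq in reference.items()
             if identity(seq, ref_seq) >= threshold),
            None,
        )
        if hit is None:
            failures[read_id] = seq
        else:
            ref_otus.setdefault(hit, []).append(read_id)

    # De novo step on the reads that did not hit the reference.
    new_seeds, new_otus = {}, {}
    for read_id, seq in failures.items():
        for otu_id, seed in new_seeds.items():
            if identity(seq, seed) >= threshold:
                new_otus[otu_id].append(read_id)
                break
        else:
            otu_id = "New.%d" % len(new_seeds)
            new_seeds[otu_id] = seq
            new_otus[otu_id] = [read_id]
    return ref_otus, new_otus


# Hypothetical reference and reads.
reference = {"ref1": "ACGTACGTACGTACGTACGT"}
reads = {
    "PC.481_1": "ACGTACGTACGTACGTACGT",
    "PC.481_2": "GGGGGGGGGGCCCCCCCCCC",
    "PC.355_1": "GGGGGGGGGGCCCCCCCCCC",
}
ref_otus, new_otus = pick_open_reference(reads, reference)
```

Here one read hits the reference, while the two novel reads would be lost under closed-reference picking but end up in a single new OTU under open-reference picking.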


Open-reference OTU picking

Preferred strategy for OTU picking:
-a, --parallel: run in parallel where available

pick_open_reference_otus.py -i seqs.fna -r refseqs.fna -o dir/


c2) Pick a representative sequence for each OTU (pick_rep_set.py)

After picking OTUs, you can pick a representative set of sequences. For each OTU, you will end up with one sequence that can be used in subsequent analyses.

pick_rep_set.py -i picked_otus_97_percent_rev/seqs_otus.txt -f split_library_output/seqs.fna -o rep_set.fna


c3) Assign taxonomy to OTU representative sequences (assign_taxonomy.py)

Assignment with the RDP Classifier:
The RDP Classifier assigns taxonomies using Naive Bayes classification.
-c, --confidence: minimum confidence to record an assignment

assign_taxonomy.py -i rep_set.fna -m rdp -c 0.80

Assignment with the consensus taxonomy assigner (default, uclust):
Reference sequences and id-to-taxonomy map for 16S rRNA sequences from Greengenes (gg_13_5.fasta & gg_13_5_taxonomy.txt) (http://greengenes.secondgenome.com/downloads/database/13_5)

assign_taxonomy.py -i rep_set.fna -r gg_13_5.fasta -t gg_13_5_taxonomy.txt

The output: sequence id (1st column), taxonomy (2nd column) and quality score (3rd column)
rdp_assigned_taxonomy/rep_set_tax_assignments.txt

denovo367 k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__S24-7; g__; s__ 1.00
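A short sketch of how one line of the assignment output might be read downstream. It assumes the three columns are tab-separated and that empty ranks appear as bare prefixes such as `g__`; the helper names are hypothetical, and the 0.80 cutoff simply reuses the `-c` value from the command above.

```python
def parse_assignment(line):
    """Split one assignment line into (OTU id, list of ranks, score)."""
    otu_id, taxonomy, score = line.rstrip("\n").split("\t")
    ranks = [rank.strip() for rank in taxonomy.split(";")]
    return otu_id, ranks, float(score)


def deepest_named_rank(ranks):
    """Return the most specific rank that carries a name, or None."""
    named = [rank for rank in ranks if rank.split("__", 1)[-1]]
    return named[-1] if named else None


# The example line from the slide, with tabs assumed between the columns.
line = ("denovo367\tk__Bacteria; p__Bacteroidetes; c__Bacteroidia; "
        "o__Bacteroidales; f__S24-7; g__; s__\t1.00")
otu_id, ranks, score = parse_assignment(line)
best = deepest_named_rank(ranks)
confident = score >= 0.80
```

For this OTU the genus and species fields are empty, so the most specific usable label is the family, `f__S24-7`.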


Alignment of the OTU representative sequences and phylogeny inference is necessary if phylogenetic metrics such as UniFrac will be used in microbiome diversity analyses.

c4) Align OTU representative sequences (align_seqs.py)
c5) Filter the alignment (filter_alignment.py)
c6) Build a phylogenetic tree (make_phylogeny.py)
c7) Make the OTU table (make_otu_table.py)


c4) Align OTU representative sequences (align_seqs.py)

PyNAST is the default alignment method, an implementation of the NAST alignment algorithm. NAST aligns each candidate sequence to the best-matching sequence in a pre-aligned database of sequences (the "template" sequence). Candidate sequences are not permitted to introduce new gaps into the template database, so the algorithm introduces local mis-alignments to preserve the existing template sequence.

align_seqs.py -i rep_set.fna -t /u/local/apps/qiime_data/core_set_aligned.fasta.imputed

The output: pynast_aligned/rep_set_aligned.fasta


Chimera checking sequences with QIIME

Formation of chimeric sequences during PCR: an aborted extension product from an earlier cycle of PCR can function as a primer in a subsequent PCR cycle. If this aborted extension product anneals to and primes DNA synthesis from an improper template, a chimeric molecule is formed.


Jumpstart Consortium Human Microbiome Project Data Generation Working Group. PLoS ONE (2012)


Applying ChimeraSlayer in hoffman2: /u/local/apps/microbiomeutil/
It would take 15 minutes.

identify_chimeric_seqs.py -m ChimeraSlayer -i pynast_aligned/rep_set_aligned.fasta -a /u/local/apps/microbiomeutil/2010-04-29/RESOURCES/rRNA16S.gold.NAST_ALIGNED.fasta -o chimeric_seqs.txt

Remove chimeric sequences (1st column) from your alignment using your chimeric sequence list:

filter_fasta.py -f pynast_aligned/rep_set_aligned.fasta -o non_chimeric_rep_set_aligned.fasta -s chimeric_seqs.txt -n


c5) Filter the alignment (filter_alignment.py)

This script will remove positions which are gaps in every sequence (not covered by the amplicon). Additionally, this will remove non-conserved positions, which are uninformative for tree building.

filter_alignment.py -i non_chimeric_rep_set_aligned.fasta -o filtered_alignment/
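The first part of that filtering, dropping columns that are gaps in every sequence, can be sketched as below. This covers only the all-gap case and not the removal of non-conserved positions; the function name, the choice of `-` and `.` as gap characters, and the toy alignment are assumptions.

```python
def drop_all_gap_columns(alignment, gap_chars="-."):
    """Remove alignment columns in which every sequence has a gap.

    alignment: dict of sequence id -> aligned sequence (all the same length).
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("aligned sequences must all have the same length")
    # Keep a column if at least one sequence has a non-gap character there.
    keep = [i for i in range(length)
            if any(seq[i] not in gap_chars for seq in seqs)]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


# Hypothetical aligned representative sequences.
alignment = {
    "denovo0": "AC--GT-",
    "denovo1": "A---GTA",
    "denovo2": "AC-.GT-",
}
filtered = drop_all_gap_columns(alignment)
```

Columns 3 and 4 are gaps in all three sequences and are removed; the last column stays because one sequence has a base there.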


c6) Build a phylogenetic tree (make_phylogeny.py)

make_phylogeny.py -i filtered_alignment/non_chimeric_rep_set_aligned_pfiltered.fasta -o rep_phylo.tre


c7) Make the OTU table (make_otu_table.py)

The script tabulates the number of times an OTU is found in each sample, and adds the taxonomic predictions for each OTU.
-e: remove chimeric OTUs

make_otu_table.py -i picked_otus_97_percent_rev/seqs_otus.txt -t rdp_assigned_taxonomy/rep_set_tax_assignments.txt -o otu_table.biom -e chimeric_seqs.txt
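The tabulation step can be sketched as follows. The sketch assumes that each line of the OTU map holds an OTU id followed by tab-separated read ids, and that a read id has the form `SampleID_number`; the function name and the toy map are hypothetical, and no taxonomy is attached.

```python
from collections import Counter


def make_otu_table(otu_map_lines, exclude_otus=()):
    """Count, per OTU, how many reads come from each sample.

    exclude_otus plays the role of the -e list of chimeric OTU ids.
    Returns a dict of OTU id -> Counter of sample id -> read count.
    """
    table = {}
    for line in otu_map_lines:
        fields = line.strip().split("\t")
        if not fields[0]:
            continue  # skip blank lines
        otu_id, read_ids = fields[0], fields[1:]
        if otu_id in exclude_otus:
            continue
        # "PC.481_12" -> sample "PC.481"
        table[otu_id] = Counter(read.rsplit("_", 1)[0] for read in read_ids)
    return table


# Hypothetical OTU map; denovo1 is treated as chimeric and dropped.
otu_map = [
    "denovo0\tPC.481_1\tPC.481_2\tPC.355_7",
    "denovo1\tPC.355_3",
    "denovo2\tPC.636_5\tPC.636_9",
]
table = make_otu_table(otu_map, exclude_otus={"denovo1"})
```

The result is the count matrix in sparse form: `table["denovo0"]["PC.481"]` is 2, and samples with no reads in an OTU count as 0.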


Summarize the OTU table

biom summarize-table -i otu_table.biom -o table_summary.txt

Num samples: 9
Total count: 1337
Counts/sample summary:
 Min: 146.0
 Max: 150.0
 Median: 149.000
 Mean: 148.556
 Std. dev.: 1.257
Counts/sample detail:
 PC.481: 146.0
 PC.355: 147.0
 PC.636: 148.0
 …

Convert table from BIOM to tab-separated text format

biom convert -i otu_table.biom -o otu_table.txt -b


Summarize communities by taxonomic composition

summarize_taxa_through_plots.py -i otu_table.biom -o taxa_summary -m Fasting_Map.txt

The output gives the relative abundances of taxa (at the different levels) within each sample. L2 to L6: Phylum, Class, Order, Family, Genus.

#OTU ID	PC.636	PC.635	PC.356	PC.481	PC.354	PC.593	PC.355	PC.607	PC.634
k__Bacteria;Other;Other	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.00671140939597	0.0
k__Bacteria;p__Actinobacteria;c__Coriobacteriia	0.00675675675676	0.0	0.0	0.00684931506849	0.0	0.0	0.0	0.0134228187919	0.0133333333333
k__Bacteria;p__Bacteroidetes;c__Bacteroidia	0.675675675676	0.530201342282	0.2	0.143835616438	0.0805369
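How such a relative-abundance table arises from OTU counts can be sketched as below: counts are divided by each sample's total and summed over all OTUs that share the same taxonomy down to the chosen level. This is an illustration under assumptions, not the script's own code; the function name, the input dictionaries and the counts are hypothetical.

```python
from collections import defaultdict


def summarize_taxa(otu_counts, otu_taxonomy, level):
    """Relative abundance per taxon and sample at a given depth.

    level 2 keeps kingdom;phylum, level 3 adds class, and so on
    (matching the L2..L6 naming used above).
    """
    samples = sorted({s for counts in otu_counts.values() for s in counts})
    totals = {s: sum(counts.get(s, 0) for counts in otu_counts.values())
              for s in samples}
    summed = defaultdict(lambda: dict.fromkeys(samples, 0.0))
    for otu_id, counts in otu_counts.items():
        ranks = [rank.strip() for rank in otu_taxonomy[otu_id].split(";")]
        taxon = ";".join(ranks[:level])
        for sample, n in counts.items():
            if totals[sample]:  # guard against samples with no reads
                summed[taxon][sample] += n / totals[sample]
    return dict(summed)


# Hypothetical counts and taxonomy strings.
otu_counts = {
    "denovo0": {"PC.636": 3, "PC.635": 1},
    "denovo1": {"PC.636": 1, "PC.635": 1},
    "denovo2": {"PC.635": 2},
}
otu_taxonomy = {
    "denovo0": "k__Bacteria; p__Bacteroidetes; c__Bacteroidia",
    "denovo1": "k__Bacteria; p__Firmicutes; c__Clostridia",
    "denovo2": "k__Bacteria; p__Bacteroidetes; c__Bacteroidia",
}
phylum_level = summarize_taxa(otu_counts, otu_taxonomy, level=2)
```

Within each sample the values of one level sum to 1, which is a quick sanity check when reading a real summary table.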


Make a taxonomy heatmap

make_otu_heatmap_html.py -i otu_table.biom -o otu_table_heatmap.pdf


Make an OTU network

make_otu_network.py -m Fasting_Map.txt -i otu_table.biom -o network

A red circle represents a sample and a white square represents an OTU. The lines represent the OTUs present in a particular sample (Cytoscape).
