Lecture 01: Introduction - Statistical analysis of Microbiome Count Data
Paul J. McMurdie
Second Genome, South San Francisco, CA, USA
Formerly: Statistics Department, Stanford University
Goals for the day
• Unit I - morning - Interactive/Exploratory Analyses
  - Alpha/Beta Diversity
  - Distances
  - Transformations
  - Ordination (Dimensionality Reduction Techniques)
  - Permutational Multivariate ANOVA testing (adonis)
• Lunch
• Unit II - afternoon
  - Multiple Testing, Mixture Models
• We can't cover everything in one day!
  - An Introduction to Statistical Learning: http://www-bcf.usc.edu/~gareth/ISL/
Link to NIOO Statistics-Day Materials
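To make the alpha/beta diversity and distance items in the morning unit concrete, here is a minimal base-R sketch. The count table is made up for illustration, and the helper names `shannon` and `bray_curtis` are hypothetical, not functions from any package used in the course.

```r
# Toy count table (made-up numbers): rows are samples, columns are OTUs
counts <- matrix(c(10, 0, 5, 5,
                    2, 8, 5, 5),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("S1", "S2"), paste0("OTU", 1:4)))

# Transformation: counts to within-sample proportions
props <- counts / rowSums(counts)

# Alpha diversity: Shannon index H = -sum(p * log(p)) over OTUs with p > 0
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}
alpha <- apply(counts, 1, shannon)

# Beta diversity: Bray-Curtis dissimilarity between two count vectors
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)
bc <- bray_curtis(counts["S1", ], counts["S2", ])

print(alpha)
print(bc)
```

In practice the lab exercises would more likely rely on package implementations of these quantities; the hand-written versions only show what is being computed.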
What is a microbiome?
The totality of microbes in a defined environment, especially their genomes and interactions with each other and the surrounding environment.
• A population of a single species/strain is a culture: extremely rare outside of the lab, apart from some infections
• A microbiome is a mixed population of different microbial species (a microbial ecosystem)
Methods for characterizing microbiomes
• Universal Gene census (amplicon)
• Shotgun Metagenome Sequencing
• Transcriptomics (shotgun mRNA)
• Proteomics (protein fragments)
• Metabolomics (excreted chemicals)
[Figure label: Number of Species Counted]
Data Analysis Tools of the Trade: R - RStudio - Rmarkdown
Data analysis should be:
• Modular - Save time, protect against mistakes, share
• Repeatable - Confirmable by others
• Flexible - Analysis not predefined
• Interactive - Answers create new questions
• There is no "write paper" button
Recommendation: R - RStudio - Rmarkdown (and Bioconductor for this crowd)
Lab exercises today will be Rmarkdown based, and I encourage you to save your work this way.
Our Goal with Collaborators: Reproducible analysis workflow with R-markdown
[Diagram: microbiome data -> phyloseq + ggplot2 + etc. -> source.Rmd (markdown (code + console) + figures) -> knitr::knit2html() -> Complete HTML5 -> Better Reproducibility]
Example source.Rmd:

# Main title

This is an [R Markdown](my.link.com) document of my recent analysis.

## Subsection: some code

Here is some import code, etc.

```{r}
library("phyloseq")
library("ggplot2")
physeq = import_biom("datafile.biom")
plot_richness(physeq)
```
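The knitting step in the workflow above can be tried on a throwaway file. This is a minimal sketch, assuming only that the knitr package is installed. It writes a tiny R Markdown source to a temporary file and knits it to plain Markdown with `knitr::knit()`. The file contents and variable names are made up for illustration, and the code-chunk fence is built with `strrep()` only to keep this example self-contained.

```r
library("knitr")

# Build a tiny R Markdown source; `fence` is three backticks
fence <- strrep("`", 3)
rmd_lines <- c(
  "# Main title",
  "",
  "A small analysis.",
  "",
  paste0(fence, "{r}"),
  "x <- c(3, 1, 2)",
  "sum(x)",
  fence
)

# Write the source to a temporary .Rmd file
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_file)

# Knit: run the chunk and weave code plus console output into Markdown
md_file <- knit(rmd_file, output = tempfile(fileext = ".md"), quiet = TRUE)
md_text <- readLines(md_file)
cat(md_text, sep = "\n")
```

The resulting Markdown still needs a conversion step to become the HTML report shown on the slide, for example `knitr::knit2html()` as named there. Many current R Markdown setups use `rmarkdown::render()` for that step instead.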
Microbiome data heterogeneity and processing
phyloseq: http://joey711.github.io/phyloseq/
Paul J. McMurdie
Statistics Department & CEHG, Stanford University
with contributions from Prof Susan Holmes
phyloseq Home Page: http://joey711.github.io/phyloseq/
phyloseq data structure & API
http://joey711.github.io/phyloseq/
[Diagram: component data classes are coerced ("as") or imported ("import"), then combined by the phyloseq constructor into a single Experiment Data object; some components are marked optional]
Component classes:
• OTU Abundance - class otu_table (from a matrix); function otu_table
• Sample Variables - class sample_data (from a data.frame); function sample_data
• Taxonomy Table - class taxonomyTable (from a matrix); function tax_table
• Phylogenetic Tree - class phylo (ape package); read.tree, read.nexus, read_tree; function phy_tree
• Reference Seq. - class XStringSet (Biostrings package): DNAStringSet, RNAStringSet, AAStringSet; function refseq
Experiment Data (class phyloseq) slots: otu_table, sam_data, tax_table, phy_tree, refseq
Accessors: get_taxa, get_samples, get_variable, nsamples, ntaxa, rank_names, sample_names, sample_sums, sample_variables, taxa_names, taxa_sums
Processors: filter_taxa, merge_phyloseq, merge_samples, merge_taxa, prune_samples, prune_taxa, subset_taxa, subset_samples, tip_glom, tax_glom
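As a rough illustration of the data structure above, the sketch below builds a tiny phyloseq object from made-up tables and calls a few of the accessors and processors. The OTU names, sample names, counts and taxonomy are invented for the example, and it assumes the phyloseq package is installed from Bioconductor.

```r
library("phyloseq")

# Made-up OTU counts: 3 OTUs (rows) by 2 samples (columns), filled column-wise
otumat <- matrix(c(10, 0, 3,
                    5, 2, 8),
                 nrow = 3, ncol = 2,
                 dimnames = list(c("OTU1", "OTU2", "OTU3"), c("S1", "S2")))

# Made-up taxonomy: one row per OTU, one column per rank
taxmat <- matrix(c("Bacteroidetes", "Bacteroidetes", "Firmicutes",
                   "Bacteroidia", "Flavobacteria", "Clostridia"),
                 nrow = 3, ncol = 2,
                 dimnames = list(c("OTU1", "OTU2", "OTU3"), c("Phylum", "Class")))

# Made-up sample variables: one row per sample
sampledf <- data.frame(SampleType = c("Feces", "Soil"),
                       row.names = c("S1", "S2"))

# Combine the components with the phyloseq constructor
ps <- phyloseq(otu_table(otumat, taxa_are_rows = TRUE),
               tax_table(taxmat),
               sample_data(sampledf))

# Accessors
ntaxa(ps)
nsamples(ps)
sample_sums(ps)
taxa_sums(ps)
rank_names(ps)

# Processors: subset by taxonomy, or prune by total abundance
bact <- subset_taxa(ps, Phylum == "Bacteroidetes")
abundant <- prune_taxa(taxa_sums(ps) > 5, ps)
```

The tree and reference-sequence components are left out here to keep the example small; the constructor accepts whichever components are available.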
phyloseq work flow
[Diagram: Input -> Import -> phyloseq (raw) -> Preprocessing -> phyloseq (processed) -> Summary / Exploratory Graphics and Inference, Testing]
• Input: OTU cluster output, sample data
• Import: import_biom, import_mothur, import_pyrotagger, import_qiime, import_RDP
• Preprocessing: filter_taxa, filterfun_sample, genefilter_sample, prune_taxa, prune_samples, subset_taxa, subset_samples, transform_sample_counts
• Summary / Exploratory Graphics:
  - distance, ordinate: plot_network, plot_heatmap, plot_ordination
  - Direct Plots: plot_tree, plot_richness, plot_bar
• Inference, Testing: bootstrap, permutation tests, regression, discriminant analysis, multiple testing, gap statistic, clustering, procrustes
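One possible pass through the preprocessing, distance and ordination stages of this work flow is sketched below. It uses the GlobalPatterns example dataset that ships with phyloseq instead of an import step. The cutoff of 200 OTUs and the choice of Bray-Curtis with PCoA are arbitrary choices for illustration, not recommendations from the lecture.

```r
library("phyloseq")
data("GlobalPatterns")

# Preprocessing: drop OTUs never observed, then keep the 200 most abundant
gp <- prune_taxa(taxa_sums(GlobalPatterns) > 0, GlobalPatterns)
top_otus <- names(sort(taxa_sums(gp), decreasing = TRUE))[1:200]
gp_top <- prune_taxa(top_otus, gp)

# Guard: remove any sample left with zero counts before dividing by its sum
gp_top <- prune_samples(sample_sums(gp_top) > 0, gp_top)

# Transformation: counts to within-sample proportions
gp_rel <- transform_sample_counts(gp_top, function(x) x / sum(x))

# Distance between samples, then an ordination based on the same distance
bray <- distance(gp_rel, method = "bray")
ord <- ordinate(gp_rel, method = "PCoA", distance = "bray")
```

The objects `gp_rel`, `bray` and `ord` are the kind of processed inputs that the graphics and testing stages of the diagram would then consume.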
phyloseq graphics
[Figure gallery, one panel per plot function]
• plot_ordination() - NMDS on weighted UniFrac (wUF); axes NMDS1 and NMDS2; points colored by SampleType
• plot_heatmap() - bray-curtis, NMDS; OTU by SampleType, Abundance color scale (1, 100, 10000)
• plot_network() - Enterotype data, bray-curtis, max.dist=0.25; legends SeqTech (Illumina, Pyro454, Sanger) and Enterotype (1, 2, 3)
• plot_tree() - Bacteroidetes-only, merged samples, tip_glom=0.1; tips labeled by genus; legends Abundance, SampleType and Order (Bacteroidales, Flavobacteriales, Sphingobacteriales)
• plot_bar() - Bacteroidetes-only; Abundance by SampleType, filled by Family
• plot_richness() - S.obs, S.chao1, S.ACE; Number of OTUs by Human Associated Samples (FALSE/TRUE), colored by SampleType
SampleType levels: Feces, Freshwater, Freshwater (creek), Mock, Ocean, Sediment (estuary), Skin, Soil, Tongue
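Two of the plot functions above can be approximated on the GlobalPatterns example data as follows. This is a sketch rather than the code behind the slide. The `human` variable, and treating Feces, Mock, Skin and Tongue as human-associated, are assumptions made here to mimic the "Human Associated Samples" axis. The measure names "Observed", "Chao1" and "ACE" are the current phyloseq spellings and are assumed to correspond to S.obs, S.chao1 and S.ACE in the figure.

```r
library("phyloseq")
library("ggplot2")
data("GlobalPatterns")

# Assumed grouping: flag samples taken from human-associated sources
sample_data(GlobalPatterns)$human <-
  get_variable(GlobalPatterns, "SampleType") %in% c("Feces", "Mock", "Skin", "Tongue")

# Richness estimates per sample, grouped by the human flag
p_rich <- plot_richness(GlobalPatterns, x = "human", color = "SampleType",
                        measures = c("Observed", "Chao1", "ACE"))

# Bar plot restricted to Bacteroidetes, stacked by Family
gp_bact <- subset_taxa(GlobalPatterns, Phylum == "Bacteroidetes")
p_bar <- plot_bar(gp_bact, x = "SampleType", fill = "Family")

# Both results are ggplot objects, so they can be printed or extended
print(p_rich)
print(p_bar + ggtitle("Bacteroidetes-only"))
```

Because the return values are ordinary ggplot objects, further layers, themes and facets can be added with the usual ggplot2 syntax.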
phyloseq graphics: plot_ordination()
[Figure: the same ordination drawn with different type arguments; axes CA1 and CA2]
• Samples Only; type="samples" (samples-only) - points colored by SampleType
• Taxa Only; type="taxa" (taxa-only) - one panel per Class (Bacteroidia, Flavobacteria, Sphingobacteria)
• Biplot; type="biplot" - samples and taxa in one panel; legends Class, SampleType and type (samples, taxa)
• Split Plot; type="split" (split) - samples and taxa in separate panels
• type="scree"
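The type options listed above can be exercised with a short sketch on the GlobalPatterns example data. The CA1/CA2 axis labels suggest an unconstrained correspondence analysis, so `method = "CCA"` without a formula is used here as an assumption. The Bacteroidetes subset and the cutoff of 100 OTUs are likewise illustrative choices, not taken from the slide.

```r
library("phyloseq")
data("GlobalPatterns")

# Keep the 100 most abundant Bacteroidetes OTUs to keep the plots readable
gp_bact <- subset_taxa(GlobalPatterns, Phylum == "Bacteroidetes")
top_otus <- names(sort(taxa_sums(gp_bact), decreasing = TRUE))[1:100]
gp_bact <- prune_taxa(top_otus, gp_bact)

# Guard: correspondence analysis needs non-empty samples and taxa
gp_bact <- prune_samples(sample_sums(gp_bact) > 0, gp_bact)
gp_bact <- prune_taxa(taxa_sums(gp_bact) > 0, gp_bact)

# Assumed ordination method: correspondence analysis (no constraints)
ord <- ordinate(gp_bact, method = "CCA")

# One ordination result, five views
p_samples <- plot_ordination(gp_bact, ord, type = "samples", color = "SampleType")
p_taxa    <- plot_ordination(gp_bact, ord, type = "taxa", color = "Class")
p_biplot  <- plot_ordination(gp_bact, ord, type = "biplot",
                             color = "SampleType", shape = "Class")
p_split   <- plot_ordination(gp_bact, ord, type = "split", color = "Class")
p_scree   <- plot_ordination(gp_bact, ord, type = "scree")
```

Computing the ordination once and reusing it keeps the five views consistent with each other, which is the point of separating `ordinate()` from `plot_ordination()`.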
http://joey711.github.io/phyloseq/tutorials-index