QIIME Workshop Get started by opening: http://bit.ly/mbe-qiime2012 and read up at: www.qiime.org Greg Caporaso [email protected]

Caporaso sloan qiime_workshop_slides_18_oct2012


Page 1: Caporaso sloan qiime_workshop_slides_18_oct2012

QIIME Workshop

Get started by opening: http://bit.ly/mbe-qiime2012

and read up at: www.qiime.org

Greg Caporaso [email protected]

Page 2: Caporaso sloan qiime_workshop_slides_18_oct2012

• Extract DNA and amplify marker gene with barcoded primers
• Pool amplicons and sequence
• Assign reads to samples
• Assign millions of sequences from thousands of samples to OTUs
• Compute UniFrac distances and compare samples

www.qiime.org

>GCACCTGAGGACAGGCATGAGGAA…
>GCACCTGAGGACAGGGGAGGAGGA…
>TCACATGAACCTAGGCAGGACGAA…
>CTACCGGAGGACAGGCATGAGGAT…
>TCACATGAACCTAGGCAGGAGGAA…
>GCACCTGAGGACACGCAGGACGAC…
>CTACCGGAGGACAGGCAGGAGGAA…
>CTACCGGAGGACACACAGGAGGAA…
>GAACCTTCACATAGGCAGGAGGAT…
>TCACATGAACCTAGGGGCAAGGAA…
>GCACCTGAGGACAGGCAGGAGGAA…
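
Assigning reads to samples can be pictured as a barcode lookup. The following is a minimal sketch, assuming every read starts with a fixed-length sample barcode. The barcodes, sample names, reads, and the `assign_reads_to_samples` helper are all hypothetical. It is not how QIIME implements this step; split_libraries.py does considerably more.

```python
def assign_reads_to_samples(reads, barcode_to_sample, barcode_length):
    """Group reads by the sample whose barcode prefixes them.

    Returns (assigned, unassigned): a dict of sample ID -> list of reads
    with the barcode stripped, and a list of reads with unknown barcodes.
    """
    assigned = {}
    unassigned = []
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        sample_id = barcode_to_sample.get(barcode)
        if sample_id is None:
            unassigned.append(read)
        else:
            assigned.setdefault(sample_id, []).append(insert)
    return assigned, unassigned


# Hypothetical 4-nt barcodes and reads, for illustration only.
barcodes = {"GCAC": "sample.A", "TCAC": "sample.B"}
reads = ["GCACCTGAGG", "TCACATGAAC", "GCACCTGACC", "AAAATTTTGG"]
assigned, unassigned = assign_reads_to_samples(reads, barcodes, 4)
```

An exact-match lookup like this discards any read with a sequencing error in its barcode, which is one reason real pipelines use error-correcting barcodes.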

Page 3: Caporaso sloan qiime_workshop_slides_18_oct2012

>5000 samples in analysis pipeline

• Stream and lake water
• Marine water, sediment and reef
• Soil (forest, farm, peatland, tundra, …)
• Air
• Coalbed
• Arctic ice core
• Insect-associated
• Human-associated (gut, mouth, skin)

http://www.earthmicrobiome.org/

Page 4: Caporaso sloan qiime_workshop_slides_18_oct2012

>5000 samples analyzed to date

Page 5: Caporaso sloan qiime_workshop_slides_18_oct2012

Alpha diversity by environment type

Page 6: Caporaso sloan qiime_workshop_slides_18_oct2012

Where do we look for new diversity?

* As determined by no hit to Greengenes database.

Page 7: Caporaso sloan qiime_workshop_slides_18_oct2012
Page 8: Caporaso sloan qiime_workshop_slides_18_oct2012

http://analytics.google.com

Page 9: Caporaso sloan qiime_workshop_slides_18_oct2012

Running QIIME

Native installation on OS X or Linux (laptops through 16,416-core compute cluster*)

Ubuntu Linux Virtual Box

Amazon Web Services (EC2)

* http://ncar.janus.rc.colorado.edu/

Page 10: Caporaso sloan qiime_workshop_slides_18_oct2012

IPython notebook

Page 11: Caporaso sloan qiime_workshop_slides_18_oct2012

Moving Pictures of the Human Microbiome

• Two subjects sampled daily, one for six months, one for 18 months

• Four body sites: tongue, palm of left hand, palm of right hand, and gut (via fecal swabs).

Page 12: Caporaso sloan qiime_workshop_slides_18_oct2012

Moving Pictures of the Human Microbiome

• Investigate the relative temporal variability of body sites.

• Is there a temporal core microbiome?

• Technical points: do we observe the same conclusions on 454 and Illumina data?

Page 13: Caporaso sloan qiime_workshop_slides_18_oct2012

Moving Pictures of the Human Microbiome: QIIME tutorial

• A small subset of the full data set to facilitate short run time: ~0.1% of the full sequence collection.

• Sequenced across six Illumina GAIIx lanes, with a subset of the samples also sequenced on 454.

• The online tutorial contains details on all of the steps: go back and read that text.

Page 14: Caporaso sloan qiime_workshop_slides_18_oct2012

Key QIIME files

• Mapping file: per sample meta-data, user-defined

• Input sequence file

• OTU table: sample x OTU matrix, central to downstream analyses [now in biom format]

• Parameters file: defines analyses, for use with the ‘workflow’ scripts (optional)

Page 15: Caporaso sloan qiime_workshop_slides_18_oct2012

Mapping file

Page 16: Caporaso sloan qiime_workshop_slides_18_oct2012

Mapping file: always run check_id_map.py

= required field
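
To give a feel for the kind of check involved, here is a minimal sketch. It assumes the QIIME 1 convention that the tab-separated header starts with #SampleID, BarcodeSequence, LinkerPrimerSequence and ends with Description. The `check_mapping_header` helper and the `BodySite` column are hypothetical. It is not a substitute for check_id_map.py, which checks far more than the header.

```python
REQUIRED_LEADING = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence"]
REQUIRED_LAST = "Description"


def check_mapping_header(header_line):
    """Return a list of problems found in a tab-separated mapping header."""
    fields = header_line.rstrip("\n").split("\t")
    problems = []
    for position, expected in enumerate(REQUIRED_LEADING):
        found = fields[position] if position < len(fields) else None
        if found != expected:
            problems.append(
                "column %d should be %s, found %r" % (position + 1, expected, found)
            )
    if fields[-1] != REQUIRED_LAST:
        problems.append("last column should be %s" % REQUIRED_LAST)
    if len(set(fields)) != len(fields):
        problems.append("duplicate column names")
    return problems


# Hypothetical headers: the second lacks the leading '#' and the primer column.
good = "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tBodySite\tDescription"
bad = "SampleID\tBarcodeSequence\tBodySite\tDescription"
good_problems = check_mapping_header(good)
bad_problems = check_mapping_header(bad)
```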

Page 17: Caporaso sloan qiime_workshop_slides_18_oct2012

Sequences file

Page 18: Caporaso sloan qiime_workshop_slides_18_oct2012

>[sampleID_seqID] description

Barcodes have been removed!!

Page 20: Caporaso sloan qiime_workshop_slides_18_oct2012

Sequences file: can be user-provided, or generated by split_libraries.py
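
The `>[sampleID_seqID] description` label is what ties each sequence back to its sample. A minimal sketch of reading it, assuming the sample ID is everything before the last underscore of the first whitespace-delimited token; the helper names and the example records are hypothetical.

```python
def parse_seq_label(header_line):
    """Split '>sampleID_seqID description' into (sample_id, seq_id, description)."""
    label, _, description = header_line.lstrip(">").partition(" ")
    # Split on the last underscore so the trailing sequence ID comes off cleanly.
    sample_id, _, seq_id = label.rpartition("_")
    return sample_id, seq_id, description


def count_seqs_per_sample(fasta_lines):
    """Count sequences per sample using only the FASTA header lines."""
    counts = {}
    for line in fasta_lines:
        if line.startswith(">"):
            sample_id, _, _ = parse_seq_label(line.strip())
            counts[sample_id] = counts.get(sample_id, 0) + 1
    return counts


# Hypothetical records; barcodes are already removed from the sequences.
fasta = [
    ">gut.day1_0 read_a",
    "CTGAGGACAGG",
    ">gut.day1_1 read_b",
    "CTGAGGACACG",
    ">tongue.day1_2 read_c",
    "ATGAACCTAGG",
]
counts = count_seqs_per_sample(fasta)
```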

Page 21: Caporaso sloan qiime_workshop_slides_18_oct2012

OTU table (classic format)

sample x OTU matrix

Page 22: Caporaso sloan qiime_workshop_slides_18_oct2012

OTU identifiers

OTU table (classic format)

sample x OTU matrix

Page 23: Caporaso sloan qiime_workshop_slides_18_oct2012

Sample identifiers

OTU table (classic format)

sample x OTU matrix

Page 24: Caporaso sloan qiime_workshop_slides_18_oct2012

Optional per OTU taxonomic information

OTU table (classic format)

sample x OTU matrix
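
The pieces labelled above (OTU identifiers, sample identifiers, optional taxonomy) can be pulled apart with a few lines of Python. This is a rough sketch, assuming a tab-separated file whose header row starts with `#OTU ID` and whose optional last column is named `Consensus Lineage`, with OTUs as rows and samples as columns. The parser name and the example table are hypothetical.

```python
def parse_classic_otu_table(lines):
    """Parse a tab-separated classic OTU table.

    Returns (sample_ids, counts, taxonomy), where counts maps
    OTU ID -> list of per-sample counts and taxonomy maps OTU ID -> lineage.
    """
    sample_ids, counts, taxonomy = [], {}, {}
    has_taxonomy = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#OTU ID"):
            fields = line.split("\t")[1:]
            has_taxonomy = bool(fields) and fields[-1] == "Consensus Lineage"
            sample_ids = fields[:-1] if has_taxonomy else fields
        elif line and not line.startswith("#"):
            fields = line.split("\t")
            otu_id = fields[0]
            values = fields[1:-1] if has_taxonomy else fields[1:]
            counts[otu_id] = [int(float(v)) for v in values]
            if has_taxonomy:
                taxonomy[otu_id] = fields[-1]
    return sample_ids, counts, taxonomy


# Hypothetical table, for illustration only.
table = [
    "# Hypothetical classic OTU table",
    "#OTU ID\tgut\ttongue\tConsensus Lineage",
    "otu0\t12\t0\tBacteria;Bacteroidetes",
    "otu1\t3\t7\tBacteria;Firmicutes",
]
sample_ids, counts, taxonomy = parse_classic_otu_table(table)
```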

Page 25: Caporaso sloan qiime_workshop_slides_18_oct2012

http://biom-format.org

OTU tables are now in biological observation matrix (.biom) format (QIIME 1.4.0-dev and later)

Google: “biom format”

See convert_biom.py for translating between classic and biom OTU tables

Page 26: Caporaso sloan qiime_workshop_slides_18_oct2012

sample x observation contingency matrix

Observation counts

Page 29: Caporaso sloan qiime_workshop_slides_18_oct2012

sample x observation contingency matrix

• Marker gene (e.g., 16S) surveys
• Comparative genomics
• Metagenomics
• Metatranscriptomics
• Metabolomics
• . . .

Page 30: Caporaso sloan qiime_workshop_slides_18_oct2012

http://www.biom-format.org

The Biological Observation Matrix (BIOM) Format or: How I Learned To Stop Worrying and Love the Ome-ome

JSON-based format for representing arbitrary sample x observation contingency tables with optional metadata

McDonald et al., GigaScience (2012).
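
To make "JSON-based" concrete, here is a hand-written sketch of a tiny sparse table and a helper that expands it to a dense matrix. The field names follow the BIOM 1.0 layout as best understood here and should be checked against biom-format.org. Every value (IDs, counts, date) is made up, and `biom_to_dense` is a hypothetical helper, not part of any BIOM library.

```python
import json

biom_text = json.dumps({
    "id": None,
    "format": "Biological Observation Matrix 1.0.0",
    "format_url": "http://biom-format.org",
    "type": "OTU table",
    "generated_by": "hand-written example",
    "date": "2012-10-18T00:00:00",
    # Rows are observations (here OTUs), with optional metadata.
    "rows": [
        {"id": "otu0", "metadata": {"taxonomy": ["Bacteria", "Bacteroidetes"]}},
        {"id": "otu1", "metadata": None},
    ],
    # Columns are samples, also with optional metadata.
    "columns": [
        {"id": "gut", "metadata": None},
        {"id": "tongue", "metadata": None},
    ],
    "matrix_type": "sparse",
    "matrix_element_type": "int",
    "shape": [2, 2],
    # Sparse entries are [row index, column index, value] triples.
    "data": [[0, 0, 12], [1, 0, 3], [1, 1, 7]],
})


def biom_to_dense(text):
    """Expand a BIOM-style JSON document into a list-of-lists count matrix."""
    table = json.loads(text)
    n_rows, n_cols = table["shape"]
    if table["matrix_type"] == "dense":
        return [list(row) for row in table["data"]]
    dense = [[0] * n_cols for _ in range(n_rows)]
    for row, col, value in table["data"]:
        dense[row][col] = value
    return dense


dense = biom_to_dense(biom_text)
```

Storing only the non-zero entries is what keeps large, mostly empty tables compact.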

Page 31: Caporaso sloan qiime_workshop_slides_18_oct2012

Comparative genomic (B) and metagenome analysis (C) with QIIME

Page 32: Caporaso sloan qiime_workshop_slides_18_oct2012

Working with OTU tables

• single_rarefaction.py: even sampling (very important if you have different numbers of seqs/sample!)

• filter_otus_from_otu_table.py

• filter_samples_from_otu_table.py

• per_library_stats.py
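
Even sampling (rarefaction) means drawing the same number of sequences from every sample. A minimal sketch of the idea, assuming subsampling without replacement and that samples shallower than the chosen depth are dropped (returned as `None` here). The `rarefy` helper and the counts are hypothetical, and this is not the code behind single_rarefaction.py.

```python
import random


def rarefy(counts, depth, seed=None):
    """Subsample one sample's OTU counts to exactly `depth` sequences.

    Sampling is without replacement. Returns None when the sample holds
    fewer than `depth` sequences, so shallow samples are dropped, not padded.
    """
    if sum(counts.values()) < depth:
        return None
    rng = random.Random(seed)
    # Expand counts into one entry per sequence, then draw `depth` of them.
    pool = [otu for otu, n in sorted(counts.items()) for _ in range(n)]
    rarefied = {}
    for otu in rng.sample(pool, depth):
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied


# Hypothetical samples with very different sequencing depths.
deep = {"otu0": 400, "otu1": 90, "otu2": 10}
shallow = {"otu0": 30, "otu1": 5}
deep_even = rarefy(deep, 100, seed=0)
shallow_even = rarefy(shallow, 100, seed=0)
```

Choosing the depth is a trade-off: a higher depth keeps more sequences per sample but drops more samples.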

Page 33: Caporaso sloan qiime_workshop_slides_18_oct2012

OTU picking: terminology

Page 34: Caporaso sloan qiime_workshop_slides_18_oct2012

OTU picking

• De novo
– Reads are clustered based on similarity to one another.

• Reference-based
– Closed reference: any reads which don’t hit a reference sequence are discarded
– Open reference: any reads which don’t hit a reference sequence are clustered de novo
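
The three strategies differ only in what happens to reads that miss the reference, which a toy implementation makes visible. This sketch assumes equal-length reads and uses a naive position-by-position identity in place of a real aligner or clustering tool. All function names, sequences, and the 0.9 threshold are hypothetical, and the greedy clustering is one simple choice, not QIIME's method.

```python
def identity(a, b):
    """Fraction of matching positions; a toy stand-in for real alignment."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def pick_de_novo(reads, threshold):
    """Greedy clustering: each read joins the first centroid it matches."""
    clusters = {}  # centroid sequence -> member reads
    for read in reads:
        for centroid in clusters:
            if identity(read, centroid) >= threshold:
                clusters[centroid].append(read)
                break
        else:
            # No existing centroid was close enough: start a new cluster.
            clusters[read] = [read]
    return clusters


def pick_closed_reference(reads, reference, threshold):
    """Return (clusters keyed by reference ID, reads that hit nothing)."""
    clusters, failures = {}, []
    for read in reads:
        best_id = max(reference, key=lambda ref_id: identity(read, reference[ref_id]))
        if identity(read, reference[best_id]) >= threshold:
            clusters.setdefault(best_id, []).append(read)
        else:
            failures.append(read)
    return clusters, failures


def pick_open_reference(reads, reference, threshold):
    """Closed-reference first, then de novo clustering of the failures."""
    clusters, failures = pick_closed_reference(reads, reference, threshold)
    for n, members in enumerate(pick_de_novo(failures, threshold).values()):
        clusters["denovo%d" % n] = members
    return clusters


# Hypothetical reference and reads, for illustration only.
reference = {"ref1": "AAAAAAAAAA", "ref2": "CCCCCCCCCC"}
reads = ["AAAAAAAAAT", "CCCCCCCCCC", "GGGGGGGGGG", "GGGGGGGGGT"]
closed, discarded = pick_closed_reference(reads, reference, 0.9)
opened = pick_open_reference(reads, reference, 0.9)
```

In the closed-reference loop each read is handled independently of the others, so the work splits across processors; the de novo loop depends on the centroids created by earlier reads, so it does not.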

Page 35: Caporaso sloan qiime_workshop_slides_18_oct2012

De novo OTU picking

• Pros
– All reads are clustered

• Cons
– Not parallelizable
– OTUs may be defined by erroneous reads

Page 36: Caporaso sloan qiime_workshop_slides_18_oct2012

Closed-reference OTU picking

• Pros
– Built-in quality filter
– Easily parallelizable
– OTUs are defined by high-quality, trusted sequences

• Cons
– Reads that don’t hit reference dataset are excluded, so you can never observe new OTUs

Page 37: Caporaso sloan qiime_workshop_slides_18_oct2012

Percentage of reads that do not hit the reference collection, by environment type.

Page 38: Caporaso sloan qiime_workshop_slides_18_oct2012

Open-reference OTU picking

• Pros
– All reads are clustered
– Partially parallelizable

• Cons
– Only partially parallelizable
– Mix of high quality sequences defining OTUs (i.e., the database sequences) and possible low quality sequences defining OTUs (i.e., the sequencing reads)

Page 39: Caporaso sloan qiime_workshop_slides_18_oct2012

Considerations in analysis

Page 40: Caporaso sloan qiime_workshop_slides_18_oct2012

Variation in sampling depth is an important consideration

Human skin, colored by individual, at 500 sequences/sample

Image/analysis credit: Justin Kuczynski

Data reference: Forensic identification using skin bacterial communities. Fierer N, Lauber CL, Zhou N, McDonald D, Costello EK, Knight R. Proc Natl Acad Sci U S A. 2010 Apr 6;107(14):6477-81.

Page 41: Caporaso sloan qiime_workshop_slides_18_oct2012

Variation in sampling depth is an important consideration

Human skin, colored by sampling depth, at either 50 or 500 sequences/sample

Image/analysis credit: Justin Kuczynski

Data reference: Forensic identification using skin bacterial communities. Fierer N, Lauber CL, Zhou N, McDonald D, Costello EK, Knight R. Proc Natl Acad Sci U S A. 2010 Apr 6;107(14):6477-81.

Page 42: Caporaso sloan qiime_workshop_slides_18_oct2012

Variation in sampling depth is an important consideration

Human skin, colored by sampling depth, at either 50 (blue) or 500 (red) sequences/sample

Image/analysis credit: Justin Kuczynski

Data reference: Forensic identification using skin bacterial communities. Fierer N, Lauber CL, Zhou N, McDonald D, Costello EK, Knight R. Proc Natl Acad Sci U S A. 2010 Apr 6;107(14):6477-81.

Page 43: Caporaso sloan qiime_workshop_slides_18_oct2012

How deep is deep enough?

It depends on the question…
– Differences between community types: not many sequences.
– Rare biosphere: more (but be careful about sequencing noise!)

Page 44: Caporaso sloan qiime_workshop_slides_18_oct2012

How deep is deep enough?

Panels: 100 sequences/sample, 10 sequences/sample, 1 sequence/sample

Direct sequencing of the human microbiome readily reveals community differences. J Kuczynski et al. Genome Biology (2011).

Page 45: Caporaso sloan qiime_workshop_slides_18_oct2012

Figure 1

Page 46: Caporaso sloan qiime_workshop_slides_18_oct2012

Can we get accurate taxonomic assignment from short reads?

Page 47: Caporaso sloan qiime_workshop_slides_18_oct2012
Page 48: Caporaso sloan qiime_workshop_slides_18_oct2012
Page 49: Caporaso sloan qiime_workshop_slides_18_oct2012

Extra slides

Page 50: Caporaso sloan qiime_workshop_slides_18_oct2012
Page 51: Caporaso sloan qiime_workshop_slides_18_oct2012
Page 52: Caporaso sloan qiime_workshop_slides_18_oct2012
Page 53: Caporaso sloan qiime_workshop_slides_18_oct2012

This work is licensed under the Creative Commons Attribution 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

Feel free to use or modify these slides, but please credit me by placing the following attribution information where you feel that it makes sense: Greg Caporaso, www.caporaso.us.