22
SHAMAN : SH iny A pplication for M etagenomic AN alysis Stevenn Volant, Amine Ghozlane Hub Bioinformatique et Biostatistique – C3BI, USR 3756 IP CNRS Biomics – CITECH

Metagenomic ANalysis

  • Upload
    others

  • View
    19

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Metagenomic ANalysis

SHAMAN : SHiny Application for Metagenomic ANalysisStevenn Volant, Amine GhozlaneHub Bioinformatique et Biostatistique – C3BI, USR 3756 IP CNRSBiomics – CITECH

Page 2: Metagenomic ANalysis

Image from Alberts Molecular Biology of the Cell 5th

Ribosome

ITS (1) : located between 18S and 5.8S rRNA genes

2 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016

Page 3: Metagenomic ANalysis

Quantitative metagenomics pipeline

Mapping

Merging Clustering500

1000

300

Samples

1 N

OTU 1

OTU 2

OTU 3

16S/18S/ITS

3 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016

Page 4: Metagenomic ANalysis

HUB – 16S/18S/ITS pipeline

A A

A A

A

A

Dereplication

Chimera filteringGold database

Amplicon sequences

Clustering

Taxonomic Annotation

Mapping

OTU

Countmatrix

PhylogenyMafft/Fastree

Vsearch3

1.Genomics, 102(5), 500-506. 2.Bioinformatics Vol.27 no. 21 2011, p2957-2963. 3.https://github.com/torognes/vsearch

Sample

Amplicon

FastQC

FLASH2

Silva SSU 123RDP

Greengenes13,4

Filter Human PhiX reads

Alientrimmer1

26 min

100 samples50 pairs30 min

Findley

UNITE150 min

4 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016

Readprocessing

OTU building

Page 5: Metagenomic ANalysis

Uparse/Vsearch

Integrated in QIIME, MOTHUR and LotuS

Nature methods, 10(10), 996-998.

5 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016

Page 6: Metagenomic ANalysis

Quantitative metagenomics pipeline

Mapping

Merging Clustering500

1000

300

Samples

1 N

OTU 1

OTU 2

OTU 3

16S/18S/ITS

6 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016

« être éclairé » (Toungouse - Sibérie)

SHAMAN

Page 7: Metagenomic ANalysis

SHAMAN : shaman.c3bi.pasteur.fr« There is no disputing the importance of statistical analysis in biological research, but too often it is 

considered only after an experiment is completed, when it may be too late. »

7 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016

Page 8: Metagenomic ANalysis

SHAMAN : shaman.c3bi.pasteur.fr

8 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016

Counts Annotation

Page 9: Metagenomic ANalysis

SHAMAN : shaman.c3bi.pasteur.fr« There is no disputing the importance of statistical analysis in biological research, but too often it is 

considered only after an experiment is completed, when it may be too late. »

9 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016

Page 10: Metagenomic ANalysis

Metagenomic vs RNA-seq

Metagenomic RNA-seq

DistributionOverdispersed counts → Negative

binomialOverdispersed counts → Negative

binomial

Constraints Highly abundant species Highly expressed genes

Goal

Find differentially abundant features (species, familly, …):

OTU distributions and abundances vary between conditions

Find differentially expressed genes: Distributions and expression vary

between conditions

DESeq2 approach is usually used for RNA-seq dataset

Metagenomic data are similar to RNA-seq data

10 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016

Page 11: Metagenomic ANalysis

Data normalization

How ?✗ Fitting the distributions (Total Read Count, UpperQuartile, Median, Full Quantile)✗ Account for the feature length (RPKM)✗ Concept of « effective reads number » (TMM, DESeq2)

Why ? ✗ To correct technical biaises and make samples comparables.

Remarks?✗ Some methods normalize the counts, others the library sizes✗ Some are designed for differential analysis

11 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016

Page 12: Metagenomic ANalysis

DESeq2 normalization (OTU level)

Assumption✗ Most of the OTU have the « same » abundance between 2 conditions

where

Xij : Number of mapped reads of the OTU i in sample jn: Number of samples

Normalization factor :

12 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016

Page 13: Metagenomic ANalysis

Comparison with RPKM (1/3)

● RPKM : Reads Per Kilobase per Million mapped reads

Assumption✗ Counts are proportional to abundance, the length and the sequencing deepth.

● Method

Normalized counts =

per Million Per Kilobase

Length Number of reads of sample j

13 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016

Page 14: Metagenomic ANalysis

Comparison with RPKM (2/3)

Comparison of 7 normalization methods

14 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016

Page 15: Metagenomic ANalysis

Comparison with RPKM (3/3)

Results on real data (7 samples) FDR and Power

15 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016

Dillies M. et al., Bioinformatics 2013

Page 16: Metagenomic ANalysis

To sum up

DESeq2 normalization provides better results

Recommend using DESeq2 to perform analysis of differential abundance

16 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016

Page 17: Metagenomic ANalysis

Statistical model of DESeq2

Generalized Linear Model

Dispersion

Log2 fold change

Advantages✗ Allows complex experimental designs.

17 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016

Size factor

Moyenne

Page 18: Metagenomic ANalysis

Dispersion estimation

Problem✗ Get a good estimate of the dispersion with a small number of samples.

Modelisation of the dispersion :

Function of the mean of normalized count

Local parametric regression

18 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016

Page 19: Metagenomic ANalysis

Contrasts (comparisons)

Aim✗ Testing a specific effect without having to re-fit the model.

Advantages✗ Parameters are estimated with all samples.

Coefficients

Contrast vector

Covariance matrix

19 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016

Page 20: Metagenomic ANalysis

Conclusions

SHAMAN

16s/18s/its analysis

Strong statistical approach

Several visualizations available

Access : http://shaman.c3bi.pasteur.fr

Incoming features

WGS analysis

New visualizations (Taxonomy plot, Krona, continuous data)

Compatibility with FROGS

Page 21: Metagenomic ANalysis

CIB – FROGS 16S/18S – GALAXY Pasteur

Galaxy team : Mathieu Valade, Fabien MareuilEmmanuel Quevillon, Eric Deveaud

Vsearchswarm

21 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016

Page 22: Metagenomic ANalysis

Acknowledgements

CiTech Biomics Pole

Sean KENNEDYBéatrice REGNAULT

Olivier GASCUELMarie-Agnès DILLIESChristophe MALABATHugo VARETNicolas MAILLETPierre LECHATAnna ZHUKOVARachel TORCHET

C3BI Bioinformatics Biostatistics Hub