Metagenomics - Potentials and pitfalls | Mads Albertsen | MEWE 2013 | CENTER FOR MICROBIAL COMMUNITIES


Page 1: [13.07.07] albertsen mewe13 metagenomics

Metagenomics - Potentials and pitfalls

Mads Albertsen

MEWE 2013

CENTER FOR MICROBIAL COMMUNITIES

Page 2: [13.07.07] albertsen mewe13 metagenomics

Agenda

Introduction

Pitfalls

Potentials

Recommendations

CENTER FOR MICROBIAL COMMUNITIES | AALBORG UNIVERSITY

Page 3: [13.07.07] albertsen mewe13 metagenomics

Introduction


Genome = Parts list of a single organism

Page 4: [13.07.07] albertsen mewe13 metagenomics

Introduction


Metagenome = Parts list of the community

Photo: D. Kunkel; color, E. Latypova

Page 5: [13.07.07] albertsen mewe13 metagenomics

Introduction

”...functional analysis of the collective genomes of soil microflora, which we term the metagenome of the soil.”

- J. Handelsman et al., 1998


Page 6: [13.07.07] albertsen mewe13 metagenomics

Introduction

PubMed: metagenom*[Title/Abstract]


Page 7: [13.07.07] albertsen mewe13 metagenomics

Introduction


Sequencing costs

http://www.genome.gov/sequencingcosts/

Page 8: [13.07.07] albertsen mewe13 metagenomics

Introduction


Metagenomics ≠ Amplicon sequencing

Page 9: [13.07.07] albertsen mewe13 metagenomics

Sequencing and assembly

≈3,000,000 bp per genome

≈1000 bp+ contigs

150 bp reads
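A rough illustration of the numbers on this slide: expected coverage is (number of reads × read length) / genome size. The sketch below is not from the talk. The function names, the 30x target and the 0.1% abundance are assumptions chosen for illustration.

```python
import math

def expected_coverage(n_reads: int, read_len: int, genome_size: int) -> float:
    """Average number of times each base of the genome is sequenced."""
    return n_reads * read_len / genome_size

def reads_needed(coverage: float, read_len: int, genome_size: int,
                 rel_abundance: float = 1.0) -> int:
    """Reads required so that a genome making up `rel_abundance` of the
    sequenced DNA reaches the requested average coverage."""
    return math.ceil(coverage * genome_size / (read_len * rel_abundance))

# Slide values: ~3,000,000 bp genome and 150 bp reads.
reads_1x = reads_needed(1, 150, 3_000_000)
# Assumed scenario: 30x coverage of a genome that is 0.1% of the DNA.
reads_rare = reads_needed(30, 150, 3_000_000, rel_abundance=0.001)
print(reads_1x, reads_rare)
```

The second call shows why low-abundance genomes need deep sequencing: the required read count scales with 1 / abundance.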

Page 10: [13.07.07] albertsen mewe13 metagenomics

Assigning information


Contigs

Function

Taxonomy

Databases

Binning

Page 11: [13.07.07] albertsen mewe13 metagenomics

What has metagenomics been used for?

Exploration

Rusch et al., 2007 PLoS Biology

• 6.3 Gbp of sequence (2x human genomes, 2000x bacterial genomes)

• Most sequences were novel compared to the databases

Qin et al., 2010 Nature

• 127 human gut metagenomes

• 600 Gbp of sequence (200x human genomes)

• 3.3 million genes identified

• Minimal gut metagenome defined

Page 12: [13.07.07] albertsen mewe13 metagenomics

What has metagenomics been used for?

Comparative

Dinsdale et al., 2008 Nature

• A characteristic microbial fingerprint for each of the nine different ecosystem types

Specific functions

Hess et al., 2011 Science

• Identified 27,755 putative carbohydrate-active genes from a cow rumen metagenome

• Expressed 90 candidates of which 57% had enzymatic activity against cellulosic substrates

Page 13: [13.07.07] albertsen mewe13 metagenomics

What has metagenomics been used for?

Extracting genomes

Garcia Martin et al., 2006 Nat. Biotechnol.

• Genome extraction from low complexity metagenome

• Candidatus Accumulibacter phosphatis

• The first genome of a polyphosphate accumulating organism (PAO) with a major role in enhanced biological phosphorus removal

Albertsen et al., 2013 Nat. Biotechnol.

• Genome extraction of low abundant species (< 0.1%) from metagenomes

• First complete TM7 genome

• Access to genomes of the ”uncultured majority”

Page 14: [13.07.07] albertsen mewe13 metagenomics

Pitfalls


Page 15: [13.07.07] albertsen mewe13 metagenomics

Metagenomics made easy


Great resources – but use with care

Page 16: [13.07.07] albertsen mewe13 metagenomics

MG-RAST example


Contigs

Page 17: [13.07.07] albertsen mewe13 metagenomics


Dataset overview

Page 18: [13.07.07] albertsen mewe13 metagenomics

Taxonomy and Function overview

Page 19: [13.07.07] albertsen mewe13 metagenomics


Compare with other samples

Samples Functional categories

Page 20: [13.07.07] albertsen mewe13 metagenomics

Pitfalls


You always get billions of data points!

Page 21: [13.07.07] albertsen mewe13 metagenomics


Pitfalls

Is your DNA extraction OK? ... and the samples you want to compare with?

Did you sequence enough?

Did you know the GC bias of your protocol?

Did you normalize for sequencing depth?

Did you use the same sequencing platform?

Assembly = data not quantitative!

Are you comparing assembled data with reads?
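One of the checks above, normalizing for sequencing depth, can be sketched as a simple counts-per-million scaling. This is a minimal illustration, not the procedure used in the talk. The category names and read counts are hypothetical.

```python
def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts so that every sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {name: value * 1_000_000 / total for name, value in counts.items()}

# Hypothetical read counts per functional category in two samples.
# Sample B was sequenced ten times deeper than sample A.
sample_a = {"function_A": 200, "function_B": 800}
sample_b = {"function_A": 2_000, "function_B": 8_000}

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)
print(cpm_a, cpm_b)
```

The raw counts differ tenfold, but after scaling the two samples have the same profile. Comparing unscaled counts would mistake the difference in depth for biology.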

Page 22: [13.07.07] albertsen mewe13 metagenomics


Databases

Contigs

Databases

...you only see what is in the database

Annotated metagenome

Page 23: [13.07.07] albertsen mewe13 metagenomics

What is in the databases?

Finished genomes in IMG vs. Greengenes 16S rRNA database

            Genomes    16S
Phyla            29     90
Class            46    249
Order           100    405
Species        1268  99322*

* 97% clustering

Note: only including 1 strain per species

Page 24: [13.07.07] albertsen mewe13 metagenomics

MG-RAST example

Contigs

650,000 EBPR proteins with taxonomy assigned

How similar are they to the genomes in the database?

Page 25: [13.07.07] albertsen mewe13 metagenomics

Sludge microbes vs. Database genomes

650,000 EBPR proteins

Note: not abundance weighted

Page 26: [13.07.07] albertsen mewe13 metagenomics

Sludge microbes vs. Database genomes

650,000 EBPR proteins

1,260,000 Human gut

Qin et al., 2010 Nature

RAST ID: 4448044.3

Note: not abundance weighted

Page 27: [13.07.07] albertsen mewe13 metagenomics


Sludge microbes vs. Database genomes

The 7 genera with most EBPR proteins assigned

Page 28: [13.07.07] albertsen mewe13 metagenomics

Effect of missing genomes

What is the effect of not having closely related genomes in the database?

1. Remove a genome from the database

2. Search the removed genome against the database
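The two steps above can be approximated on an existing table of search hits: discard every hit to the genome under test, then keep the best remaining hit per protein. The talk describes removing the genome from the database and searching again, so this is a shortcut. The `Hit` record, the genome names used as subjects and all scores are hypothetical.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    query: str           # protein from the genome under test
    subject_genome: str  # database genome the protein matched
    bitscore: float

def best_hits_without(hits: list[Hit], removed_genome: str) -> dict[str, Hit]:
    """Best remaining hit per query once `removed_genome` is ignored."""
    best: dict[str, Hit] = {}
    for hit in hits:
        if hit.subject_genome == removed_genome:
            continue  # step 1: pretend this genome is not in the database
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit  # step 2: keep the best remaining hit
    return best

# Hypothetical hits for two proteins.
hits = [
    Hit("protein_1", "Accumulibacter phosphatis", 900.0),
    Hit("protein_1", "Azoarcus", 610.0),
    Hit("protein_1", "Escherichia", 300.0),
    Hit("protein_2", "Accumulibacter phosphatis", 450.0),
]
result = best_hits_without(hits, "Accumulibacter phosphatis")
print(result)
```

In this toy input `protein_2` ends up with no hit at all, which is the situation an annotation pipeline faces for organisms without close relatives in the database.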


Page 29: [13.07.07] albertsen mewe13 metagenomics

Effect of missing genomes

Accumulibacter phosphatis (4326 proteins), blastp, best hit

Related genomes: Bacteria 1268, Proteobacteria 564, Betaproteobacteria 84, Rhodocyclales 5, Rhodocyclaceae 5

Page 30: [13.07.07] albertsen mewe13 metagenomics

Effect of missing genomes

Accumulibacter phosphatis (4326 proteins), blastp, best hit: Azoarcus

Related genomes: Bacteria 1268, Proteobacteria 564, Betaproteobacteria 84, Rhodocyclales 5, Rhodocyclaceae 5

Page 31: [13.07.07] albertsen mewe13 metagenomics

Effect of missing genomes

Accumulibacter phosphatis (4326 proteins), blastp, MEGAN LCA

Lowest common ancestor (LCA) approach:

Hit 1: Beta-proteobacteria 80% ID

Hit 2: Gamma-proteobacteria 79% ID

Hit 3: Actinobacteria 59% ID

Assigned to Proteobacteria

Related genomes: Bacteria 1268, Proteobacteria 564, Betaproteobacteria 84, Rhodocyclales 5, Rhodocyclaceae 5
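The LCA example on this slide can be reproduced with a small sketch. It assumes that only hits scoring within a fixed percentage of the best hit are kept before taking the lowest common ancestor. MEGAN itself works on bit scores and has further parameters, so the use of % identity as the score and the 10% window are assumptions, not MEGAN's defaults.

```python
def lowest_common_ancestor(lineages: list[list[str]]) -> str:
    """Deepest taxon shared by all lineages (each listed from the root down)."""
    shared = "root"
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared = ranks[0]
    return shared

def lca_assign(hits: list[tuple[list[str], float]], top_percent: float = 10.0) -> str:
    """Assign a protein to the LCA of all hits scoring within
    `top_percent` of the best hit."""
    if not hits:
        return "No hits"
    best = max(score for _, score in hits)
    cutoff = best * (1 - top_percent / 100)
    kept = [lineage for lineage, score in hits if score >= cutoff]
    return lowest_common_ancestor(kept)

# The three hits from the slide, with % identity standing in for the score.
hits = [
    (["Bacteria", "Proteobacteria", "Betaproteobacteria"], 80.0),
    (["Bacteria", "Proteobacteria", "Gammaproteobacteria"], 79.0),
    (["Bacteria", "Actinobacteria"], 59.0),
]
assignment = lca_assign(hits)
print(assignment)
```

With the 10% window the Actinobacteria hit is dropped and the protein lands on Proteobacteria, as on the slide. Widening the window to include all three hits pushes the assignment up to Bacteria.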

Page 32: [13.07.07] albertsen mewe13 metagenomics

Effect of missing genomes

Accumulibacter phosphatis (4326 proteins), blastp, MEGAN LCA

Genus

No hits 261

Bacteria 325

Proteobacteria 860

Betaproteobacteria 853

Rhodocyclaceae 1149

4326 proteins:

• 27% correctly classified on genus level

• 54% not assigned the correct class

• 101 genera identified
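The percentages on this slide can be checked against the counts. The sketch below uses one reading of the numbers that reproduces both figures: the 27% matches the proteins placed within Rhodocyclaceae, and the 54% matches everything not placed in Betaproteobacteria or below. How the original figure groups the ranks cannot be recovered from the text, so the grouping is an assumption.

```python
total = 4326  # Accumulibacter phosphatis proteins searched

# Counts as listed on the slide.
assigned = {
    "No hits": 261,
    "Bacteria": 325,
    "Proteobacteria": 860,
    "Betaproteobacteria": 853,
    "Rhodocyclaceae": 1149,
}

# Assumed grouping: proteins placed within the correct family.
within_family = assigned["Rhodocyclaceae"] / total
# Assumed grouping: proteins placed in the correct class or below it.
correct_class = (assigned["Betaproteobacteria"] + assigned["Rhodocyclaceae"]) / total

print(f"{within_family:.1%} placed within Rhodocyclaceae")
print(f"{1 - correct_class:.1%} not assigned the correct class")
```

The listed counts sum to 3448, so the remaining 878 proteins must have been assigned to taxa not shown in the extracted text.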

Page 33: [13.07.07] albertsen mewe13 metagenomics

Effect of missing genomes

Nitrospira defluvii (4268 proteins), blastp, MEGAN LCA

Related genomes: Bacteria 1268, Nitrospirae 3

Phylum

• 1% correctly classified on phylum level

Page 34: [13.07.07] albertsen mewe13 metagenomics

Effect of missing genomes

What about function?

Nitrospira defluvii, blastp, MEGAN LCA + KEGG

Related genomes: Bacteria 1268, Nitrospirae 3

Page 35: [13.07.07] albertsen mewe13 metagenomics

Effect of missing genomes

Nitrospira defluvii, blastp, MEGAN LCA + KEGG

Related genomes: Bacteria 1268, Nitrospirae 3

Page 36: [13.07.07] albertsen mewe13 metagenomics

Effect of missing genomes

Nitrospira defluvii, blastp, MEGAN LCA + KEGG

Related genomes: Bacteria 1268, Nitrospirae 3

Page 37: [13.07.07] albertsen mewe13 metagenomics


Implication of missing genomes

Function A

Function B

Function C

Function D

Page 38: [13.07.07] albertsen mewe13 metagenomics

Pitfalls


You always get billions of data points!

Page 39: [13.07.07] albertsen mewe13 metagenomics

Potentials


Page 40: [13.07.07] albertsen mewe13 metagenomics

Potentials


1. Hunting novel antibiotic resistance genes

2. Extracting genomes from metagenomes

Page 41: [13.07.07] albertsen mewe13 metagenomics

Hunting novel antibiotic resistance genes

What if you want to find something that is not in the database?

Page 42: [13.07.07] albertsen mewe13 metagenomics

Hunting novel antibiotic resistance genes


Functional metagenomics

M. Sommer, DTU, Denmark (in prep)

Page 43: [13.07.07] albertsen mewe13 metagenomics


Hunting novel antibiotic resistance genes

89 different antibiotic resistance genes

19 novel

M. Sommer, DTU, Denmark (in prep)

Page 44: [13.07.07] albertsen mewe13 metagenomics

Hunting novel antibiotic resistance genes

How abundant are the antibiotic resistance genes in the environment?

Page 45: [13.07.07] albertsen mewe13 metagenomics

Hunting novel antibiotic resistance genes

The number of metagenome reads reflects the abundance of the bacteria.

Bacteria Reads
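If read counts stand in for abundance, the count for a gene is usually scaled by gene length and by sequencing depth before genes or samples are compared. The sketch below uses the common reads-per-kilobase-per-million convention. Whether the work shown in the talk used this exact normalization is not stated, and the example numbers are hypothetical.

```python
def reads_per_kb_per_million(mapped_reads: int, gene_length_bp: int,
                             total_reads: int) -> float:
    """Reads mapped to a gene, scaled by gene length (kb) and by the
    total number of reads in the metagenome (millions)."""
    if gene_length_bp <= 0 or total_reads <= 0:
        raise ValueError("gene length and total reads must be positive")
    return mapped_reads / (gene_length_bp / 1_000) / (total_reads / 1_000_000)

# Hypothetical example: two resistance genes in a 10 million read metagenome.
short_gene = reads_per_kb_per_million(50, 1_000, 10_000_000)
long_gene = reads_per_kb_per_million(100, 2_000, 10_000_000)
print(short_gene, long_gene)
```

The longer gene collects twice as many reads but has the same normalized abundance, which is why raw read counts alone can mislead.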

Page 46: [13.07.07] albertsen mewe13 metagenomics

Hunting novel antibiotic resistance genes


Bacteria Reads

Page 47: [13.07.07] albertsen mewe13 metagenomics


Hunting novel antibiotic resistance genes

Bacteria Reads

Page 48: [13.07.07] albertsen mewe13 metagenomics

Hunting novel antibiotic resistance genes

Figure axes: Metagenomes; Antibiotic genes

89 different antibiotic resistance genes

M. Sommer, DTU, Denmark (in prep)

Page 49: [13.07.07] albertsen mewe13 metagenomics

Extracting genomes


Page 50: [13.07.07] albertsen mewe13 metagenomics

≈3,000,000 bp per genome

≈1000 bp+ contigs

150 bp reads

Why not full genomes?

Extracting genomes

Page 51: [13.07.07] albertsen mewe13 metagenomics

≈3,000,000 bp per genome

≈1000 bp+ contigs

150 bp reads

Why not full genomes?

1. Micro-diversity

2. Separation of genomes (Binning)

Extracting genomes

Page 52: [13.07.07] albertsen mewe13 metagenomics


Not 1 strain

Many closely related strains

AAAAAAAAAAAAAA

AAAAAAAAATAAAA

AAAAAAAAACAAAA

AAAAAAAAA

TAAAA

CAAAA

What you get

AAAAA

Assembly

Extracting genomes
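The effect of micro-diversity on this slide can be mimicked with a toy example: when closely related strains disagree at a position, a consensus can only be built for the stretches on either side of it. Real assemblers work on graphs of overlapping reads, so the column-wise comparison below is a simplification meant only to show why variable positions break contigs.

```python
def conserved_segments(strains: list[str]) -> list[str]:
    """Split aligned strain sequences at every column where they disagree
    and return the stretches that are identical in all strains."""
    if len({len(s) for s in strains}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    segments: list[str] = []
    current: list[str] = []
    for column in zip(*strains):
        if len(set(column)) == 1:
            current.append(column[0])  # all strains agree at this position
        elif current:
            segments.append("".join(current))  # variable position ends the stretch
            current = []
    if current:
        segments.append("".join(current))
    return segments

# The three strains shown on the slide.
strains = ["AAAAAAAAAAAAAA", "AAAAAAAAATAAAA", "AAAAAAAAACAAAA"]
segments = conserved_segments(strains)
print(segments)
```

A single variable position splits the 14 bp region into two pieces. With many such positions along a genome, the result is short contigs rather than a full genome.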

Page 53: [13.07.07] albertsen mewe13 metagenomics


Extracting genomes

Metagenome assembly is not quantitative!

Page 54: [13.07.07] albertsen mewe13 metagenomics

Reduce micro-diversity

Low micro-diversity

High micro-diversity

Short term enrichment

Page 55: [13.07.07] albertsen mewe13 metagenomics

≈3,000,000 bp per genome

≈1000 bp+ contigs

150 bp reads

Why not full genomes?

1. Micro-diversity

2. Separation of genomes (Binning)

Extracting genomes

Page 56: [13.07.07] albertsen mewe13 metagenomics

Binning

Genomic signatures:

- GC / Codon usage

- Tetranucleotide frequency + statistical method

Complex sample

PhD student

”Binning”
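The two genomic signatures named on this slide are simple to compute for a contig. The sketch below is a minimal illustration and not the code used in the talk. Real binning tools typically also merge reverse-complement tetranucleotides and then apply a statistical method to the resulting vectors, which is not shown here.

```python
from collections import Counter
from itertools import product

def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0

def tetranucleotide_frequencies(seq: str) -> dict[str, float]:
    """Relative frequency of each of the 256 possible 4-mers in a sequence."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in kmers)  # 4-mers with ambiguous bases are ignored
    return {k: (counts[k] / total if total else 0.0) for k in kmers}

# Hypothetical contig fragment.
contig = "ATGCGCGCTAGCTAGGCTAACGT"
signature = tetranucleotide_frequencies(contig)
print(gc_content(contig), len(signature))
```

The 256 frequencies form the signature vector per contig. On short contigs the vector is estimated from few 4-mers, so it is noisy.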

Page 57: [13.07.07] albertsen mewe13 metagenomics

Binning

Genomic signatures:

- GC / Codon usage

- Tetranucleotide frequency + statistical method

Complex sample

PhD student

”Binning”

Problems:

- Short pieces of sequence (1-10 kbp)

- Local sequence divergence

Page 58: [13.07.07] albertsen mewe13 metagenomics

Sequence composition-independent binning

Sample 1: Abundance

Sample 2: Abundance

Binning

Page 59: [13.07.07] albertsen mewe13 metagenomics

Sequence composition-independent binning

Sample 1, Sample 2

Figure axes: Abundance Sample 1 vs. Abundance Sample 2

Binning
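The idea behind composition-independent binning is that contigs from the same genome have similar coverage in sample 1 and similar coverage in sample 2. The sketch below groups contigs by their position in the (sample 1, sample 2) coverage plot with a simple greedy rule. The contig names, coverages and the distance threshold are hypothetical, and the greedy grouping is an illustration rather than the procedure from the talk.

```python
import math

def coverage_bins(contigs: dict[str, tuple[float, float]],
                  max_dist: float = 0.2) -> list[list[str]]:
    """Greedy grouping of contigs whose log10 coverages in two samples lie
    within `max_dist` of the first contig placed in a bin.
    Coverages must be greater than zero."""
    bins: list[tuple[tuple[float, float], list[str]]] = []
    for name, (cov1, cov2) in contigs.items():
        point = (math.log10(cov1), math.log10(cov2))
        for seed, members in bins:
            if math.dist(point, seed) <= max_dist:
                members.append(name)
                break
        else:
            bins.append((point, [name]))  # no bin close enough: start a new one
    return [members for _, members in bins]

# Hypothetical contig coverages in (sample 1, sample 2).
contigs = {
    "contig_1": (100.0, 10.0),
    "contig_2": (95.0, 11.0),
    "contig_3": (10.0, 100.0),
    "contig_4": (12.0, 90.0),
    "contig_5": (105.0, 9.5),
}
groups = coverage_bins(contigs)
print(groups)
```

Two genomes whose abundance shifts in opposite directions between the samples separate cleanly, even if their sequence composition is similar.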

Page 60: [13.07.07] albertsen mewe13 metagenomics

1. Reduce micro-diversity

2. Use multiple related samples

Figure axes: Abundance Sample 1 vs. Abundance Sample 2

Binning

Page 61: [13.07.07] albertsen mewe13 metagenomics

1. Reduce micro-diversity

2. Use multiple related samples

Figure axes: Abundance Sample 1 vs. Abundance Sample 2

Binning

Page 62: [13.07.07] albertsen mewe13 metagenomics

Simple reactors

H. Daims & C. Dorninger, DOME, University of Vienna

• Nitrospira enrichment running for years

• 3 dominant species

• No micro-diversity

Page 63: [13.07.07] albertsen mewe13 metagenomics

Short term enrichment

Full-scale EBPR plant

SBR reactor

Days

1. Reduction of (micro)-diversity

Competibacter

Albertsen et al., 2013 Nat. Biotechnol.

Page 64: [13.07.07] albertsen mewe13 metagenomics

Short term enrichment

Full-scale EBPR plant

SBR reactor

2. Two different DNA extraction methods

Albertsen et al., 2013 Nat. Biotechnol.

Page 65: [13.07.07] albertsen mewe13 metagenomics

Colored using a set of 100 phylogenetic marker genes

Albertsen et al., 2013 Nat. Biotechnol.

Page 66: [13.07.07] albertsen mewe13 metagenomics

Colored using a set of 100 phylogenetic marker genes

TM7-1 (1.6%)

TM7-2 (0.7%)

TM7-3 (0.2%)

TM7-4 (0.06%)

Albertsen et al., 2013 Nat. Biotechnol.

Page 67: [13.07.07] albertsen mewe13 metagenomics

Zoom on target

TM7-2 (0.7%)

Colored using a set of 100 phylogenetic marker genes

Albertsen et al., 2013 Nat. Biotechnol.

Page 68: [13.07.07] albertsen mewe13 metagenomics

Zoom on target

PCA on genomic signatures (axes: PC1, PC2; highlighted cluster: TM7-2)

TM7-2 (0.7%)

Colored using a set of 100 phylogenetic marker genes

Albertsen et al., 2013 Nat. Biotechnol.

Page 69: [13.07.07] albertsen mewe13 metagenomics

Colored using a set of 100 phylogenetic marker genes

TM7-1 (1.6%)

Candidate phylum TM7

Saccharibacteria

Candidatus Saccharimonas aalborgensis

Albertsen et al., 2013 Nat. Biotechnol.

Page 70: [13.07.07] albertsen mewe13 metagenomics

Candidatus Competibacter denitrificans (10.6%)

Albertsen et al., 2013 Nat. Biotechnol.

Poster by S. McIlroy

Page 71: [13.07.07] albertsen mewe13 metagenomics

Genome assembly validation

Albertsen et al., 2013 Nat. Biotechnol.

Assembly inspection

Phyla

Genes (HMM model)

Essential single copy genes
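Essential single copy genes give a quick check of an extracted genome: most of them should be present (completeness) and each should occur once (duplicates hint at a mixed bin). The sketch below assumes the genes have already been detected, for example with HMM models as noted on the slide. The marker names and the helper function are hypothetical.

```python
from collections import Counter

def bin_quality(found_genes: list[str], marker_set: set[str]) -> tuple[float, list[str]]:
    """Return (fraction of marker genes present, markers found more than once)."""
    if not marker_set:
        raise ValueError("marker set is empty")
    counts = Counter(g for g in found_genes if g in marker_set)
    completeness = len(counts) / len(marker_set)
    duplicated = sorted(g for g, n in counts.items() if n > 1)
    return completeness, duplicated

# Hypothetical marker set and detections in one genome bin.
markers = {"geneA", "geneB", "geneC", "geneD"}
found = ["geneA", "geneB", "geneB", "geneC"]
completeness, duplicated = bin_quality(found, markers)
print(completeness, duplicated)
```

Here three of four markers are present and one is duplicated, which would prompt a closer look at the bin before it is treated as a single genome.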

Page 72: [13.07.07] albertsen mewe13 metagenomics

Multi-metagenome

Albertsen et al., 2013 Nat. Biotechnol.

http://madsalbertsen.github.io/multi-metagenome/

Short: goo.gl/0ctA3

• Guides

• Workflow scripts

• Example data

• All the code

• Recommendations

Page 73: [13.07.07] albertsen mewe13 metagenomics

Multi-metagenome


Highly complex environments...

...add more samples!

Talk by SM. Karst

Page 74: [13.07.07] albertsen mewe13 metagenomics

Potentials

Metabolites: Metabolomics

Proteins: Metaproteomics

mRNA: Metatranscriptomics

DNA: Metagenomics

Data integration

In situ methods

Community structure

Microbial functions

Extraction

P-Removal:

N-Removal:

-Removal:

Foaming:

Ethanol production:

Microbial needs

Ecology

Page 75: [13.07.07] albertsen mewe13 metagenomics

Recommendations

• Do you really need metagenomics?

• Are the databases useful in your environment?
  • Unless human related they are not...

• Metagenomics is just the parts list
  ... of the DNA that could be extracted
  ... and the functions that could be annotated

• Validation, validation, validation!
  • Bioinformatic
  • In situ

• Genome extraction from simple reactors is possible
  • Enables comprehensive transcriptomics

Page 76: [13.07.07] albertsen mewe13 metagenomics

Metagenomics is pretty...

...but not always informative