[13.07.07] albertsen mewe13 metagenomics

Metagenomics: Potentials and pitfalls

Mads Albertsen, MEWE 2013

CENTER FOR MICROBIAL COMMUNITIES

Agenda

Introduction

Pitfalls

Potentials

Recommendations

CENTER FOR MICROBIAL COMMUNITIES | AALBORG UNIVERSITY

Introduction

Genome = Parts list of a single organism

Metagenome = Parts list of the community

Photo: D. Kunkel; color, E. Latypova

”...functional analysis of the collective genomes of soil microflora, which we term the metagenome of the soil.”

- J. Handelsman et al., 1998

PubMed: metagenom*[Title/Abstract]

Sequencing costs

http://www.genome.gov/sequencingcosts/

Metagenomics ≠ Amplicon sequencing

Sequencing and assembly

≈3,000,000 bp per genome

≈1000+ bp contigs

150 bp reads
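
The gap between read length and genome size can be made concrete with a little arithmetic. The sketch below takes the genome and read sizes from the slide; the coverage depths are illustrative assumptions and are not figures from the talk.

```python
import math

def reads_needed(genome_bp: int, read_bp: int, coverage: float) -> int:
    """Number of reads needed to reach a given mean coverage depth."""
    return math.ceil(genome_bp * coverage / read_bp)

GENOME_BP = 3_000_000  # genome size from the slide
READ_BP = 150          # read length from the slide

# Depths chosen only for illustration (assumption, not from the talk).
for depth in (1, 30):
    print(f"{depth}x coverage: {reads_needed(GENOME_BP, READ_BP, depth)} reads")
```

These numbers are for a single genome sequenced in isolation. In a metagenome, an organism making up a small fraction of the community needs proportionally more total reads to reach the same depth.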

Assigning information

Contigs

Function

Taxonomy

Databases

Binning

What has metagenomics been used for?

Exploration

Rusch et al., 2007 PLoS Biology
• 6.3 Gbp of sequence (2 x human genomes, 2000 x bacterial genomes)
• Most sequences were novel compared to the databases

Qin et al., 2010 Nature
• 127 human gut metagenomes
• 600 Gbp of sequence (200 x human genomes)
• 3.3 million genes identified
• Minimal gut metagenome defined

Comparative

Dinsdale et al., 2008 Nature
• A characteristic microbial fingerprint for each of the nine different ecosystem types

Specific functions

Hess et al., 2011 Science
• Identified 27,755 putative carbohydrate-active genes from a cow rumen metagenome
• Expressed 90 candidates, of which 57% had enzymatic activity against cellulosic substrates

Extracting genomes

Garcia Martin et al., 2006 Nat. Biotechnol.
• Genome extraction from a low-complexity metagenome
• Candidatus Accumulibacter phosphatis
• The first genome of a polyphosphate-accumulating organism (PAO) with a major role in enhanced biological phosphorus removal

Albertsen et al., 2013 Nat. Biotechnol.
• Genome extraction of low-abundance species (< 0.1%) from metagenomes
• First complete TM7 genome
• Access to genomes of the ”uncultured majority”

Pitfalls

Metagenomics made easy

Great resources – but use with care

MG-RAST example

Contigs

Dataset overview

Taxonomy and Function overview

Compare with other samples (Samples vs. Functional categories)

You always get billions of data!

Is your DNA extraction OK? ... and the samples you want to compare with?

Did you sequence enough?
Did you know the GC bias of your protocol?
Did you normalize for sequencing depth?
Did you use the same sequencing platform?

Assembly = data not quantitative!
Are you comparing assembled data with reads?
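
As an illustration of the sequencing-depth question, here is a minimal sketch that rescales raw read counts to counts per million so that two samples of different depth can be compared. The function name, the category names and the counts are hypothetical, and scaling to a common total is only one of several possible normalizations.

```python
def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts so that every sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {name: n * 1_000_000 / total for name, n in counts.items()}

# Hypothetical samples: same composition, tenfold difference in depth.
sample_a = {"function_A": 200, "function_B": 800}      # 1,000 reads
sample_b = {"function_A": 2_000, "function_B": 8_000}  # 10,000 reads

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)
print(cpm_a["function_A"], cpm_b["function_A"])
```

Comparing the raw counts would suggest a tenfold difference in function_A; after normalization the two samples are identical.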

Databases

Contigs + Databases = Annotated metagenome

...you only see what is in the database

What is in the databases?

Finished genomes in IMG vs. Greengenes 16S rRNA database

           Genomes      16S
Phyla           29       90
Class           46      249
Order          100      405
Species       1268    99322*

*97% clustering

Note: only including 1 strain per species

MG-RAST example

Contigs

650,000 EBPR proteins with taxonomy assigned

How similar are they to the genomes in the database?

Sludge microbes vs. Database genomes

650,000 EBPR proteins

1,260,000 Human gut (Qin et al., 2010 Nature; RAST ID: 4448044.3)

Note: not abundance weighted

The 7 genera with most EBPR proteins assigned

Effect of missing genomes

What is the effect of not having closely related genomes in the database?

1. Remove a genome from the database

2. Search the removed genome against the database

Best hit

Accumulibacter phosphatis (4326 proteins), blastp against the related genomes

Related genomes in the database:
Bacteria 1268
Proteobacteria 564
Betaproteobacteria 84
Rhodocyclales 5
Rhodocyclaceae 5

Best hit: Azoarcus

MEGAN LCA

Lowest common ancestor (LCA) approach:
Hit 1: Beta-proteobacteria 80% ID
Hit 2: Gamma-proteobacteria 79% ID
Hit 3: Actinobacteria 59% ID

Assigned to Proteobacteria
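
The LCA example on this slide can be sketched in a few lines. This is a simplified illustration and not MEGAN's implementation: MEGAN filters hits on bit score and has further parameters, whereas here percent identity stands in for the score and the 10% window is an assumed value chosen for the example.

```python
def lowest_common_ancestor(lineages):
    """Deepest taxon shared by all lineages (each listed from root to leaf)."""
    shared = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared[-1] if shared else "unassigned"

def lca_assign(hits, top_percent=10.0):
    """hits: list of (score, lineage); keep hits within top_percent of the best."""
    if not hits:
        return "No hits"
    best = max(score for score, _ in hits)
    cutoff = best * (1 - top_percent / 100)  # assumed filter, see lead-in
    kept = [lineage for score, lineage in hits if score >= cutoff]
    return lowest_common_ancestor(kept)

# The three hits from the slide, with percent identity used as the score.
hits = [
    (80, ["Bacteria", "Proteobacteria", "Betaproteobacteria"]),
    (79, ["Bacteria", "Proteobacteria", "Gammaproteobacteria"]),
    (59, ["Bacteria", "Actinobacteria"]),
]
print(lca_assign(hits))                   # Proteobacteria
print(lca_assign(hits, top_percent=100))  # Bacteria
```

With the window applied, the weak Actinobacteria hit is ignored and the protein lands on Proteobacteria, as on the slide. Keeping all three hits pushes the assignment up to Bacteria, which shows how sensitive the result is to the filter.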

MEGAN LCA result for Accumulibacter phosphatis (blastp against the related genomes; figure label: Genus)

No hits 261
Bacteria 325
Proteobacteria 860
Betaproteobacteria 853
Rhodocyclaceae 1149

4326 proteins:
• 27% correctly classified on genus level
• 54% not assigned the correct class
• 101 genera identified

MEGAN LCA: Nitrospira defluvii (blastp against the related genomes; figure label: Phylum)

Related genomes in the database:
Bacteria 1268
Nitrospirae 3

4268 proteins:
• 1% correctly classified on phylum level

What about function?

[Figure: Nitrospira defluvii proteins searched with blastp against the related genomes (Bacteria 1268, Nitrospirae 3) and annotated with MEGAN LCA + KEGG]

Implication of missing genomes

Function A

Function B

Function C

Function D

Potentials

1. Hunting novel antibiotic resistance genes

2. Extracting genomes from metagenomes

Hunting novel antibiotic resistance genes

What if you want to find something that is not in the database?

Functional metagenomics

M. Sommer, DTU, Denmark (in prep)

89 different antibiotic resistance genes

19 novel

How abundant are the antibiotic resistance genes in the environment?

The number of metagenome reads reflects the abundance of the bacteria.
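
One common way to turn read counts into a comparable abundance value is to normalize by gene length and by the total number of reads (reads per kilobase per million, RPKM). The slides do not say which measure was used, so the sketch below is an assumption for illustration, with made-up numbers.

```python
def rpkm(mapped_reads: int, gene_length_bp: int, total_reads: int) -> float:
    """Reads per kilobase of gene per million reads in the metagenome."""
    if gene_length_bp <= 0 or total_reads <= 0:
        raise ValueError("gene length and total reads must be positive")
    return mapped_reads * 1e9 / (gene_length_bp * total_reads)

# Hypothetical example: 50 reads hit a 1,000 bp resistance gene
# in a metagenome of 10 million reads.
print(rpkm(mapped_reads=50, gene_length_bp=1_000, total_reads=10_000_000))
```

Dividing by gene length matters because a long gene collects more reads than a short one at the same abundance; dividing by total reads makes metagenomes of different depth comparable.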

[Figure: Bacteria and the reads sampled from them]

[Figure axes: Metagenomes vs. Antibiotic genes; 89 different antibiotic resistance genes]

M. Sommer, DTU, Denmark (in prep)

Extracting genomes

≈3,000,000 bp per genome

≈1000+ bp contigs

150 bp reads

Why not full genomes?

1. Micro-diversity

2. Separation of genomes (Binning)

Not 1 strain, but many closely related strains:

AAAAAAAAAAAAAA
AAAAAAAAATAAAA
AAAAAAAAACAAAA

What you get after assembly:

AAAAAAAAA
TAAAA
CAAAA
AAAAA
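
The fragmentation shown on this slide can be mimicked with a toy example. This is only an illustration of the effect, not how an assembler works internally (real assemblers operate on graphs built from reads); the three strain sequences are the ones from the slide.

```python
import os

# The three closely related strains from the slide (14 bp each).
strains = [
    "A" * 14,
    "A" * 9 + "T" + "A" * 4,
    "A" * 9 + "C" + "A" * 4,
]

# The sequence is unambiguous only up to the first position that varies.
shared = os.path.commonprefix(strains)

# After that position each strain contributes its own short fragment.
branches = sorted({s[len(shared):] for s in strains})

print(shared)    # AAAAAAAAA
print(branches)  # ['AAAAA', 'CAAAA', 'TAAAA']
```

One single-base difference between strains is enough to split what would be one 14 bp sequence into four shorter pieces.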

Metagenome assembly is not quantitative!

Reduce micro-diversity

Low micro-diversity vs. High micro-diversity

Short term enrichment

Binning

Genomic signatures:
- GC / Codon usage
- Tetranucleotide frequency + statistical method

[Cartoon: Complex sample, PhD student, ”Binning”]

Problems:
- Short pieces of sequence (1-10 kbp)
- Local sequence divergence
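
The genomic signatures listed above are simple to compute. The sketch below calculates GC content and a 256-dimensional tetranucleotide frequency vector for one contig; the function names are hypothetical, and the statistical method that would then group contigs by these vectors is left out. Binning tools often also merge reverse-complement k-mers, which this sketch does not do.

```python
from collections import Counter
from itertools import product

# All 4^4 = 256 tetranucleotides in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0

def tetranucleotide_frequencies(seq: str) -> list[float]:
    """Relative frequency of each tetranucleotide, using overlapping windows."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in KMERS)  # windows containing N are ignored
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]

freqs = tetranucleotide_frequencies("ACGTACGT")
print(gc_content("ACGTACGT"), freqs[KMERS.index("ACGT")])
```

On a 1-10 kbp contig there are only a few thousand windows spread over 256 bins, which is one reason the short pieces listed under "Problems" give noisy signatures.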

Sequence composition-independent binning

[Figure: Abundance in Sample 1 vs. abundance in Sample 2]

Binning

1. Reduce micro-diversity

2. Use multiple related samples

[Figure: Abundance Sample 1 vs. Abundance Sample 2]
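
The idea behind using two related samples can be sketched as follows: contigs from the same genome should have similar abundance within each sample, so they cluster together in the Sample 1 vs. Sample 2 plot and can be picked out as a group. The contig names, coverage values and the rectangular selection window below are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Contig:
    name: str
    cov_sample1: float  # abundance (coverage) in sample 1
    cov_sample2: float  # abundance (coverage) in sample 2

def select_bin(contigs, s1_range, s2_range):
    """Names of contigs whose abundance falls inside the window in both samples."""
    lo1, hi1 = s1_range
    lo2, hi2 = s2_range
    return [
        c.name for c in contigs
        if lo1 <= c.cov_sample1 <= hi1 and lo2 <= c.cov_sample2 <= hi2
    ]

# Hypothetical contigs from two organisms with opposite abundance patterns.
contigs = [
    Contig("c1", 100.0, 10.0),
    Contig("c2", 105.0, 11.0),
    Contig("c3", 8.0, 60.0),
    Contig("c4", 9.0, 55.0),
]

bin_a = select_bin(contigs, s1_range=(80, 120), s2_range=(5, 15))
bin_b = select_bin(contigs, s1_range=(5, 15), s2_range=(40, 80))
print(bin_a, bin_b)
```

In a single sample, two organisms with similar abundance would overlap; the second sample separates them as long as their abundances change differently between samples.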

Simple reactors

H. Daims & C. Dorninger, DOME, University of Vienna

• Nitrospira enrichment running for years

• 3 dominant species

• No micro-diversity

Short term enrichment

Full-scale EBPR plant, SBR reactor

1. Reduction of (micro)-diversity

[Figure labels: Days, Competibacter]

2. Two different DNA extraction methods

Albertsen et al., 2013 Nat. Biotech.

[Figure: colored using a set of 100 phylogenetic marker genes]

TM7-1 (1.6%)
TM7-2 (0.7%)
TM7-3 (0.2%)
TM7-4 (0.06%)

Zoom on target: TM7-2 (0.7%)

[Figure: PCA on genomic signatures (PC1 vs. PC2), TM7-2 highlighted]

TM7-1 (1.6%): Candidate phylum TM7, Saccharibacteria, Candidatus Saccharimonas aalborgensis

Candidatus Competibacter denitrificans (10.6%)

Albertsen et al., 2013 Nat. Biotech.

Poster by S. McIlroy

Genome assembly validation

Essential single copy genes

[Figure axes: Phyla, Genes (HMM model)]

Assembly inspection
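
Essential single copy genes give a quick sanity check on an extracted genome bin: most of them should be present, and each should occur once. A minimal sketch of that check is shown below; the gene names and the shape of the summary are hypothetical, and in practice the gene hits would come from HMM searches against the bin.

```python
from collections import Counter

def single_copy_summary(found_genes, essential_genes):
    """Completeness and duplicated genes for one genome bin."""
    counts = Counter(g for g in found_genes if g in essential_genes)
    return {
        "completeness": len(counts) / len(essential_genes),
        "duplicated": sorted(g for g, n in counts.items() if n > 1),
    }

# Hypothetical marker set and hits for one bin.
essential = {"g1", "g2", "g3", "g4"}
found = ["g1", "g2", "g2", "g3", "other"]

summary = single_copy_summary(found, essential)
print(summary)
```

Missing genes point to an incomplete bin, while genes found more than once suggest that contigs from more than one organism ended up in the same bin.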

Multi-metagenome

http://madsalbertsen.github.io/multi-metagenome/
Short: goo.gl/0ctA3

• Guides
• Workflow scripts
• Example data
• All the code
• Recommendations

Highly complex environments...

...add more samples!

Talk by SM. Karst

Potentials

Extraction:
Metabolites: Metabolomics
Proteins: Metaproteomics
mRNA: Metatranscriptomics
DNA: Metagenomics

Data integration

In situ methods

Community structure, Microbial functions

P-Removal:
N-Removal:
-Removal:
Foaming:
Ethanol production:

Microbial needs, Ecology

Recommendations

• Do you really need metagenomics?

• Are the databases useful in your environment?
   • Unless human related they are not...

• Metagenomics is just the parts list
   ... of the DNA that could be extracted
   ... and the functions that could be annotated

• Validation, validation, validation!
   • Bioinformatic
   • In situ

• Genome extraction from simple reactors is possible
   • Enables comprehensive transcriptomics

Metagenomics is pretty...

...but not always informative
