Scalable metabolic reconstruction for metagenomic data and the human microbiome
Sahar Abubucker, Nicola Segata, Johannes Goll, Alyxandria Schubert, Beltran Rodriguez-Mueller, Jeremy Zucker,
the Human Microbiome Project Metabolic Reconstruction team, the Human Microbiome Consortium,
Patrick D. Schloss, Dirk Gevers, Makedonka Mitreva, Curtis Huttenhower
04-14-11
Harvard School of Public Health
Department of Biostatistics
2
What’s metagenomics?
Total collection of microorganisms within a community
Also microbial community or microbiota
Total genomic potential of a microbial community
Total biomolecular repertoire of a microbial community
Study of uncultured microorganisms from the environment, which can include
humans or other living hosts
The Human Microbiome Project for a normal population
300 People/15(18) Body
Sites
Multifaceted analyses
2 clin. centers, 4 seq. centers, data generation, technology development, computational tools, ethics…
Multifaceted data
Human population
Microbial population
Novel organisms
Biotypes
Viruses
Metabolism
>12,000 samples
>50M 16S seqs.
4.6 Tbp unique metagenomic sequence
>1,900 reference genomes
Full clinical metadata
15+ Demonstration Projects for microbial communities in disease
All include additional subjects and technology development
Gastrointestinal: obesity, Crohn’s disease, ulcerative colitis, autoimmunity, cancer, necrotizing enterocolitis
Skin: psoriasis, acne, atopic dermatitis
Urogenital: bacterial vaginosis, STDs, reproductive health
5
What to do with your metagenome?
(x1010)
Diagnostic or prognostic
biomarker for host disease
Public health tool monitoring
population health and interactions
Comprehensive snapshot of
microbial ecology and evolution
Reservoir of gene and protein functional information
Who’s there?
What are they doing?
What do functional genomic data tell us about microbiomes?
What can our microbiomes tell us about us?
6
Metabolic/Functional Reconstruction: The Goal
Gene expression
SNP genotypes
Healthy/IBD
BMI
Diet
Taxon abundances
Enzyme family abundances
Pathway abundances
Batch effects?
Population structure?
Niches & phylogeny
Test for correlates
Multiple hypothesis correction
Feature selection
p >> n
Confounds/stratification/environment
Cross-validate
Biological story?
Independent sample
Intervention/perturbation
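The "Multiple hypothesis correction" step in the workflow above is not specified further on the slide. One common choice when testing many taxa or pathways against a phenotype is the Benjamini-Hochberg false discovery rate procedure. The sketch below assumes that choice; the function name is illustrative and not taken from the HMP tools.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used for the correction; the slide does not name a method.
    """
    m = len(pvalues)
    if m == 0:
        return []
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Usage: one p-value per tested feature (values here are made up).
pvals = [0.01, 0.04, 0.03, 0.5]
for p, q in zip(pvals, benjamini_hochberg(pvals)):
    print(f"p={p:.3f} q={q:.3f}")
```

With p >> n, the number of tested features is large, so the adjusted values are usually what gets thresholded rather than the raw p-values.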
7
HMP: Metabolic reconstruction
WGS reads
Pathways/modules
Genes (KOs)
Pathways (KEGGs)
Functional seq.: KEGG + MetaCyc
CAZy, TCDB, VFDB, MEROPS…
BLAST → Genes
[Equation for gene abundance c(g), using the symbols r, a, p, g, and |g|; exact form not recoverable from the extracted text]
Genes → Pathways: MinPath (Ye 2009)
Smoothing: Witten-Bell
$$c(g) = \begin{cases} \dfrac{T\,N}{(V-T)(N+T)} & \text{if } c(g) = 0 \\ \dfrac{c(g)\,N}{N+T} & \text{otherwise} \end{cases}$$
Gap filling: c(g) = max( c(g), median )
300 subjects
1-3 visits/subject
~6 body sites/visit
10-200M reads/sample
100bp reads
BLAST
Taxonomic limitation: rem. paths in taxa < ave.
Xipe: distinguish zero/low (Rodriguez-Mueller in review)
HUMAnN: HMP Unified Metabolic Analysis Network
http://huttenhower.sph.harvard.edu/humann
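The Witten-Bell smoothing and gap-filling steps named on this slide can be sketched as follows. This is a minimal illustration and not HUMAnN's code. It assumes the standard count form of Witten-Bell, with N the total observed count, T the number of gene families observed at least once, and V the number of gene families considered. It also assumes that the gap-filling median is taken over the genes of a single pathway. Function names and gene identifiers are hypothetical.

```python
from statistics import median
from typing import Dict


def witten_bell_smooth(counts: Dict[str, float]) -> Dict[str, float]:
    """Witten-Bell smoothing of gene-family counts (assumed standard form)."""
    n = sum(counts.values())                      # N: total observed count
    v = len(counts)                               # V: families considered
    t = sum(1 for c in counts.values() if c > 0)  # T: families observed
    if t == 0:
        return dict(counts)  # nothing observed, nothing to redistribute
    seen_scale = n / (n + t)
    # Each unseen family receives an equal share of the reserved mass.
    unseen = t * n / ((v - t) * (n + t)) if v > t else 0.0
    return {g: (c * seen_scale if c > 0 else unseen) for g, c in counts.items()}


def gap_fill(pathway_counts: Dict[str, float]) -> Dict[str, float]:
    """c(g) = max(c(g), median), median over this pathway's genes (assumed)."""
    if not pathway_counts:
        return {}
    med = median(pathway_counts.values())
    return {g: max(c, med) for g, c in pathway_counts.items()}


# Usage with made-up counts for four hypothetical gene families.
smoothed = witten_bell_smooth({"K1": 6, "K2": 3, "K3": 0, "K4": 0})
print(smoothed)
print(gap_fill({"a": 4.0, "b": 0.5, "c": 2.0}))
```

In this form the smoothed counts sum to the original total N, so smoothing shifts mass from observed to unobserved families without changing overall depth.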
8
HMP: Metabolic reconstruction
Pathway coverage Pathway abundance
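The slide distinguishes pathway coverage from pathway abundance without defining either. As a rough illustration only, the sketch below assumes coverage is the fraction of a pathway's genes detected at all, and abundance is the mean abundance over the pathway's genes. These are simplifying assumptions, not HUMAnN's definitions, and the gene names are hypothetical.

```python
from typing import Dict, List


def pathway_coverage(gene_abund: Dict[str, float], pathway_genes: List[str]) -> float:
    """Fraction of the pathway's genes with nonzero abundance (assumed definition)."""
    if not pathway_genes:
        return 0.0
    present = sum(1 for g in pathway_genes if gene_abund.get(g, 0.0) > 0)
    return present / len(pathway_genes)


def pathway_abundance(gene_abund: Dict[str, float], pathway_genes: List[str]) -> float:
    """Mean abundance over the pathway's genes (assumed definition)."""
    if not pathway_genes:
        return 0.0
    total = sum(gene_abund.get(g, 0.0) for g in pathway_genes)
    return total / len(pathway_genes)


# Usage: a hypothetical four-gene pathway, two of whose genes were detected.
abund = {"geneA": 4.0, "geneB": 0.0, "geneC": 2.0}
pathway = ["geneA", "geneB", "geneC", "geneD"]
print(pathway_coverage(abund, pathway), pathway_abundance(abund, pathway))
```

Keeping the two quantities separate matters because a pathway can be fully covered at low abundance, or partially covered by a few highly abundant genes.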
9
HUMAnN: Validating gene and pathway abundances on synthetic data
Validated on individual gene families, module coverage, and abundance
• 4 synthetic communities: low (20 org.) and high (100 org.) complexity; even and lognormal abundances
• Few false negatives: short genes (<100bp), taxonomically rare pathways
• Few false positives: large and multicopy (not many in bacteria)
Individual gene families
ρ=0.86
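The validation above reports a correlation of ρ=0.86 between expected and reconstructed gene family abundances. The slide does not say which correlation coefficient was used. The sketch below assumes a Spearman rank correlation and shows how such a comparison could be computed for any pair of expected and observed abundance vectors. Function names are illustrative.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


# Usage: made-up expected vs. observed abundances for five gene families.
print(spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

A rank-based measure is a reasonable assumption for lognormal communities, where a few very abundant families would otherwise dominate a linear correlation.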
10
Functional modules in 741 HMP samples
[Figure: heatmaps of module coverage and abundance; axes: samples × pathways; sample groups labeled AN, O(BM), PF, O(SP), S, RC, O(TD)]
• Zero microbes (of ~1,000) are core among body sites
• Zero microbes are core among individuals
• 19 (of ~220) pathways are present in every sample
• 53 pathways are present in 90%+ samples
• Only 31 (of 1,110) pathways are present/absent from exactly one body site
• 263 pathways are differentially abundant in exactly one body site
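Counts such as "present in every sample" or "present in 90%+ samples" come down to a prevalence calculation over a feature-by-sample table. The sketch below assumes a feature counts as present when its value exceeds a threshold (zero by default). The table layout, names, and threshold are illustrative and not taken from the HMP analysis.

```python
from typing import Dict, List


def prevalence(values: List[float], threshold: float = 0.0) -> float:
    """Fraction of samples in which the feature exceeds the threshold."""
    if not values:
        return 0.0
    return sum(1 for v in values if v > threshold) / len(values)


def core_features(
    table: Dict[str, List[float]], min_fraction: float = 1.0, threshold: float = 0.0
) -> List[str]:
    """Features (e.g. pathways) present in at least min_fraction of samples."""
    return sorted(
        name
        for name, values in table.items()
        if prevalence(values, threshold) >= min_fraction
    )


# Usage: three hypothetical pathways measured in ten samples.
table = {
    "P1": [1.0] * 10,          # present everywhere
    "P2": [1.0] * 9 + [0.0],   # present in 9 of 10 samples
    "P3": [0.0] * 10,          # never present
}
print(core_features(table))        # strict core: every sample
print(core_features(table, 0.9))   # relaxed core: 90%+ of samples
```

The same function applies to taxa, which is how a comparison like "zero core microbes vs. 19 core pathways" could be made on equal terms.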
11
A portrait of the human microbiome: Who’s there?
With Jacques Izard, Susan Haake, Katherine Lemon
12
A portrait of the human microbiome: What are they doing?
[Figure; axis label: Pathway coverage]
13
HMP: How do microbes vary within each body site across the population?
14
HMP: How do body sites compare between individuals across the population?
15
HMP: Penetrance of species (OTUs) across the population
Data from Pat Schloss
16
HMP: Penetrance of genera (phylotypes) across the population
Data from Pat Schloss
17
HMP: Penetrance of pathways across the population
KEGG Metabolic modules
M00001: Glycolysis (Embden-Meyerhof)
M00002: Glycolysis, core module
M00003: Gluconeogenesis
M00004: Pentose phosphate cycle
M00007: Pentose phosphate (non-oxidative)
M00049: Adenine biosynthesis
M00050: Guanine biosynthesis
M00052: Pyrimidine biosynthesis
M00053: Pyrimidine deoxyribonucleotide biosynthesis
M00120: Coenzyme A biosynthesis
M00126: Tetrahydrofolate biosynthesis
M00157: F-type ATPase, bacteria
M00164: ATP synthase
M00178: Ribosome, bacteria
M00183: RNA polymerase, bacteria
M00260: DNA polymerase III complex, bacteria
M00335: Sec (secretion) system
M00359: Aminoacyl-tRNA biosynthesis
M00360: Aminoacyl-tRNA biosynthesis, prokaryotes
M00362: Nucleotide sugar biosynthesis, prokaryotes
M00006: Pentose phosphate (oxidative)
M00051: Uridine monophosphate biosynthesis
M00125: Riboflavin biosynthesis
M00008: Entner-Doudoroff pathway
M00239: Peptides/nickel transport system
M00018: Threonine biosynthesis
M00168: CAM (Crassulacean acid metabolism), dark
M00167: Reductive pentose phosphate cycle
• Human microbiome functional structure dictated primarily by microbial niche, not host (in health)
• Huge variation among hosts in who’s there; small variation in what they’re doing
• Note: definitely variation in how these functions are implemented
• Does not yet speak in detail to host environment (diet!), genetics, or disease
18
Population summary statistics population biology
Posterior fornix, ref. genomes
[Figure axes: species × individuals]
Lactobacillus iners
Lactobacillus crispatus
Gardnerella vaginalis
Lactobacillus jensenii
Lactobacillus gasseri
Posterior fornix, functional modules
[Figure axes: pathways × individuals]
Basic biology, sugar transport
Essential amino acids
Urea cycle, amines, aromatic AAs
19
LEfSe: Metagenomic class comparison and explanation
LEfSe
http://huttenhower.sph.harvard.edu/lefse
Nicola Segata
LDA + Effect Size
20
LEfSe: Evaluation on synthetic data
Their FP rate
Our FP rate
21
Microbes characteristic of the oral and gut microbiota
Aerobic, microaerobic and anaerobic communities
• High oxygen: skin, nasal
• Mid oxygen: vaginal, oral
• Low oxygen: gut
23
LEfSe: The TRUC murine colitis microbiota
With Wendy Garrett
24
Microbial biomolecular function and biomarkers in the human microbiome: the story so far?
• Who’s there changes
  – What they’re doing doesn’t (as much)
  – How they’re doing it does
• The data so far only scratch the surface
  – Only 1/3 to 2/3 of the reads/sample map to cataloged gene families
  – Only 1/3 to 2/3 of these gene families have cataloged functions
  – Very much in line with MetaHIT study of gut microbiota
  – Job security!
• Looking forward to functional reconstruction…
  – In environmental communities
  – With respect to host environment + genetics
  – With respect to host disease
25
Thanks!
Jacques Izard, Wendy Garrett
Pinaki Sarder, Nicola Segata
Levi Waldron, Larisa Miropolsky
Interested? We’re recruiting graduate students and postdocs!
Human Microbiome Project
HMP Metabolic Reconstruction
George Weinstock, Jennifer Wortman, Owen White, Makedonka Mitreva, Erica Sodergren, Mihai Pop, Vivien Bonazzi, Jane Peterson, Lita Proctor
Sahar Abubucker, Yuzhen Ye
Beltran Rodriguez-Mueller, Jeremy Zucker, Qiandong Zeng
Mathangi Thiagarajan, Brandi Cantarel
Maria Rivera, Barbara Methe
Bill Klimke, Daniel Haft
Ramnik Xavier, Dirk Gevers
Bruce Birren, Mark Daly, Doyle Ward, Eric Alm, Ashlee Earl, Lisa Cosimi
http://huttenhower.sph.harvard.edu
http://huttenhower.sph.harvard.edu/humann
http://huttenhower.sph.harvard.edu/lefse