The Emerging Global Community of Microbial Metagenomics Researchers
Opening Talk
Metagenomics 2007
Calit2@UCSD
July 11, 2007
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor,
Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
Abstract
Calit2, the J. Craig Venter Institute, and UCSD's SDSC and Scripps Institution of Oceanography are creating a metagenomic Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (CAMERA), funded by the Gordon and Betty Moore Foundation. The CAMERA computational and storage cluster, which contains multiple ocean microbial metagenomic datasets as well as the full genomes of ~166 marine microbes, is in active use. End users can access the metagenomic data either via the web or over novel dedicated 10 Gb/s light paths (termed "lambdas") through the National LambdaRail. The end-user clusters are reconfigured as "OptIPortals," providing the end user with local scalable visualization, computing, and storage. CAMERA currently has over 1000 registered users from over 40 countries, with over a dozen remote OptIPortal sites becoming active. This CAMERA-connected community sets the stage for creating a software system to support a social network of metagenomic researchers: a "MySpace" for scientists. We look forward to gathering ideas from Metagenomics 2007 participants for the functional requirements of such a system.
Calit2 Brings Computer Scientists and Engineers Together with Biomedical Researchers
• Some Areas of Concentration:
– Algorithmic and System Biology
– Bioinformatics
– Metagenomics
– Cancer Genomics
– Human Genomic Variation and Disease
– Proteomics
– Mitochondrial Evolution
– Computational Biology
– Multi-Scale Cellular Imaging
– Information Theory and Biological Systems
– Telemedicine
UC Irvine
Southern California Telemedicine Learning Center (TLC)
National Biomedical Computation Resource, an NIH-supported resource center
PI Larry Smarr
Paul Gilna Ex. Dir.
Announced January 17, 2006
$24.5M Over Seven Years
Philip Papadopoulos, SDSC/Calit2
2pm Friday
CAMERA 1.1 is Up and Running!
CAMERA Combines Genomic and Metagenomic Tools
Can We Create a “My Space” for Science Researchers? Microbial Metagenomics as a Cyber-Community
Over 1000 Registered Users From 45 Countries
USA 583
United Kingdom 46
Canada 35
France 35
Germany 32
70 CAMERA Users Feedback Session
Friday 2pm Paul Gilna
• Calit2 is Prototyping Social Networks for Researchers
• Research Intelligence Project
– ri.calit2.net
• Add in:
– MyProteins
– MyMicrobes
– MyEnvironments
– MyPapers
– MyGenomes
Emerging Capabilities That Tie Together Metagenomics Researchers
• Advanced Computing Techniques
• Broad Coverage of Complete Microbe Genomes
– Moore Foundation
– DOE JGI
• Proteomics of Microbes
• Cellular Network Models
Metagenomic Challenge--Enormous Biodiversity: Very Little of GOS Metagenomic Data Assembles Well
• Use Reference Genomes to Recruit Fragments
– Compared 334 Finished and 250 Draft Microbial Genomes
• Only 5 Microbial Genera Yielded Substantial and Uniform Recruitment
– Prochlorococcus, Synechococcus, Pelagibacter, Shewanella, and Burkholderia
Source: Douglas Rusch, et al. (PLOS Biology March 2007)
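The recruitment idea on this slide can be illustrated with a minimal sketch. It assumes alignment hits have already been computed by some external tool; the hit tuples, the 80% identity cutoff, the genome lengths, and the four-bin coverage measure are all hypothetical choices made for illustration, not the procedure of Rusch et al.

```python
from collections import defaultdict

# Hypothetical alignment hits:
# (read_id, reference_genus, percent_identity, start_on_reference)
HITS = [
    ("r1", "Prochlorococcus", 96.0, 1_000),
    ("r2", "Prochlorococcus", 91.5, 600_000),
    ("r3", "Prochlorococcus", 88.0, 905_000),
    ("r4", "Prochlorococcus", 62.0, 1_200_000),  # below cutoff, not recruited
    ("r5", "Shewanella", 97.0, 2_000),
    ("r5", "Prochlorococcus", 81.0, 50_000),     # weaker second hit for r5
    ("r6", "Shewanella", 95.0, 2_500),
]

# Assumed reference lengths in base pairs (illustrative values only).
GENOME_LENGTHS = {"Prochlorococcus": 1_700_000, "Shewanella": 5_000_000}


def recruit(hits, min_identity=80.0):
    """Assign each read to its best hit at or above min_identity."""
    best = {}
    for read_id, genus, identity, start in hits:
        if identity < min_identity:
            continue
        if read_id not in best or identity > best[read_id][1]:
            best[read_id] = (genus, identity, start)
    recruited = defaultdict(list)
    for genus, _identity, start in best.values():
        recruited[genus].append(start)
    return dict(recruited)


def covered_fraction(starts, genome_length, n_bins=4):
    """Fraction of equal-width genome bins that received at least one read."""
    bins = {min(start * n_bins // genome_length, n_bins - 1) for start in starts}
    return len(bins) / n_bins


recruited = recruit(HITS)
for genus, starts in sorted(recruited.items()):
    frac = covered_fraction(starts, GENOME_LENGTHS[genus])
    print(f"{genus}: {len(starts)} reads recruited, {frac:.2f} of bins covered")
```

In this toy data, a genus whose reads spread across most bins would count as more uniformly recruited than one whose reads pile up in a single bin; real analyses work with far more reads and finer coverage measures.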
Use of Self Organizing Maps to Identify Species
Massive Computation on the Japanese Earth Simulator
Human
Fugu
Arabidopsis
Rice
C. elegans
Drosophila
www.es.jamstec.go.jp/publication/journal/jes_vol.6/pdf/JES6_22-Abe.pdf
T. Abe, H. Sugawara, S. Kanaya, T. Ikemura, Journal of the Earth Simulator, Volume 6, October 2006, 17–23
SOM Created from an Unsupervised Neural Network Algorithm to Analyze Tetranucleotide Frequencies in a Wide Range of Genomes
10kb Moving Window
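As a rough illustration of the input to such a map, the sketch below computes tetranucleotide frequency vectors over fixed-size windows and finds the closest node in a given codebook. The function names, the non-overlapping step, and the toy sequence are assumptions made here; the slide gives only the window size, and this is not the Earth Simulator implementation.

```python
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq):
    """Return a 256-element list of relative 4-mer frequencies for seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Only unambiguous 4-mers are counted; those containing N etc. are ignored.
    total = sum(counts[t] for t in TETRAMERS)
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[t] / total for t in TETRAMERS]


def window_vectors(seq, window=10_000, step=10_000):
    """Yield (start, frequency_vector) for each full window along seq.

    step equal to window (no overlap) is an assumption, not from the slide.
    """
    for start in range(0, len(seq) - window + 1, step):
        yield start, tetranucleotide_frequencies(seq[start:start + window])


def best_matching_unit(vector, codebook):
    """Index of the SOM node whose weights are closest in squared Euclidean distance."""
    def dist(node):
        return sum((a - b) ** 2 for a, b in zip(vector, node))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


toy = "ACGT" * 5_000  # 20 kb toy sequence, not real genomic data
vectors = list(window_vectors(toy))
```

Training a self-organizing map would repeatedly pull the best matching node, and its neighbors on the map grid, toward each input vector; only the lookup step is shown here.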
Using SOM, Sargasso Sea Metagenomic Data Yields 92 Microbial Genera!
Eukaryotes
Prokaryotes
Viruses
Mitochondria
Chloroplasts
Input Genomes:
1500 Microbes, 40 Eukaryotes, 1065 Viruses, 642 Mitochondria, 42 Chloroplasts
5kb Window
T. Abe, H. Sugawara, S. Kanaya, T. Ikemura, Journal of the Earth Simulator, Volume 6, October 2006, 17–23
Moore Microbial Genome Sequencing Project
Selected Microbes Throughout the World’s Oceans
www.moore.org/microgenome/worldmap.asp
Microbes Nominated by Leading Ocean Microbial Biologists
Moore Foundation Funded the Venter Institute to Provide the Full Genome Sequence of 155 Marine Microbes
Phylogenetic Trees Created by Uli Stingl, Oregon State
Blue Means Contains One of the Moore 155 Genomes
www.moore.org/microgenome/trees.aspx
Moore 155 Marine Microbial Genomes Gives Broad Coverage of Microbial “Tree of Life”
www.moore.org/microgenome/alpha-proteobacteria.aspx
Phylogenetic Trees Created by Uli Stingl, Oregon State
Joint Genome Institute is a Leading Microbial Genomic Source
2005
2006
termite hindgut (CalTech); planktonic archaea (MIT); EBPR sludge (UW/UQ); groundwater (ORNL)
AMD; Alaskan soil (UW); Gutless worm (MPI); TA-degrading bioreactor (NUS)
Antarctic bacterioplankton (DRI); hypersaline mats (UCol); Korarchaeota enrichment; Farm soil (Diversa)
2007: 8 new metagenomic projects
JGI Metagenomics Projects (42 Projects)
Source: Eddie Rubin, DOE JGI
[Figure: diagram of bacterial phyla. Labels: Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]
Key Problem with Analysis of Microbial Metagenomic Data
At Least 40 Phyla of Bacteria, But Only a Few are Well Sampled
Source: Eddie Rubin, DOE JGI
[Figure: diagram of bacterial phyla, repeated from the preceding slide]
Well sampled phyla
No cultured taxa
DOE Genomic Encyclopedia of Bacteria and Archaea (GEBA) / Bergey Solution: Deep Sampling Across Phyla
Source: Eddie Rubin, DOE JGI
GEBA / Bergey Pilot Project at JGI
• Goal
– To Finish ~100 Bacterial and Archaeal Genomes
– Selected Based on: Phylogeny, Availability of Phenotype Information, Community Interest
• Approach
– Select 200 Organisms
– Order DNA from Culture Collections (DSMZ and ATCC)
– Sequence 100 for which DNA QC is Received
• Project Lead (Jonathan Eisen JGI/UC Davis)
– Project Management (David Bruce JGI/LANL)
– Methods for Sequencing in Changing Technology Landscape (Paul Richardson JGI)
– Linking to educational project (Cheryl Kerfeld JGI)
Input / Interactions with: Community Advisory Group, ASM, Academy of Microbiology, Etc…
Source: Eddie Rubin, DOE JGI
• How many folds?
• How many sequences adopt the same fold?
• How does function vary as sequences diverge within a family?
• Are there still Kingdom-specific families?
• Can we determine function from structure?
• How diverse are metabolic pathways and networks?
Converting Genome Sequences to Protein Fold Space
JCSG: 2hxv
5-amino-6-(5-phosphoribosylamino)uracil reductase
Building Genome-Scale Models of Living Organisms
• E. coli
– Has 4300 Genes
– Model Has 2000!
[Figure: integrated cell model diagram. Labels: Regulatory Actions; Input Signals; Monomers & Energy; Proteins; Genomics; Transcriptomics; Proteomics; Metabolomics; Environment; Interactomics; Transcription & Translation; Metabolism; Regulation]
[Figure: E. coli central metabolic network map, covering glycolysis, the pentose phosphate pathway, the TCA cycle, and transport reactions, with metabolite abbreviations (for example G6P, PEP, PYR, AcCoA) and reaction/gene names (for example pgi, pfkA, gltA, mdh). Map Legend: extracellular metabolite; intracellular metabolite; growth/biomass precursors; reaction/gene name]
[Figure: transcription and translation reaction schematic. Recoverable labels: G1 + RNAP; G1*; nNTP; mRNA1; nNMP; rib; rib1*; protein1; aAA-tRNA; 2aGTP; 2aGDP + 2aPi; aATP; aAMP + 2aPi; 2nPi; Pi; fluxes v1 to v6, with v3 = k1[mRNA1] and v4 subject to a global maximum; exchange fluxes b1 to b9]
[Figure: regulatory network schematic with node labels Gc2/tc2/Rc2/Pc2/Oc2, G2a/t2a/R2a/P2a/O2a, G5/t5/R5/P5/O5, G6a/t6a/R6a/P6a/O6a, Gres/tres/Rres/Pres/Ores, and G3b/t3b/R3b/P3b/O3b, with (+), (-), and (indirect) interactions. Regulatory rules shown:]
• If [Carbon1] > 0, tc2 = 0
• If R1 = 0, we say [B] is not in surplus, t2a = t5 = 0
• If Rh > 0, [H] is in surplus, t6a = 0
• If Oxygen = 0, we say [O2] = 0, tres = t3b = 0
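The "If ..., t = 0" rules in the regulatory figure can be read as Boolean switches on transcription. The sketch below is one possible reading, not the Palsson group's code: the function name and argument names are invented here, and treating every flag as 1 unless a rule turns it off is an assumption, since the slide states only the conditions under which a flag is 0.

```python
def transcription_flags(carbon1, r1, rh, oxygen):
    """Return 0/1 transcription flags implied by the four rules on the slide.

    Arguments are non-negative numbers named after the slide labels.
    Flags default to 1 (transcribed); that default is an assumption.
    """
    flags = {"tc2": 1, "t2a": 1, "t5": 1, "t6a": 1, "tres": 1, "t3b": 1}
    if carbon1 > 0:    # If [Carbon1] > 0, tc2 = 0
        flags["tc2"] = 0
    if r1 == 0:        # If R1 = 0, [B] is not in surplus: t2a = t5 = 0
        flags["t2a"] = flags["t5"] = 0
    if rh > 0:         # If Rh > 0, [H] is in surplus: t6a = 0
        flags["t6a"] = 0
    if oxygen == 0:    # If Oxygen = 0, [O2] = 0: tres = t3b = 0
        flags["tres"] = flags["t3b"] = 0
    return flags


# Example: Carbon1 present, no R1, no Rh, no oxygen.
anaerobic_on_carbon1 = transcription_flags(carbon1=1.0, r1=0, rh=0, oxygen=0)
```

In a combined regulatory and metabolic model, flags like these would typically decide which reactions are allowed to carry flux in a given environment.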
E. coli i2K
Source: Bernhard Palsson, UCSD Genetic Circuits Research Group
http://gcrg.ucsd.edu
JTB 2002
JBC 2002
In Silico Organisms Now Available
2007:
• Escherichia coli
• Haemophilus influenzae
• Helicobacter pylori
• Homo sapiens Build 1
• Human red blood cell
• Human cardiac mitochondria
• Methanosarcina barkeri
• Mouse Cardiomyocyte
• Mycobacterium tuberculosis
• Saccharomyces cerevisiae
• Staphylococcus aureus
Biochemically, Genetically and Genomically (BiGG) Genome-Scale Metabolic Reconstructions
• M. barkeri: 619 Reactions, 692 Genes
• S. cerevisiae: 1402 Reactions, 910 Genes
• E. coli: 2035 Reactions, 1260 Genes
• S. aureus: 640 Reactions, 619 Genes
• Mitoc.: 218 Rxns
• RBC: 39 Rxns
• H. sapiens: 3311 Reactions, 1496 Genes
• S. typhimurium: 898 Reactions, 826 Genes
• H. pylori: 558 Reactions, 341 Genes
• H. influenzae: 472 Reactions, 376 Genes
• M. tuberculosis: 939 Reactions, 661 Genes
Systems Biology Research Group
http://systemsbiology.ucsd.edu
Use of Tiled Display Wall OptIPortal to Interactively View Microbial Genome
Acidobacteria bacterium Ellin345 Soil Bacterium 5.6 Mb
Source: Raj Singh, UCSD
OptIPortal–Termination Device for the Dedicated Gigabit/sec Lightpaths
Photo Source: David Lee, Mark Ellisman NCMIR, UCSD
Collaborative Analysis of Large Scale Images of Cancer Cells
Integration of High Definition Video Streams with Large Scale Image Display Walls
NW!
CICESE
UW
JCVI
MIT
SIO UCSD
SDSU
UIC EVL
UCI
OptIPortals
OptIPortal
An Emerging High Performance Collaboratory for Microbial Metagenomics
UC Davis
UMich