Kayo Arima
California Institute for Telecommunications and
Information Technology (Calit2)-University of
California, San Diego Division
Cyber Metagenomics; Challenge to See The Unseen Majority in The Ocean
Looking Back Nearly 4 Billion Years In the Evolution of Microbe Genomics
Source: Falkowski and Vargas, Science 304 (5667): 58
Eukaryotes have nuclei.
Prokaryotes have genes but
no nuclear membrane.
Evolution is the Principle of Biological Systems: Most of Evolutionary Time Was in the Microbial World
You Are
Here
Source: Carl Woese et al.
Much of Genome Work Has
Occurred in Animals
Two completely different approaches to obtaining microbial genomic information
Microbial whole genomics
• Environmental sample
• Culture (grow) in lab
• Isolate the colony
• Culture the isolated colony
• DNA extraction
• Enzymatic digestion
• Shotgun sequencing
• Gene assembly
Metagenomics
• Environmental sample
• DNA extraction
• Enzymatic digestion
• Shotgun sequencing
• Scaffold assembly
Source: Karin Remington, J. Craig Venter Institute
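The shotgun sequencing and assembly steps shared by both workflows can be illustrated with a toy sketch. The Python below is a minimal greedy overlap merger, assuming error-free reads that all come from the same strand. The function names `overlap` and `greedy_assemble` and the example reads are hypothetical, and this is not the method used by any particular sequencing project; production assemblers handle sequencing errors, repeats, and both strands.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of reads with the largest overlap.

    Assumption: reads are error-free and from one strand (a toy model).
    Returns the list of contigs left when no overlap >= min_len remains.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge: the result stays fragmentary
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Made-up reads covering a made-up 15-base sequence.
print(greedy_assemble(["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]))  # ['ATGGCGTACGTTAGC']
```

When reads from many organisms are mixed, as in an environmental sample, overlaps are sparser and more ambiguous, so a procedure like this tends to stop early and leave several short contigs rather than one complete genome.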
Down Side of Metagenomics
• Often fragmentary
• Often highly divergent
• Rarely any known activity
• No chromosomal placement
• No organism of origin
• Ab initio ORF predictions
• Huge data
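To make the "ab initio ORF predictions" point concrete: with no organism of origin and no reference genome, candidate genes have to be predicted from the sequence alone. The sketch below is a deliberately naive six-frame scan for ATG-to-stop open reading frames, assuming the standard stop codons TAA, TAG, and TGA. The names `find_orfs` and `reverse_complement` and the example sequence are hypothetical, and real gene finders use statistical models rather than this simple rule.

```python
STOPS = {"TAA", "TAG", "TGA"}  # standard stop codons (assumed genetic code)


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string over A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq: str, min_codons: int = 3):
    """Scan all six reading frames for ATG...stop stretches.

    Returns tuples (strand, start, end, orf_sequence); coordinates on the
    '-' strand refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens a candidate ORF
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None
    return orfs


# Made-up fragment containing one short ORF on the forward strand.
print(find_orfs("CCATGAAATTTTAGGG"))  # [('+', 2, 14, 'ATGAAATTTTAG')]
```

Note that an ORF that runs off the end of a fragment is silently dropped here, which is one reason fragmentary metagenomic reads make gene prediction harder.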
Genomic Data Is Growing Rapidly, But Metagenomics Will Vastly Increase The Scale…
GenBank (www.ncbi.nlm.nih.gov/Genbank): 100 Billion Bases!
Protein Data Bank (www.rcsb.org/pdb/holdings.html): 35,000 Structures
Total Data < 1TB
Full Genome Sequencing is Exploding: Most Sequenced Genomes are Bacterial
[Chart: genomes by domain (Archaeal, Bacterial, Eukaryal)]
Ongoing Genomes: Total 1665
Completed Genomes: Total 422
Metagenomes: 90
First Genome 1995; 6 Genomes/Year 2000
Source: www.genomesonline.org
Marine Metagenomics
• Microbes account for more than 90% of ocean
biomass, mediate all biochemical cycles in the
oceans and are responsible for 98% of primary
production in the sea.
• Metagenomics is a breakthrough sequencing
approach for examining microbial species in the
open environment without the need to isolate and
cultivate individual species in the lab.
PI Larry Smarr
Paul Gilna Ex. Dir.
Marine Genome Sequencing Project: Measuring the Genetic Diversity of Ocean Microbes
Sorcerer II Data from this area has already reached 10% of GenBank.
The Entire Data Set Will Double the Number of Proteins in GenBank!
Sample Metadata from GOS
• Site Metadata
– Location (lat/long, water depth)
– Site characterization (finite list of types plus “other”)
– Site description (free text)
– Country
• Sampling Metadata
– Sample collection date/time
– Sampling depth
– Conditions at time of sampling (e.g., stormy, surface temperature)
– Sample physical/chemical measurements (T (°C), S (ppt), chl a (mg m⁻³), etc.)
– “author”
• Experimental Parameters
– Filter size
– Insert size
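The metadata fields listed above map naturally onto a small record type. The sketch below is one possible layout using Python dataclasses, not the actual GOS schema. All class and field names, the units for depth, and the choice of free-text strings for filter and insert size are assumptions, and the example values are placeholders rather than real sample data.

```python
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import Optional


@dataclass
class SiteMetadata:
    latitude: float            # decimal degrees (assumed convention)
    longitude: float           # decimal degrees (assumed convention)
    water_depth_m: float       # metres (assumed unit)
    site_type: str             # one of a finite list of types, or "other"
    description: str = ""      # free text
    country: str = ""


@dataclass
class SamplingMetadata:
    collected_at: datetime
    sampling_depth_m: float                 # metres (assumed unit)
    conditions: str = ""                    # e.g. "stormy"
    temperature_c: Optional[float] = None   # T (°C)
    salinity_ppt: Optional[float] = None    # S (ppt)
    chl_a_mg_m3: Optional[float] = None     # chl a (mg m^-3)
    author: str = ""


@dataclass
class ExperimentalParameters:
    filter_size: str   # kept as text because the slide gives no unit
    insert_size: str   # kept as text because the slide gives no unit


@dataclass
class SampleRecord:
    site: SiteMetadata
    sampling: SamplingMetadata
    experiment: ExperimentalParameters


# Placeholder values for illustration only; not real GOS data.
record = SampleRecord(
    site=SiteMetadata(0.0, 0.0, 100.0, "other", "example site", "N/A"),
    sampling=SamplingMetadata(datetime(2000, 1, 1, 12, 0), 5.0, conditions="calm"),
    experiment=ExperimentalParameters(filter_size="unspecified", insert_size="unspecified"),
)
print(asdict(record)["sampling"]["conditions"])  # calm
```

Keeping the measurements optional reflects the "etc" in the list: not every sample is likely to carry every physical or chemical reading.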
Calit2’s Direct Access Core Architecture Will Create Next Generation Metagenomics Server
Source: Phil Papadopoulos, SDSC, Calit2
[Architecture diagram; labels recovered from the figure:]
• Data sources
– Sargasso Sea Data
– Sorcerer II Expedition (GOS)
– JGI Community Sequencing Project
– Moore Marine Microbial Project
– NASA Goddard Satellite Data
– Community Microbial Metagenomics Data
• Server components
– Flat File Server Farm
– Database Farm
– 10 GigE Fabric
– Dedicated Compute Farm (1000 CPUs)
– TeraGrid: Cyberinfrastructure Backplane (scheduled activities, e.g. all by all comparison) (10000s of CPUs)
• Access paths
– Web Portal (Traditional User: Request, Response)
– Web Services
– Web (other service)
– Direct Access Lambda Cnxns (Local Environment, Local Cluster)
Marine Metagenomics
Who is there?
Drug discovery
Environmental survey
Microbial genetic survey
Microbial genomic survey
Symbiosis
Organism discovery
Marine conservation
Evolution study
Bioenergy discovery
Endosymbiosis
Biogeochemistry mapping
Metabolic pathway discovery