8/12/2019 Genomic Database Performance Improvements With Document-Based Database Architecture
http://slidepdf.com/reader/full/genomic-database-performance-improvements-with-document-based-database-architecture 1/17
Genomic Database Performance Improvements
With Document-Based Database Architecture
Wade L. Schulz MD/PhD
Donn K. Felker
Brent G. Nelson MD
Sponsor: Michael Linden MD/PhD
presentations.wadeschulz.com/aclps2014
Disclosures
Stakeholders in AgileMedicine, which does not provide any genomics-related software, products, or services.
Whole-genome sequencing
“The obvious laboratory test to determine the basis of every disease process.”
Sequencing Evolution
- 1980: “Shotgun sequencing” described
- 1986: First commercial sequencer (ABI Prism)
- 1995: Capillary electrophoresis released
- 1998: Pyrosequencing developed
- 2005: First commercial pyrosequencer (454 Life Sciences)
- 2010: Semiconductor sequencer released (Ion Torrent)
ACLPS 2014 Genomic Database Performance
Database Evolution
- 1970: Relational database defined
- 1984: Sybase released
- 1995: MySQL released
- 2007: First NoSQL databases begin to emerge
- 2009: MongoDB released
Experimental Goals
1. Assess efficiency of relational and document databases for storing genomic annotations
2. Quantify the benefit of in-memory indexing to query genomic annotations
3. Determine whether traditional disk or solid state drives improve database performance
Experimental Design
1. Parse: Create data set from dbSNP annotation
2. Load: Write records or documents into database
3. Index: Create indexes
4. Query: Query documents and single/multi-table records
61,268,661 records
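The four stages (parse, load, index, query) lend themselves to a simple per-stage timing harness. The following is a minimal sketch of that idea, not the code used in the study: the tab-separated input format, the in-memory list and dict standing in for a database, and every function name here are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, ...]


def time_stage(name: str, fn: Callable[[], object], timings: Dict[str, float]):
    """Run one stage and store its wall-clock duration (seconds) under `name`."""
    start = time.perf_counter()
    result = fn()
    timings[name] = time.perf_counter() - start
    return result


def parse(lines: List[str]) -> List[Record]:
    # Hypothetical input format: tab-separated rsid, chromosome, gene.
    return [tuple(line.rstrip("\n").split("\t")) for line in lines]


def build_gene_index(records: List[Record]) -> Dict[str, List[str]]:
    # In-memory stand-in for a database index: gene -> list of rsids.
    index: Dict[str, List[str]] = {}
    for rsid, _chrom, gene in records:
        index.setdefault(gene, []).append(rsid)
    return index


def run_pipeline(lines: List[str], gene: str):
    """Time each of the four stages; a real run would call a database driver."""
    timings: Dict[str, float] = {}
    records = time_stage("parse", lambda: parse(lines), timings)
    store = time_stage("load", lambda: list(records), timings)
    index = time_stage("index", lambda: build_gene_index(store), timings)
    hits = time_stage("query", lambda: index.get(gene, []), timings)
    return timings, hits


# Made-up sample rows, not real dbSNP data.
sample = ["rsA\t22\tCOMT\n", "rsB\t12\tGRIN2B\n", "rsC\t22\tCOMT\n"]
timings, hits = run_pipeline(sample, "COMT")
print(hits)
print(sorted(timings))
```

Timing each stage separately keeps load, index, and query costs from blurring together, which is what allows write speed, index creation, and query efficiency to be compared independently.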
Virtual Hardware    Amazon EC2                     Digital Ocean
Operating System    Amazon Linux (x64)             CentOS (x64)
Processors          4 vCPU/8 ECU                   8 vCPU
Memory              15 GB                          16 GB
Disk Type           Elastic Block Store or PIOPS   Solid State
Data Models
MongoDB:
{
  _id: ObjectId(),
  has_sig: bool,
  rsid: string,
  chr: string,
  loci: [
    {
      gene: string,
      mrna_acc: string,
      class: string
    }
  ]
}
MySQL: [schema diagram]
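To illustrate how the two models relate, here is a minimal sketch that flattens one document into relational rows. The table names `snp` and `locus` and the columns `id`, `rsid`, `has_sig`, `snp_id`, and `gene` are taken from the SQL queries on the Query Efficiency slide; the remaining columns (`chr`, `mrna_acc`, `class`) are assumed to mirror the document fields, and the sample values are made up.

```python
from typing import Any, Dict, List, Tuple


def flatten(doc: Dict[str, Any], snp_id: int) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Split one SNP document into a `snp` row and zero or more `locus` rows."""
    snp_row = {
        "id": snp_id,
        "rsid": doc["rsid"],
        "chr": doc["chr"],
        "has_sig": doc["has_sig"],
    }
    # Each embedded locus becomes its own row, linked back by snp_id.
    locus_rows = [
        {
            "snp_id": snp_id,
            "gene": locus["gene"],
            "mrna_acc": locus["mrna_acc"],
            "class": locus["class"],
        }
        for locus in doc.get("loci", [])
    ]
    return snp_row, locus_rows


# Illustrative document only; values are placeholders, not dbSNP data.
doc = {
    "rsid": "rsA",
    "chr": "22",
    "has_sig": True,
    "loci": [
        {"gene": "COMT", "mrna_acc": "NM_X", "class": "missense"},
        {"gene": "COMT", "mrna_acc": "NM_Y", "class": "intron"},
    ],
}
snp_row, locus_rows = flatten(doc, snp_id=1)
print(snp_row)
print(len(locus_rows))
```

The document keeps a SNP and its loci together in one record, whereas the relational form needs a join on `snp_id` to reassemble them.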
Results
What actually happens?
Write Speed
Index Creation
[Chart panels: MongoDB, MySQL]
Query Efficiency
String: Search for number of SNPs with gene code (COMT: 842 records)
- MongoDB: {"loci.gene": "COMT"}
- MySQL: "SELECT count(distinct s.rsid) FROM locus l, snp s WHERE l.snp_id = s.id AND l.gene = 'COMT'"
Boolean: Search for number of records with clinical significance annotation
- MongoDB: {"has_sig": true}
- MySQL: "SELECT count(s.id) FROM locus l, snp s WHERE l.snp_id = s.id AND s.has_sig = true"
[Chart panels: MongoDB, MySQL]
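The two query types can be tried on toy data with the sketch below. It is only an approximation under stated assumptions: SQLite (Python's built-in `sqlite3`) stands in for MySQL, plain Python filters over dictionaries stand in for the MongoDB filter documents, the table columns are limited to those named in the queries, `has_sig` is stored as 0/1, and the three sample SNPs are made up.

```python
import sqlite3
from typing import Any, Dict, List


def build_db() -> sqlite3.Connection:
    """Create an in-memory relational stand-in with indexed query columns."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE snp (id INTEGER PRIMARY KEY, rsid TEXT, has_sig INTEGER);
        CREATE TABLE locus (snp_id INTEGER REFERENCES snp(id), gene TEXT);
        CREATE INDEX idx_locus_gene ON locus (gene);
        CREATE INDEX idx_snp_has_sig ON snp (has_sig);
        """
    )
    conn.executemany("INSERT INTO snp VALUES (?, ?, ?)",
                     [(1, "rsA", 1), (2, "rsB", 0), (3, "rsC", 1)])
    conn.executemany("INSERT INTO locus VALUES (?, ?)",
                     [(1, "COMT"), (2, "COMT"), (3, "GRIN2B")])
    return conn


def sql_count_gene(conn: sqlite3.Connection, gene: str) -> int:
    # String query: multi-table join, filtered on the locus gene code.
    row = conn.execute(
        "SELECT count(DISTINCT s.rsid) FROM locus l, snp s "
        "WHERE l.snp_id = s.id AND l.gene = ?", (gene,)).fetchone()
    return row[0]


def sql_count_significant(conn: sqlite3.Connection) -> int:
    # Boolean query: same join, filtered on the 0/1 has_sig flag.
    row = conn.execute(
        "SELECT count(s.id) FROM locus l, snp s "
        "WHERE l.snp_id = s.id AND s.has_sig = 1").fetchone()
    return row[0]


# The same toy data in document form (one dict per SNP).
docs: List[Dict[str, Any]] = [
    {"rsid": "rsA", "has_sig": True, "loci": [{"gene": "COMT"}]},
    {"rsid": "rsB", "has_sig": False, "loci": [{"gene": "COMT"}]},
    {"rsid": "rsC", "has_sig": True, "loci": [{"gene": "GRIN2B"}]},
]


def doc_count_gene(documents: List[Dict[str, Any]], gene: str) -> int:
    # Mirrors {"loci.gene": gene}: match if any embedded locus has that gene.
    return sum(any(l["gene"] == gene for l in d["loci"]) for d in documents)


def doc_count_significant(documents: List[Dict[str, Any]]) -> int:
    # Mirrors {"has_sig": true}.
    return sum(1 for d in documents if d["has_sig"] is True)


conn = build_db()
print(sql_count_gene(conn, "COMT"), doc_count_gene(docs, "COMT"))
print(sql_count_significant(conn), doc_count_significant(docs))
```

In this toy data every SNP has exactly one locus, so the relational and document counts agree; the relational form still needs a join for both queries, while the document form reads each record on its own.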
Why? In-place update?
[Diagram: blocks A B becoming A B C, shown for HDD and SSD]
Conclusions
Write speed
- Drive type can drastically affect write speed
- MongoDB has significantly higher write speeds, especially for large imports
Indexing
- MySQL is more efficient at creating Boolean indexes
- Index creation is otherwise comparable on traditional disk
- MySQL index creation rates may suffer on SSD
Queries
- In-memory indexing of MongoDB provides a significant performance advantage (~150x)
Team
- Wade Schulz: Clinical Pathology Resident, Yale University
- Donn Felker: Healthcare Software Architect, Mobile and Cloud Computing
- Brent Nelson: Neuromodulation Fellow, University of Minnesota
Questions
Presentation Resources
presentations.wadeschulz.com/aclps2014
github.com/wadeschulz/research_snpdb