2015 Association for Pathology Informatics presentation on the Yale Department of Laboratory Medicine next generation sequencing software.
S L I D E 0
VarBase: A Platform for the Storage and Clinical Interpretation of Next Generation Sequencing Data
Wade L. Schulz, MD, PhD, John G. Howe, PhD, Karl Hager, PhD, Henry M. Rinder, MD
Yale University, Department of Laboratory Medicine
S L I D E 1
The Problem
A next generation sequencing panel was brought online for leukemia and myelodysplastic syndrome; it needed interpretive and data management software
Interpretation efficiency
Patient safety
Turnaround time
S L I D E 2
Project Goals
Integrate annotation information from disparate data silos
Provide interpretation interface
Generate digital and printed reports
Provide robust research tools for ongoing clinical and laboratory studies
S L I D E 3
The VarBase Platform
Architecture diagram (components shown): Research Applications, Web Interface, VarBase - Kepler, Internal Clients, Galileo Services, IonReporter
Deployment tiers: Cloud (Kepler), Local (Galileo)
S L I D E 4
Relations vs. Documents: SQL vs NoSQL
SQL/Relational
ACID Compliant
Column-level encryption
Well-known deployment
NoSQL/Document
Dynamic schema
Improved read/write speeds
Easy scaling and redundancy
S L I D E 5
Tool Selection: Public Data Repository (Kepler)
Elasticsearch
Web service wrapper
Administrative import interface
COSMIC
ClinVar
OMIM
dbSNP
Index updated every 3 months
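As a rough illustration of what a web service wrapper around Elasticsearch might send on a variant lookup, the sketch below builds a Query DSL request body in plain Python. It is not the VarBase implementation: the function name `build_annotation_query` is hypothetical, the field names are assumed to match the variant JSON shown later in this deck, and exact `term` matching assumes those fields are mapped as keyword or numeric types.

```python
import json

def build_annotation_query(chromosome, position, ref_allele, alt_allele, size=10):
    """Build an Elasticsearch Query DSL body matching one variant exactly.

    Field names are assumptions borrowed from the variant JSON in this deck;
    exact `term` matching also assumes keyword (non-analyzed) or numeric mappings.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # `filter` clauses match without relevance scoring.
                "filter": [
                    {"term": {"chromosome": chromosome}},
                    {"term": {"position": position}},
                    {"term": {"refAllele": ref_allele}},
                    {"term": {"altAllele": alt_allele}},
                ]
            }
        },
    }

body = build_annotation_query("chr7", 148506396, "A", "C")
# The body would be POSTed to <index>/_search; the index name is deployment-specific.
print(json.dumps(body, indent=2))
```

Keeping the query construction separate from the HTTP call makes the wrapper easy to test without a running cluster.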
S L I D E 6
Tool Selection: Private Data Warehouse (Galileo)
MSSQL + Elasticsearch
Patient data encrypted in SQL
Panel information stored in SQL
Non-demographic information in SQL+Elasticsearch
Variants in Elasticsearch+Disk
Document database caveat: Most are not ACID compliant and should not be used as a primary data store
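The storage split listed above can be sketched as a small routing table. This is a minimal illustration under stated assumptions, not the Galileo code: the category keys, the `Store` enum, and the `destinations` helper are all hypothetical names chosen for the example.

```python
from enum import Enum

class Store(Enum):
    SQL_ENCRYPTED = "sql_encrypted"   # patient data, encrypted in SQL
    SQL = "sql"
    ELASTICSEARCH = "elasticsearch"
    DISK = "disk"

# Routing taken from the slide; the category keys are hypothetical labels.
ROUTING = {
    "patient": (Store.SQL_ENCRYPTED,),
    "panel": (Store.SQL,),
    "non_demographic": (Store.SQL, Store.ELASTICSEARCH),
    "variant": (Store.ELASTICSEARCH, Store.DISK),
}

def destinations(category):
    """Return the stores that should receive a record of the given category."""
    try:
        return ROUTING[category]
    except KeyError:
        raise ValueError(f"unknown data category: {category!r}") from None

for category in ROUTING:
    print(category, "->", [store.value for store in destinations(category)])
```

Note that patient data never routes to Elasticsearch in this table, which matches the caveat that the document store is not the primary data store.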
S L I D E 7
Tool Selection: Authentication/Authorization
Institutional Active Directory
Role-based authorization
Web application restrictions
Laboratory personnel
Trainee (Resident/Fellow)
Attending
Web interface restrictions
Patient-based restriction
Data-type restriction
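One way to combine the role-based, patient-based, and data-type restrictions above is a single check that requires all three to pass. The sketch below is illustrative only: the slides name the roles but not their permissions, so the actions in `ROLE_ACTIONS`, the `User` fields, and the `is_allowed` helper are assumptions made for the example. In practice the role would come from institutional Active Directory group membership rather than a hard-coded value.

```python
from dataclasses import dataclass

# Hypothetical role-to-action mapping; the slides list the roles only.
ROLE_ACTIONS = {
    "laboratory_personnel": {"view"},
    "trainee": {"view", "draft_interpretation"},
    "attending": {"view", "draft_interpretation", "sign_out"},
}

@dataclass(frozen=True)
class User:
    name: str
    role: str
    patient_ids: frozenset = frozenset()   # patient-based restriction
    data_types: frozenset = frozenset()    # data-type restriction

def is_allowed(user, action, patient_id, data_type):
    """Grant access only if the role, patient, and data-type checks all pass."""
    return (
        action in ROLE_ACTIONS.get(user.role, set())
        and patient_id in user.patient_ids
        and data_type in user.data_types
    )

trainee = User("resident", "trainee", frozenset({"P001"}), frozenset({"variant"}))
attending = User("faculty", "attending", frozenset({"P001"}), frozenset({"variant"}))

print(is_allowed(trainee, "sign_out", "P001", "variant"))    # role check fails
print(is_allowed(attending, "sign_out", "P001", "variant"))  # all checks pass
```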
S L I D E 8
Variants in JSON
{
  "chromosome": "chr7",
  "position": 148506396,
  "type": "snv",
  "refAllele": "A",
  "altAllele": "C",
  "totalReads": 1998,
  "forwardReads": 1038,
  "forwardRefReads": 524,
  "forwardAltReads": 514,
  "reverseReads": 960,
  "reverseRefReads": 500,
  "reverseAltReads": 460,
  "refReads": 1024,
  "altReads": 974,
  "vaf": 48.749,
  "variantRegion": "intronic",
  "variantEffect": "",
  "snvEffect": "A>C",
  "gene": "EZH2"
}
- Variant location in genome
- Nucleotide change
- Sequencing statistics
- Variant allele frequency
- Variant coding/protein effects
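The sequencing statistics in the record above are internally consistent, which a short check can confirm. The sketch below is illustrative and not part of VarBase: it parses the read-count fields from the slide's record and recomputes the variant allele frequency as altReads / totalReads, expressed as a percentage. The helper name `check_variant` and the 0.001 tolerance are assumptions.

```python
import json

# Identifier and read-count fields copied from the slide's variant record.
RECORD = """
{ "chromosome": "chr7", "position": 148506396, "type": "snv",
  "refAllele": "A", "altAllele": "C",
  "totalReads": 1998,
  "forwardReads": 1038, "forwardRefReads": 524, "forwardAltReads": 514,
  "reverseReads": 960, "reverseRefReads": 500, "reverseAltReads": 460,
  "refReads": 1024, "altReads": 974,
  "vaf": 48.749, "gene": "EZH2" }
"""

def check_variant(v, tolerance=0.001):
    """Return a list of inconsistencies found in a variant record."""
    problems = []
    if v["forwardRefReads"] + v["forwardAltReads"] != v["forwardReads"]:
        problems.append("forward strand counts do not sum")
    if v["reverseRefReads"] + v["reverseAltReads"] != v["reverseReads"]:
        problems.append("reverse strand counts do not sum")
    if v["forwardReads"] + v["reverseReads"] != v["totalReads"]:
        problems.append("strand totals do not sum to totalReads")
    if v["refReads"] + v["altReads"] != v["totalReads"]:
        problems.append("ref and alt reads do not sum to totalReads")
    # VAF as a percentage of all reads covering the position.
    expected_vaf = 100.0 * v["altReads"] / v["totalReads"]
    if abs(expected_vaf - v["vaf"]) > tolerance:
        problems.append(f"vaf {v['vaf']} differs from computed {expected_vaf:.3f}")
    return problems

variant = json.loads(RECORD)
problems = check_variant(variant)
print(problems or "record is internally consistent")
```

A check of this kind could run at import time so that malformed records are flagged before they reach the interpretation interface.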
S L I D E 9
Research Integration: Data Visualization (Kibana)
S L I D E 10
System Statistics
Cloud-Based Elasticsearch Cluster
60 million variant annotations
10 million oncology annotations
Local Elasticsearch+MSSQL Instance
>80 specimens + validation specimens
Turnaround time: 1-2 weeks
S L I D E 11
Conclusions
Hybrid data store can efficiently and securely store complex data types
Cloud-based variant annotation can be integrated into multiple services and provides auditable interpretation information
Technology-agnostic web service interfaces provide easily accessible data interchange
S L I D E 12
Acknowledgements
Henry Rinder, MD
Alexa Siddon, MD
Richard Torres, MD
Christopher Tormey, MD
Thomas Durant, MD
Molecular Pathology Laboratory
John G. Howe, PhD
Karl Hager, PhD
Laboratory Informatics
Rodion Rathbone, MD
Nathan Price