VarBase: A Platform for the Storage and Clinical Interpretation of Next Generation Sequencing Data

DESCRIPTION

2015 Association for Pathology Informatics presentation on the Yale Department of Laboratory Medicine next generation sequencing software.

  • S L I D E 0

    VarBase: A Platform for the Storage and Clinical Interpretation of Next Generation Sequencing Data

    Wade L. Schulz, MD, PhD, John G. Howe, PhD, Karl Hager, PhD, Henry M. Rinder, MD

    Yale University, Department of Laboratory Medicine

  • S L I D E 1

    The Problem

    A next generation sequencing panel was brought online for leukemia and myelodysplastic syndrome and needed interpretive and data management software

    Interpretation efficiency

    Patient safety

    Turnaround time

  • S L I D E 2

    Project Goals

    Integrate annotation information from disparate data silos

    Provide interpretation interface

    Generate digital and printed reports

    Provide robust research tools for ongoing clinical and laboratory studies

  • S L I D E 3

    The VarBase Platform

    [Architecture diagram. Components shown: Research Applications, Web Interface, VarBase - Kepler, Internal Clients, Galileo Services, IonReporter. Deployment tiers: Cloud (Kepler), Local (Galileo).]

  • S L I D E 4

    Relations vs. Documents: SQL vs. NoSQL

    SQL/Relational

    ACID Compliant

    Column-level encryption

    Well-known deployment

    NoSQL/Document

    Dynamic schema

    Improved read/write speeds

    Easy scaling and redundancy

  • S L I D E 5

    Tool Selection: Public Data Repository (Kepler)

    Elasticsearch

    Web service wrapper

    Administrative import interface

      - COSMIC

      - ClinVar

      - OMIM

      - dbSNP

    Index updated every 3 months
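
A lookup against a public annotation index like this one could be expressed as an Elasticsearch Query DSL body. The following is a minimal sketch, not the VarBase implementation: the field names are borrowed from the variant JSON on Slide 8, the real Kepler index mapping is not given in the slides, and the function name is hypothetical. It only builds the request body and does not contact a cluster.

```python
import json


def build_annotation_query(chromosome: str, position: int, alt_allele: str) -> dict:
    """Build a Query DSL body that matches one exact variant.

    Assumption: the three fields are mapped as keyword/numeric types, so
    exact `term` filters apply without text analysis.
    """
    return {
        "query": {
            "bool": {
                # `filter` clauses match exactly and skip relevance scoring.
                "filter": [
                    {"term": {"chromosome": chromosome}},
                    {"term": {"position": position}},
                    {"term": {"altAllele": alt_allele}},
                ]
            }
        }
    }


body = build_annotation_query("chr7", 148506396, "C")
print(json.dumps(body, indent=2))
```

Keeping the query as plain JSON fits the deck's point about technology-agnostic web service interfaces: any client that can send JSON over HTTP could issue the same request.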

    [Architecture diagram repeated from Slide 3]

  • S L I D E 6

    Tool Selection: Private Data Warehouse (Galileo)

    MSSQL + Elasticsearch

    Patient data encrypted in SQL

    Panel information stored in SQL

    Non-demographic information in SQL+Elasticsearch

    Variants in Elasticsearch+Disk

    Document database caveat: Most are not ACID compliant and should not be used as a primary data store
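
One way to honor this caveat is to treat the relational store as the source of truth and the document index as a derived copy that can be rebuilt. The sketch below illustrates that pattern only: it uses SQLite in place of MSSQL and a plain dictionary in place of Elasticsearch, and the `specimen` table, its columns, and both function names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE specimen (
    specimen_id TEXT PRIMARY KEY,
    panel       TEXT NOT NULL,
    status      TEXT NOT NULL
)
"""


def save_specimen(conn, search_index, specimen_id, panel, status):
    """Commit to the relational store first, then update the search copy."""
    with conn:  # commits on success, rolls back if the INSERT raises
        conn.execute(
            "INSERT INTO specimen (specimen_id, panel, status) VALUES (?, ?, ?)",
            (specimen_id, panel, status),
        )
    # Reached only after a successful commit, so the index never holds
    # a record that the primary store rejected.
    search_index[specimen_id] = {"panel": panel, "status": status}


def rebuild_index(conn):
    """Recreate the search copy from the primary store."""
    rows = conn.execute("SELECT specimen_id, panel, status FROM specimen")
    return {sid: {"panel": panel, "status": status} for sid, panel, status in rows}


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
index = {}
save_specimen(conn, index, "S-001", "myeloid", "received")
assert rebuild_index(conn) == index
```

Because the index can always be regenerated from the relational rows, losing or corrupting the search copy costs rebuild time rather than data.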

  • S L I D E 7

    Tool Selection: Authentication/Authorization

    Institutional Active Directory

    Role-based authorization

    Web application restrictions

      - Laboratory personnel

      - Trainee (Resident/Fellow)

      - Attending

    Web interface restrictions

      - Patient-based restriction

      - Data-type restriction
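
The combination of role-based, patient-based, and data-type restrictions can be sketched as a single access check. This is an illustration under stated assumptions: the slides name the three roles but not what each may see, so the role-to-data-type mapping, the data-type labels, and the `User` and `can_view` names are all hypothetical. In the real system the role would presumably come from Active Directory group membership.

```python
from dataclasses import dataclass, field

# Hypothetical mapping of roles to the data types they may view.
ROLE_DATA_TYPES = {
    "laboratory_personnel": {"sequencing_stats"},
    "trainee": {"sequencing_stats", "variants"},
    "attending": {"sequencing_stats", "variants", "demographics"},
}


@dataclass(frozen=True)
class User:
    name: str
    role: str
    # Patients this user is permitted to see (patient-based restriction).
    patient_ids: frozenset = field(default_factory=frozenset)


def can_view(user: User, patient_id: str, data_type: str) -> bool:
    """Allow access only if both the data-type and patient checks pass."""
    allowed_types = ROLE_DATA_TYPES.get(user.role, set())  # unknown role: no access
    return data_type in allowed_types and patient_id in user.patient_ids


fellow = User("example-fellow", "trainee", frozenset({"P-100"}))
print(can_view(fellow, "P-100", "variants"))      # role and patient both allowed
print(can_view(fellow, "P-100", "demographics"))  # data type not allowed for role
print(can_view(fellow, "P-200", "variants"))      # patient not assigned to user
```

Denying by default for unknown roles keeps a misconfigured account from seeing anything.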

    [Architecture diagram repeated from Slide 3]

  • S L I D E 8

    Variants in JSON

    {
      "chromosome": "chr7",
      "position": 148506396,
      "type": "snv",
      "refAllele": "A",
      "altAllele": "C",
      "totalReads": 1998,
      "forwardReads": 1038,
      "forwardRefReads": 524,
      "forwardAltReads": 514,
      "reverseReads": 960,
      "reverseRefReads": 500,
      "reverseAltReads": 460,
      "refReads": 1024,
      "altReads": 974,
      "vaf": 48.749,
      "variantRegion": "intronic",
      "variantEffect": "",
      "snvEffect": "A>C",
      "gene": "EZH2"
    }

    - Variant location in genome

    - Nucleotide change

    - Sequencing statistics

    - Variant allele frequency

    - Variant coding/protein effects
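
The sequencing statistics in this record are internally consistent: the per-strand counts sum to the totals, and the variant allele frequency matches alternate reads divided by total reads, expressed as a percentage. The sketch below recomputes those derived fields from the four strand-level counts. It assumes total reads equal reference plus alternate reads, which holds for this record but may not hold for every caller; the function name is hypothetical.

```python
def summarize_reads(variant: dict) -> dict:
    """Derive read totals and VAF (percent) from strand-level counts."""
    ref = variant["forwardRefReads"] + variant["reverseRefReads"]
    alt = variant["forwardAltReads"] + variant["reverseAltReads"]
    total = ref + alt
    if total == 0:
        raise ValueError("no reads cover this position")
    return {
        "forwardReads": variant["forwardRefReads"] + variant["forwardAltReads"],
        "reverseReads": variant["reverseRefReads"] + variant["reverseAltReads"],
        "refReads": ref,
        "altReads": alt,
        "totalReads": total,
        "vaf": round(100 * alt / total, 3),
    }


# Strand-level counts copied from the EZH2 record on this slide.
counts = {
    "forwardRefReads": 524,
    "forwardAltReads": 514,
    "reverseRefReads": 500,
    "reverseAltReads": 460,
}
summary = summarize_reads(counts)
print(summary["totalReads"], summary["altReads"], summary["vaf"])  # 1998 974 48.749
```

A check like this could be run at import time to flag records whose stored totals disagree with their strand-level counts.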

  • S L I D E 9

    Research Integration: Data Visualization (Kibana)

  • S L I D E 10

    System Statistics

    Cloud-Based Elasticsearch Cluster

    60 million variant annotations

    10 million oncology annotations

    Local Elasticsearch+MSSQL Instance

    >80 specimens + validation specimens

    Turnaround time: 1-2 weeks

  • S L I D E 11

    Conclusions

    Hybrid data store can efficiently and securely store complex data types

    Cloud-based variant annotation can be integrated into multiple services and provides auditable interpretation information

    Technology-agnostic web service interfaces provide easily accessible data interchange

  • S L I D E 12

    Acknowledgements

    Henry Rinder, MD

    Alexa Siddon, MD

    Richard Torres, MD

    Christopher Tormey, MD

    Thomas Durant, MD

    Molecular Pathology Laboratory

    John G. Howe, PhD

    Karl Hager, PhD

    Laboratory Informatics

    Rodion Rathbone, MD

    Nathan Price