genomeinabottle.org Genome in a Bottle Consortium GIAB/GRC Pre-ASHG Workshop October 5, 2015 Reference Materials for Clinical Applications of Human Genome Sequencing Justin Zook and Marc Salit National Institute of Standards and Technology

GIAB-GRC workshop oct2015 giab introduction 151005

Page 1: GIAB-GRC workshop oct2015 giab introduction 151005

genomeinabottle.org

Genome in a Bottle Consortium GIAB/GRC Pre-ASHG Workshop

October 5, 2015

Reference Materials for Clinical Applications of Human Genome Sequencing

Justin Zook and Marc Salit

National Institute of Standards and Technology

Page 2: GIAB-GRC workshop oct2015 giab introduction 151005

Sequencing technologies and bioinformatics pipelines disagree

O’Rawe et al. Genome Medicine 2013, 5:28

Page 3: GIAB-GRC workshop oct2015 giab introduction 151005

Sequencing technologies and bioinformatics pipelines disagree

O’Rawe et al. Genome Medicine 2013, 5:28

Who is right?

Is anyone right?

Page 4: GIAB-GRC workshop oct2015 giab introduction 151005

GIAB Scope

• The Genome in a Bottle Consortium is developing the reference materials, reference methods, and reference data needed to assess confidence in human genome variant calls.

• A principal motivation for this consortium is to enable performance assessment of sequencing and science-based regulatory oversight of clinical sequencing.

Page 5: GIAB-GRC workshop oct2015 giab introduction 151005

Well-characterized, stable RMs

• Obtain metrics for validation, QC, QA, PT
• Determine sources and types of bias/error
• Learn to resolve difficult structural variants
• Improve reference genome assembly
• Optimization
– integration of data from multiple platforms
– sequencing and analysis
• Enable regulated applications

[Figure: Comparison of SNP Calls for NA12878 on 2 platforms, 3 analysis methods]

Page 6: GIAB-GRC workshop oct2015 giab introduction 151005

NGS Validation Process using Genomes in Bottles

Sample

gDNA isolation

Library Prep

Sequencing

Alignment/Mapping

Variant Calling

Confidence Estimates

Downstream Analysis

[Diagram labels: Pre-Analytical Process; Analytical Process; Clinical Interpretation; Genome in a Bottle Scope; GIAB Data]

Page 7: GIAB-GRC workshop oct2015 giab introduction 151005

Genome in a Bottle Consortium (GIAB)

Hosted by US National Institute of Standards and Technology

Goal: Provide infrastructure to assess confidence in human variant calls

• Appropriately consented, widely available DNA samples, distributed by the Coriell Institute
– Also, QCed Reference Material (RM) versions from controlled lots will be available from NIST
– Also, PGP samples are commercially available
• High-accuracy reference data for these samples
• Tools to facilitate their use
– With the Global Alliance Data Working Group Benchmarking Team

ga4gh.org

Page 8: GIAB-GRC workshop oct2015 giab introduction 151005

GIAB Selected Samples

CEPH/Utah Pedigree 1463
• NA12889, NA12890, NA12891, NA12892
• NA12877, NA12878 (NA12878 available as NIST RM 8398)
• NA12879, NA12880, NA12881, NA12882, NA12883, NA12884, NA12885, NA12886, NA12887, NA12888, NA12893

Personal Genome Project trios (new)
• Ashkenazi Jewish Trio: NA24149, NA24143, NA24385
• Asian (Han Chinese) Trio: NA24694, NA24695, NA24631

Note: Illumina and RTG have used data from the pedigree to improve variant calls in the specific GIAB samples.

Page 9: GIAB-GRC workshop oct2015 giab introduction 151005

NIST Human Genome Reference Materials (RMs)

• NIST RM 8398 is available!
– tinyurl.com/giabpilot
– DNA isolated from large growth cell cultures
– Stable, homogeneous
– Best for regulated uses
– DNA from same cell line at Coriell (NA12878)
• New AJ and Asian Samples
– Available from Coriell now
– NIST RM available in 2016

Page 10: GIAB-GRC workshop oct2015 giab introduction 151005

Integrated 14 datasets from 5 platforms to establish Reference SNP/indel Calls for NA12878

Zook et al., Nature Biotechnology, 2014.

Page 11: GIAB-GRC workshop oct2015 giab introduction 151005

Integration Methods to Establish Reference Variant Calls for NA12878

Candidate Variants from Each Platform

Identify Concordant Variants

Identify Characteristics of Systematic Error

Arbitrate Using Evidence of Systematic Error

Exclude regions potentially biased for all short reads (e.g., repeats, SVs)

Zook et al., Nature Biotechnology, 2014.
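
The first two steps of this flow (collect candidate variants per platform, then identify the concordant ones) can be sketched as below. This is only an illustration under simplifying assumptions: the platform names and variant tuples are invented, and "concordant" is taken to mean "reported by every platform". The arbitration step in Zook et al. relies on evidence of systematic error and is not reproduced here.

```python
from collections import Counter

def split_by_concordance(calls_by_platform):
    """Split candidate variants into concordant and discordant sets.

    calls_by_platform maps a platform name to a set of
    (chrom, pos, ref, alt) tuples. A variant counts as concordant
    only if every platform reports it; this all-or-nothing rule is
    an assumption of the sketch, not the published method.
    """
    counts = Counter(v for calls in calls_by_platform.values() for v in calls)
    n_platforms = len(calls_by_platform)
    concordant = {v for v, c in counts.items() if c == n_platforms}
    discordant = set(counts) - concordant
    return concordant, discordant

# Hypothetical candidate calls from three platforms.
candidates = {
    "platform_a": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")},
    "platform_b": {("chr1", 100, "A", "G")},
    "platform_c": {("chr1", 100, "A", "G"), ("chr1", 300, "G", "A")},
}
concordant, discordant = split_by_concordance(candidates)
```

The discordant set is what would then go on to arbitration or exclusion.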

Page 12: GIAB-GRC workshop oct2015 giab introduction 151005

Assigning confidence to genomic regions for NA12878

High-confidence (77%)
• Platforms agree or we understand the systematic biases causing disagreement
• At least some methods have no evidence of systematic errors
• Mendelian inheritance consistent

Lower confidence (23%)
• In a region known to be difficult for current technologies
– Segmental Dups
– Repeats, Low Complexity
– High/Low GC
– Etc.
• Evidence of systematic error across many platforms
• Inconsistent inheritance

Zook et al., Nature Biotechnology, 2014.
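
The "Mendelian inheritance consistent" criterion can be illustrated with a small check on a trio. This is a minimal sketch, assuming diploid autosomal genotypes given as tuples of allele indices; it ignores de novo mutations and sex chromosomes, and the function name is hypothetical.

```python
from itertools import product

def mendelian_consistent(child, father, mother):
    """Return True if the child's genotype can be formed by taking
    one allele from the father and one from the mother.

    Genotypes are tuples of allele indices, e.g. (0, 1) for 0/1.
    """
    child_sorted = sorted(child)
    return any(sorted((f, m)) == child_sorted
               for f, m in product(father, mother))

# A 0/1 child is consistent with 0/0 and 1/1 parents.
print(mendelian_consistent((0, 1), (0, 0), (1, 1)))
# A 1/1 child is not consistent when one parent is 0/0.
print(mendelian_consistent((1, 1), (0, 0), (0, 1)))
```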

Page 13: GIAB-GRC workshop oct2015 giab introduction 151005

Using high-confidence NIST-GIAB genotypes for NA12878

• NIST has released several versions of high-confidence genotypes for its pilot RM
• These data are presently being used for benchmarking
– prior to release of RMs
– SNPs & indels
• ~77% of the genome
• Data on FTP now well-organized
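
Because the high-confidence genotypes cover only part of the genome, a comparison is normally restricted to positions inside the high-confidence regions. The sketch below shows one way to do that lookup, assuming the regions are supplied as sorted, non-overlapping, 0-based half-open (start, end) intervals per chromosome (the convention used by BED files). The interval values and the function name are invented for illustration.

```python
from bisect import bisect_right

def in_high_confidence(regions, chrom, pos):
    """Return True if 0-based position pos lies inside a region.

    regions maps a chromosome name to a sorted list of
    non-overlapping (start, end) half-open intervals.
    The start list is rebuilt on every call to keep the sketch
    short; a real tool would precompute it once.
    """
    intervals = regions.get(chrom, [])
    starts = [start for start, _ in intervals]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < intervals[i][1]

# Hypothetical high-confidence intervals.
regions = {"chr1": [(100, 200), (500, 600)]}
print(in_high_confidence(regions, "chr1", 150))
print(in_high_confidence(regions, "chr1", 300))
```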

Page 14: GIAB-GRC workshop oct2015 giab introduction 151005

GeT-RM Browser from NCBI and CDC

• http://www.ncbi.nlm.nih.gov/variation/tools/get-rm/
• Allows visualization of data underlying each call

Page 15: GIAB-GRC workshop oct2015 giab introduction 151005

Uses of GIAB NA12878

Oncology – Molecular and Cellular Tumor Markers

“Next Generation” Sequencing (NGS) guidelines for somatic genetic variant detection

www.bioplanet.com/gcat

Page 16: GIAB-GRC workshop oct2015 giab introduction 151005

Global Alliance for Genomics and Health Benchmarking Task Team

• Formed June 2014 to develop methods and tools for comparing variant calls to a benchmark
• Developed standardized definitions for performance metrics like TP, FP, and FN.
• Initial focus on germline SNPs/indels
• Developing benchmarking tools
– Comparison engine
– Pluggable web interface with modules for:
   – Reporting/calculation of metrics
   – Visualization/user interface
• Working with Genome in a Bottle Consortium to host data and calls from their well-characterized genomes

www.bioplanet.com/gcat

Example User Interface
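
Once TP, FP, and FN counts are defined consistently, the usual summary metrics follow directly. A minimal sketch, assuming the counts come from a comparison against a benchmark callset; the function name and the example counts are made up, and the task team's own tools may report additional or differently named metrics.

```python
def benchmark_metrics(tp, fp, fn):
    """Compute precision and recall (sensitivity) from counts.

    precision = TP / (TP + FP), recall = TP / (TP + FN).
    Returns NaN for a metric whose denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall

# Hypothetical counts from one comparison.
precision, recall = benchmark_metrics(tp=90, fp=10, fn=30)
print(f"precision={precision:.3f} recall={recall:.3f}")
```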

Page 17: GIAB-GRC workshop oct2015 giab introduction 151005

Global Alliance for Genomics and Health Benchmarking Task Team

Credit: Rebecca Truty, Complete Genomics

How should we interpret this complex variant on chr21?

Page 18: GIAB-GRC workshop oct2015 giab introduction 151005

Global Alliance for Genomics and Health Benchmarking Task Team

Credit: Rebecca Truty, Complete Genomics

Beyond simple T/F classification: Genotype errors

Truth | Callset | Description | Proposed Name(s) | CM#1 region match | CM#2 allele match | CM#3 genotype match
0/1 | 1/1 | zygosity/genotype error | GE | TP | 1TP, 1GE | FN
1/1 | 0/1 | zygosity/genotype error | GE | TP | 1TP, 1GE | FN
1/2 | 0/1, 1/1, 0/2, 2/2 | common allele, FN allele | GE_FN | TP | 1TP, 1GE, 1FN | FN
0/1 | 1/2 | common allele, FP allele | GE_FP | TP | 1TP, 1GE, 1FP | FP, FN
1/1 | 1/2 | common allele, FP allele | GE_FP | TP | 1TP, 1GE, 1FP | FP, FN
1/2 | 1/3 | common allele, FP allele, FN allele | GE_FP_FN | TP | 1TP, 1GE, 1FP, 1FN | FP, FN
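
The proposed names in this table follow a simple pattern: the truth and callset genotypes share at least one alternate allele, and the suffix records whether the callset has an extra alternate allele (FP), misses one (FN), or both. The sketch below encodes that pattern. It assumes fully called diploid genotypes and that allele indices refer to the same alleles in both callsets; the function name is hypothetical and this is not the task team's implementation.

```python
def classify_genotype_error(truth, call):
    """Name the mismatch between a truth and a called genotype.

    Genotypes are strings such as "0/1". Returns None when the
    genotypes match or share no alternate allele, since the table
    covers neither case.
    """
    t = sorted(int(a) for a in truth.split("/"))
    c = sorted(int(a) for a in call.split("/"))
    if t == c:
        return None
    truth_alts = {a for a in t if a != 0}
    call_alts = {a for a in c if a != 0}
    if not truth_alts & call_alts:
        return None  # no common allele: outside this table
    name = "GE"
    if call_alts - truth_alts:
        name += "_FP"  # callset has an alternate allele absent from truth
    if truth_alts - call_alts:
        name += "_FN"  # truth has an alternate allele the callset missed
    return name

print(classify_genotype_error("1/2", "1/3"))
```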

Page 19: GIAB-GRC workshop oct2015 giab introduction 151005

Global Alliance for Genomics and Health Benchmarking Task Team

Credit: Rebecca Truty, Complete Genomics

Beyond simple T/F classification: no-calls and half-calls

Truth | Callset | Description | Proposed Name(s) | CM#1 region match | CM#2 allele match | CM#3 genotype match
0/1 | ./1 | half-call, TP allele | HC_TP | NC, NCV, TP | 1NC, 1NCV, 1TP, 1GE | TP
1/1 | ./1 | half-call, TP allele | HC_TP | NC, NCV, TP | 1NC, 1NCV, 1TP, 1GE | FN
0/1, 1/1 | ./0 | half-call, FN allele(s) | HC_FN | NC, NCV, TP | 1NC, 1NCV, 1FN | FN
1/2 | ./0 | half-call, FN allele(s) | HC_FN | NC, NCV, TP | 1NC, 2NCV, 2FN | FN
1/2 | ./1, ./2 | half-call, TP allele, FN allele | HC_TP_FN | NC, NCV, TP | 1NC, 1NCV, 1TP, 1GE, 1FN | FN

Page 20: GIAB-GRC workshop oct2015 giab introduction 151005

Stratifying False Positives

[Figure panels: GC Content; TR Unit 1; TR Unit 2; TR Unit 3; TR Unit 4; TR Unit <7; TR Unit >=7]

Credit: Abby Beeler, Ellie Wood

GA4GH - Stratification
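
Stratifying false positives by GC content could look roughly like the sketch below: compute the GC fraction of the sequence around each false positive and count calls per stratum. The bin edges (0.30 and 0.55), the stratum names, and the example sequences are assumptions made for illustration and are not taken from the GA4GH stratification work.

```python
from collections import Counter

def gc_fraction(seq):
    """Fraction of G/C among unambiguous bases, or None if there are none."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return None
    return sum(b in "GC" for b in bases) / len(bases)

def gc_stratum(fraction, low=0.30, high=0.55):
    """Assign a GC fraction to a stratum; the thresholds are hypothetical."""
    if fraction is None:
        return "unknown"
    if fraction < low:
        return "low_gc"
    if fraction > high:
        return "high_gc"
    return "mid_gc"

def count_by_stratum(flanking_seqs):
    """Count false positives per GC stratum from their flanking sequences."""
    return Counter(gc_stratum(gc_fraction(s)) for s in flanking_seqs)

# Invented flanking sequences for four false-positive calls.
counts = count_by_stratum(["ATATATAT", "GGGCCCGC", "ATGCATGC", "NNNN"])
print(counts)
```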

Page 21: GIAB-GRC workshop oct2015 giab introduction 151005

Public data from GIAB AJ PGP Trio

Long reads/“Linked” reads
• ~70/30/30x PacBio
– ~11kb N50
• BioNano
• 10X Genomics
• Moleculo
• Complete Genomics LFR
• Oxford Nanopore

Short reads
• 300x Illumina paired-end
• 15x Illumina 6kb mate-pair
• Complete Genomics
• SOLiD 5500W
• Ion Proton Exome

http://biorxiv.org/content/early/2015/09/15/026468

Page 22: GIAB-GRC workshop oct2015 giab introduction 151005

GIAB Analysis Group – New Data Sets

Leaders
• Francisco de la Vega – Annai Systems
• Chris Mason – Weill Cornell Medical Center
• Tina Graves – Washington University
• Valerie Schneider – NCBI
• and Justin and Marc

Status
• Analysis Group Responsibilities:
– https://docs.google.com/document/d/10eA0DwB4iYTSFM_LPO9_2LyyN2xEqH49OXHhtNH1uzw/edit?usp=sharing
• Analysis Milestones:
– https://docs.google.com/spreadsheets/d/1Pj4nSzH742g40wJz2fA6f8kFtZYAToZpSZYVPiC5st4/edit?usp=sharing
• Analysis Methods:
– https://docs.google.com/spreadsheets/d/1Je2g85H7oK6kMXbBOoqQ1FMNrvGnFuUJTJn7deyYiS8/edit?usp=sharing
• Analysis Plan:
– https://drive.google.com/file/d/0B7Ao1qqJJDHQdnVEaVdqbWdEdkE/view?usp=sharing
• Collecting data and analyses on GIAB FTP site
• Recruiting people to help with the work.

Goal: Establish and distribute a set of authoritative benchmark variant calls of all types and sizes, as well as homozygous reference regions, on GIAB PGP trios

Page 23: GIAB-GRC workshop oct2015 giab introduction 151005

Data Release Policy: Real-time, Open, Public Release

Individual Datasets
• Uploaded to GIAB FTP site as they are collected
• Includes raw reads, aligned reads, and variant/reference calls

Integrated High-confidence Calls
• First develop SNP, indel, and homozygous reference calls
• Then develop SV and non-SV calls
• Released calls are versioned
• Preliminary callsets will be made available to be critiqued

Page 24: GIAB-GRC workshop oct2015 giab introduction 151005

Analysis Progress: AJ Trio

• SNPs/indels
– Several candidate callsets
– NIST working on integration
– Plan to use 10X/Moleculo/PacBio for difficult-to-map regions
• Assembly
– 2 de novo assemblies of AJ trio (MHAP/PBcR and Falcon/Bionano)
– Will be used by at least 2 groups for SV calling
• Structural variants
– Candidate calls being generated by 15+ groups with >20 different algorithms and 6 datasets
– 3 integration methods: Bina-MetaSV, DNAnexus/Baylor-Parliament, NIST-svclassify
– Parliament: ~7k SVs with evidence in PacBio and Illumina
• Long-range Phasing
– 2 phased calls so far (CG LFR and 10X)
– Integration methods needed

Page 25: GIAB-GRC workshop oct2015 giab introduction 151005

Proposed approach to form high-confidence SV (and non-SV) calls

• Generate candidate calls from multiple methods
• Compare/evaluate calls using Parliament/MetaSV/svclassify/others?; manually inspect discordant calls
• Integrate new and revised calls
• Combine integrated calls (with heuristics and/or machine learning) to generate high-confidence calls

Timeline: August 30, 2015; Nov 1, 2015; Jan 1, 2016; Jan 26, 2016

Page 26: GIAB-GRC workshop oct2015 giab introduction 151005

Acknowledgments

• FDA – Elizabeth Mansfield, Computing staff
• Many members of Genome in a Bottle
– New members welcome!
– Sign up on website for email newsletters

Steering Committee
– Marc Salit
– Justin Zook
– David Mittelman
– Andrew Grupe
– Michael Eberle
– Steve Sherry
– Deanna Church
– Francisco De La Vega
– Christian Olsen
– Monica Basehore
– Lisa Kalman
– Christopher Mason
– Elizabeth Mansfield
– Liz Kerrigan
– Leming Shi
– Melvin Limson
– Alexander Wait Zaranek
– Nils Homer
– Fiona Hyland
– Steve Lincoln
– Don Baldwin
– Robyn Temple-Smolkin
– Chunlin Xiao
– Kara Norman
– Luke Hickey

Page 27: GIAB-GRC workshop oct2015 giab introduction 151005

For More Information

www.genomeinabottle.org - sign up for general GIAB and Analysis Team google group emails

www.bioplanet.com/gcat - exome comparison tool

www.ncbi.nlm.nih.gov/variation/tools/get-rm/ - Get-RM Browser

Data: http://biorxiv.org/content/early/2015/09/15/026468

Global Alliance Benchmarking work group
– ga4gh.org/#/benchmarking-team

Public Meetings: twice yearly workshop
– Winter: January 28-29, 2016 at Stanford University, California, USA
– Summer at NIST, Maryland, USA

Justin Zook: [email protected]

Marc Salit: [email protected]