10X Genomics Novel variants and variant validation September 2016

Page 1: Sept2016 smallvar 10_x

10X Genomics

Novel variants and variant validation

September 2016

Page 2: Sept2016 smallvar 10_x

Partitioning to Linked Reads

1.0 ng input

Poster: Methods

Page 3: Sept2016 smallvar 10_x

Linked read data

Confidential — Do not distribute

Page 4: Sept2016 smallvar 10_x

Unlinked, unphased short read SNP

Page 5: Sept2016 smallvar 10_x

Linked reads, phased SNP

Page 6: Sept2016 smallvar 10_x

Standard Short Read Alignment

Close Paralogs

Short Reads

Short Read Aligners Cannot Place Reads Correctly

Page 7: Sept2016 smallvar 10_x

Long Ranger – Lariat™ Aligner

1. Confident mapping provides anchors

2. Barcodes recruit short reads into paralogous loci

Close Paralogs

Lariat™ Aligner Correctly Places Short Reads Even in Paralogous Loci

Linked-Reads

Page 8: Sept2016 smallvar 10_x

Improved alignment leads to improved variant calling

• SMN1 and SMN2: part of an inverted tandem duplication on chr5
  – Differ by 8 nucleotides (3 exonic)

• SMN1: causative of spinal muscular atrophy

• SMN2: low function copy, not disease-causing

[Figure: SMN2 locus in NA12878 (WGS, 128 Gb), Standard Genome vs. Chromium Genome, showing Haplotype 1 Reads and Haplotype 2 Reads]

Page 9: Sept2016 smallvar 10_x

Inference

[Figure: molecules on chr1, chr3, chr5, chr11, and chr13, with one labeled source and one labeled sink]

• For every active alignment in the source whose read also has an alignment in the sink, switch the alignment in the sink to active and score the change probabilistically. If the source is left with few or no active alignments, the score goes up.
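The move on this slide can be illustrated with a toy sketch. It assumes one reading of the bullet: reads whose active alignment is in the source, and which also have a candidate alignment in the sink, are switched to the sink. The function name `propose_move`, the log-likelihood inputs, and the fixed `molecule_penalty` are hypothetical stand-ins for illustration; the slide does not give the actual Lariat scoring model.

```python
def propose_move(active, candidates, source, sink, molecule_penalty=5.0):
    """Propose moving reads from `source` to `sink`; return (proposal, score_delta).

    active:     dict read -> molecule holding that read's active alignment
    candidates: dict read -> dict molecule -> alignment log-likelihood
    """
    proposal = dict(active)  # leave the caller's state untouched
    delta = 0.0
    for read, molecule in active.items():
        if molecule == source and sink in candidates[read]:
            proposal[read] = sink
            # Change in alignment log-likelihood for this read.
            delta += candidates[read][sink] - candidates[read][source]
    # Assumed prior: each molecule that still holds active alignments costs a
    # fixed penalty, so emptying the source raises the score.
    if source in active.values() and source not in proposal.values():
        delta += molecule_penalty
    return proposal, delta


# Two reads sit on molecule "A" but also align slightly worse to molecule "B".
candidates = {
    "r1": {"A": -1.0, "B": -1.2},
    "r2": {"A": -0.8, "B": -0.9},
    "r3": {"B": -0.5},
}
active = {"r1": "A", "r2": "A", "r3": "B"}
proposal, delta = propose_move(active, candidates, "A", "B")
print(proposal, delta > 0)
```

In this example the small loss in per-read alignment likelihood is outweighed by removing the source molecule entirely, which is the behavior the bullet describes.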

Page 10: Sept2016 smallvar 10_x

Inference

[Figure: molecules on chr1, chr3, chr5, chr11, and chr13, with one labeled source and one labeled sink]

• This source is also now inactive.

Page 11: Sept2016 smallvar 10_x

Inference

[Figure: molecules on chr1, chr3, chr5, chr11, and chr13]

• Fast forward and we have the following active molecules left.

Page 12: Sept2016 smallvar 10_x

• Called from 10X data but not in GIAB 3.2.2 (whole genome, not restricted to confident regions)

• Validated with PacBio, requiring > 2 reads supporting the alt allele and > 15% allele fraction

• In regions with PacBio coverage >= 12, validation rates are 94% for 10X and 89% for TruSeq.
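The thresholds above can be written as a small predicate. This is a minimal sketch, assuming that "> 2 alt alleles supported" means more than two PacBio reads supporting the alt allele, and that sites with coverage below 12 are simply left out of the validation rate; the function name `validate_call` is hypothetical.

```python
def validate_call(alt_reads, total_reads, min_coverage=12):
    """Return True/False for a validated/unvalidated call, or None if not assessable.

    alt_reads:   PacBio reads supporting the alt allele
    total_reads: PacBio reads covering the site
    """
    if total_reads < min_coverage:
        return None  # assumed: low-coverage sites are excluded from the rate
    allele_fraction = alt_reads / total_reads
    return alt_reads > 2 and allele_fraction > 0.15


print(validate_call(4, 20))  # enough alt reads and allele fraction 0.20
print(validate_call(2, 12))  # only two alt reads
print(validate_call(5, 10))  # coverage below 12
```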

Novel variants

             10X    TruSeq   Diff   10X validated   TruSeq validated   Diff
SNPs         335k   292k     43k    289k            237k               52k
Deletions     76k    56k     20k     73k             54k               19k
Insertions    59k    43k     16k     58k             42k               16k
Total        470k   391k     79k    420k            333k               87k

Page 13: Sept2016 smallvar 10_x

• PacBio validation: align PacBio reads to the reference, then align them to the reference with the alt allele in place of the reference allele. A read counts as support only if one alignment scores higher than the other.
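The comparison described above can be sketched on toy sequences. This is an illustration only: the fitting-alignment scorer with unit match, mismatch, and gap scores is an assumption standing in for whatever long-read aligner and scoring were actually used, and the names `fit_score`, `apply_variant`, and `classify_read` are hypothetical.

```python
def fit_score(read, hap, match=1, mismatch=-1, gap=-1):
    """Best score for aligning all of `read` somewhere inside `hap`."""
    prev = [0] * (len(hap) + 1)  # starting anywhere in hap is free
    for i in range(1, len(read) + 1):
        cur = [prev[0] + gap]
        for j in range(1, len(hap) + 1):
            diag = prev[j - 1] + (match if read[i - 1] == hap[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return max(prev)  # ending anywhere in hap is free


def apply_variant(ref, pos, ref_allele, alt_allele):
    """Return the reference with alt_allele in place of ref_allele at pos."""
    assert ref[pos:pos + len(ref_allele)] == ref_allele
    return ref[:pos] + alt_allele + ref[pos + len(ref_allele):]


def classify_read(read, ref, pos, ref_allele, alt_allele):
    """Return 'alt', 'ref', or None when the two alignments tie."""
    ref_score = fit_score(read, ref)
    alt_score = fit_score(read, apply_variant(ref, pos, ref_allele, alt_allele))
    if alt_score > ref_score:
        return "alt"
    if ref_score > alt_score:
        return "ref"
    return None  # a tie is not counted as support for either allele


ref = "ACGTACGTTAGCCATG"
print(classify_read("CGTGAGCC", ref, 8, "T", "G"))  # read carries the alt base
print(classify_read("CGTTAGCC", ref, 8, "T", "G"))  # read carries the ref base
print(classify_read("ACGTAC", ref, 8, "T", "G"))    # read does not span the site
```

Reads that score the same against both haplotypes, such as reads that do not span the site, fall into the tie case and contribute to neither count.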

Novel variant validation method

• Can we validate this validation method?
  • Sensitivity of validation in confident region
  • Negative predictive value of “random” mutations
    • For SNPs, random is straightforward (could include TI/TV bias)
    • For indels
      • Pick length from geometric distribution
      • For deletions, the alt allele is trivial
      • For insertions, the alt allele used is the bases in the reference at that locus repeated.
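The "random" mutation scheme above can be sketched as follows. This is a minimal sketch under stated assumptions: the geometric parameter `p=0.5` is a placeholder (the slides do not give it), SNP alt bases are drawn uniformly with no TI/TV bias, alleles are written VCF-style with a leading anchor base, and all function names are hypothetical.

```python
import random


def geometric_length(rng, p=0.5):
    """Sample an indel length >= 1 from a geometric distribution."""
    length = 1
    while rng.random() >= p:
        length += 1
    return length


def random_snp(ref, pos, rng):
    """Pick a uniformly random alt base different from the reference base."""
    alt = rng.choice([base for base in "ACGT" if base != ref[pos]])
    return pos, ref[pos], alt


def random_deletion(ref, pos, rng, p=0.5):
    """Delete a geometric-length run after the anchor base; the alt is just the anchor."""
    length = min(geometric_length(rng, p), len(ref) - pos - 1)
    return pos, ref[pos:pos + 1 + length], ref[pos]


def random_insertion(ref, pos, rng, p=0.5):
    """Insert a copy of the reference bases that follow the anchor (a local repeat)."""
    length = min(geometric_length(rng, p), len(ref) - pos - 1)
    inserted = ref[pos + 1:pos + 1 + length]
    return pos, ref[pos], ref[pos] + inserted


rng = random.Random(0)  # seeded so repeated runs give the same mutations
ref = "ACGTTGCAAGCTTAGC"
print(random_snp(ref, 3, rng))
print(random_deletion(ref, 3, rng))
print(random_insertion(ref, 3, rng))
```

Each function returns a `(pos, ref_allele, alt_allele)` tuple, so the simulated variants can be passed through the same validation step as real calls to estimate how often a mutation that is not present gets falsely validated.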

Page 14: Sept2016 smallvar 10_x

• Entire 10X team, especially Patrick Marks and Deanna Church

• GIAB workshop organizers

1. Zheng, Grace XY, et al. "Haplotyping germline and cancer genomes with high-throughput linked-read sequencing." Nature biotechnology (2016).

2. Samonte, Rhea Vallente, and Evan E. Eichler. "Segmental duplications and the evolution of the primate genome." Nature Reviews Genetics 3.1 (2002): 65-72.

3. Bishara, A., et al. "Read clouds uncover variation in complex regions of the human genome." Genome Research 25 (2015): 1570-1580.

4. Li, Heng, and Richard Durbin. "Fast and accurate short read alignment with Burrows–Wheeler transform." Bioinformatics 25.14 (2009): 1754-1760.

Acknowledgements and references

Page 15: Sept2016 smallvar 10_x

Addendum

Page 16: Sept2016 smallvar 10_x

SNP validation validation

Used for validation

Page 17: Sept2016 smallvar 10_x

Deletion validation validation

Used for validation

Page 18: Sept2016 smallvar 10_x

Insertion validation validation

Used for validation