GIAB Workshop Len Trigg & Sean Irvine

Sept2016 smallvar rtg


Phasing NA12878 by segregation in children


● Joint calling of 17 member CEPH pedigree.
● Benefits:
  ○ High Mendelian consistency across all members.
  ○ (Near) full phasing of NA12878 (and NA12877) according to segregation in the 11 children.
● Latest run incorporates 300x Illumina reads for NA12878 RM8398 sample (other members ~30x).

● Calls that segregate well are more likely to be correct.
● Could look at phasing inconsistent calls in more detail.
  ○ Structural variants
  ○ Somatic variants

Concordance of NA12878 with GIAB 3.2.2

NA24385, RTG on 10X Genomics Chromium

Unifying call sets

Different callers, different representations.
Different samples, different representations.

Given some number of call sets, represent the calls in as consistent a manner as possible.

● Incrementally accumulate alleles from call sets.
● Recode call sets using accumulated alleles.
● Harmonization rather than Canonicalization (chosen representation comes from within rather than externally specified).
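The accumulate-then-recode idea can be sketched on toy data. This is a deliberately simplified illustration, not how vcfeval is implemented: each call set is treated as a single haplotype, the search is brute force, and the reference string, the calls, and the names `apply_calls`, `recode`, and `harmonize` are all hypothetical.

```python
from itertools import combinations


def apply_calls(ref_seq, calls):
    """Apply 1-based (pos, ref, alt) calls to ref_seq; None if they overlap or mismatch."""
    out, cursor = [], 1
    for pos, ref, alt in sorted(calls):
        if pos < cursor or ref_seq[pos - 1:pos - 1 + len(ref)] != ref:
            return None
        out.append(ref_seq[cursor - 1:pos - 1])  # untouched reference bases
        out.append(alt)
        cursor = pos + len(ref)
    out.append(ref_seq[cursor - 1:])
    return "".join(out)


def recode(ref_seq, calls, alleles):
    """Smallest subset of `alleles` reproducing the haplotype of `calls`, or None."""
    target = apply_calls(ref_seq, calls)
    for size in range(len(alleles) + 1):
        for subset in combinations(alleles, size):
            if apply_calls(ref_seq, subset) == target:
                return list(subset)
    return None


def harmonize(ref_seq, call_sets):
    """Accumulate alleles call set by call set, then recode every set with them."""
    alleles = []
    for calls in call_sets:
        # Only add new alleles when the accumulated ones cannot express this set.
        if recode(ref_seq, calls, alleles) is None:
            alleles.extend(c for c in calls if c not in alleles)
    return [recode(ref_seq, calls, alleles) for calls in call_sets]


# Hypothetical toy data: the same haplotype written as one MNP or as two SNPs.
REF = "ACGTACGT"
set_a = [(2, "CG", "TA")]
set_b = [(2, "C", "T"), (3, "G", "A")]

ab = harmonize(REF, [set_a, set_b])  # representation of set_a is chosen
ba = harmonize(REF, [set_b, set_a])  # representation of set_b is chosen
```

In both orders the two call sets end up with identical records, but which representation is kept depends on which set was accumulated first; the chosen representation comes from within the inputs.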

Example: chr20, NA12878

Example: Harmonization of AJ trio

Example from v3.3 AJ trio

3 non-Mendelian calls become consistent on recoding.
12 original alleles recoded into 6 alleles.

Original
Position     REF     ALT  child  mother  father
1:73974514   GAACCC  G    .      0|1     .
1:73974515   A       T    0/1    .       .
1:73974516   ACCC    A    0/1    .       .
1:73974520   TC      T    .      0|1     .
1:73974521   CATA    C    0/1    .       .
1:73974524   A       C    .      0|1     .

Recoded
Position     REF     ALT  child  mother  father
1:73974515   A       T    0/1    0/1     .
1:73974516   ACCC    A    0/1    0/1     .
1:73974521   CATA    C    0/1    0/1     .
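One way to see why the recoding is legitimate is to replay each sample's original calls onto the local reference and compare the resulting haplotypes. The sketch below does that for the child and mother rows of the Original table. It makes two assumptions: the local reference is rebuilt from the REF alleles themselves rather than read from a genome, and the three unphased child calls are taken to lie on the same haplotype. The function names are illustrative only.

```python
def local_reference(calls):
    """Rebuild the reference bases covered by the calls from their REF alleles."""
    bases = {}
    for pos, ref, _alt in calls:
        for offset, base in enumerate(ref):
            # Overlapping REF alleles must agree on shared positions.
            assert bases.setdefault(pos + offset, base) == base
    start, end = min(bases), max(bases)
    assert all(p in bases for p in range(start, end + 1)), "gap in REF coverage"
    return start, "".join(bases[p] for p in range(start, end + 1))


def apply_calls(start, ref_seq, calls):
    """Replay non-overlapping (pos, ref, alt) calls onto ref_seq; return the haplotype."""
    out, cursor = [], start
    for pos, ref, alt in sorted(calls):
        assert pos >= cursor, "calls on one haplotype must not overlap"
        assert ref_seq[pos - start:pos - start + len(ref)] == ref
        out.append(ref_seq[cursor - start:pos - start])  # untouched reference bases
        out.append(alt)
        cursor = pos + len(ref)
    out.append(ref_seq[cursor - start:])
    return "".join(out)


# Calls copied from the Original table (chromosome 1 positions).
mother = [(73974514, "GAACCC", "G"), (73974520, "TC", "T"), (73974524, "A", "C")]
child = [(73974515, "A", "T"), (73974516, "ACCC", "A"), (73974521, "CATA", "C")]

start, ref_seq = local_reference(mother + child)
mother_hap = apply_calls(start, ref_seq, mother)
child_hap = apply_calls(start, ref_seq, child)

print(ref_seq, mother_hap, child_hap)
```

Both call sets turn the same 11 reference bases into the same 5-base haplotype, so writing the mother's calls with the child's alleles changes the representation and not the sequence.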

Notes and Limitations

● Recoding loses existing annotations. Could recover in simple cases, but not clear what to do when calls are moved, split, or combined as a result of the recoding.

● If a new call set needs to be added, can incrementally accumulate new sample, but existing ones will need to be recoded.

● Final result is dependent on the order in which call sets are accumulated.

● Minimizes number of alleles (can in rare cases introduce Mendelian violations).

Phase Transfer

Another mode of operation for vcfeval. The phasing in one call set can be lifted over to another call set without losing annotations or changing the representation of calls.

v3.3 HG002/NA24385: 9.7%
RTG AJ trio 300x: 88.1%
phase-transferred: 90.2%

chr20 NA12878 GATK: 0%
RTG CEPH SP 37.7.0: 99.9%
89.0%
Illumina PG 8.0.1: 99.9%
phase-transferred: 90.8%

Phase Transfer

During normal operation vcfeval ignores phasing information and tries each allele on each haplotype.

During phase transfer vcfeval will obey the phasing of one (or both) of the samples. Effectively restricts the matches that can be made. Ideally want at least one sample to be fully phased.

A special output mode is used to report the phasing found during the matching. Apart from the phasing, the calls are not changed and all the original annotations are retained.
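The effect of obeying one sample's phasing can be illustrated with a brute-force toy. This is not vcfeval's algorithm or its output format: the reference string, the calls, and the names `haplotype` and `transfer_phase` are hypothetical, and only heterozygous calls are handled. The sketch tries each unphased call on each haplotype and keeps the arrangement that reproduces the phased call set's ordered haplotype pair; the records themselves are returned unchanged apart from the genotype.

```python
from itertools import product


def haplotype(ref_seq, calls):
    """Apply non-overlapping 1-based (pos, ref, alt) calls to ref_seq."""
    out, cursor = [], 1
    for pos, ref, alt in sorted(calls):
        out.append(ref_seq[cursor - 1:pos - 1])  # untouched reference bases
        out.append(alt)
        cursor = pos + len(ref)
    out.append(ref_seq[cursor - 1:])
    return "".join(out)


def transfer_phase(ref_seq, phased, unphased_hets):
    """Phase `unphased_hets` so its haplotypes equal those of `phased`.

    phased: (pos, ref, alt, gt) records with gt "0|1" or "1|0".
    unphased_hets: (pos, ref, alt) records, all heterozygous.
    Returns phased records, or None if no arrangement matches.
    """
    target = tuple(
        haplotype(ref_seq, [(p, r, a) for p, r, a, gt in phased
                            if gt.split("|")[side] == "1"])
        for side in (0, 1)
    )
    # Try each call on each haplotype, as in unphased matching,
    # but accept only the arrangement that obeys the known phasing.
    for sides in product((0, 1), repeat=len(unphased_hets)):
        pair = tuple(
            haplotype(ref_seq, [c for c, s in zip(unphased_hets, sides) if s == side])
            for side in (0, 1)
        )
        if pair == target:
            return [(p, r, a, "1|0" if s == 0 else "0|1")
                    for (p, r, a), s in zip(unphased_hets, sides)]
    return None


# Hypothetical toy data: the phased set uses an MNP, the unphased set two SNPs.
REF = "ACGTACGTAC"
phased_set = [(2, "CG", "TA", "0|1"), (6, "C", "G", "1|0")]
unphased_set = [(2, "C", "T"), (3, "G", "A"), (6, "C", "G")]

result = transfer_phase(REF, phased_set, unphased_set)
```

Here the two SNPs at positions 2 and 3 keep their own representation and both receive `0|1`, while the call at position 6 receives `1|0`.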