Genotype Refinement Workflow

Using additional data to improve genotype calls and likelihoods

Genotype Refinement Workflowqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr...--filterExpression “GQ




You are here in the GATK Best Practices workflow for germline variant discovery

[Figure: GATK Best Practices workflow in three stages, Data Pre-processing >> Variant Discovery >> Callset Refinement. Steps shown: Map to Reference (BWA mem), Mark Duplicates & Sort (Picard), Indel Realignment, Base Recalibration (giving Analysis-Ready Reads); Var. Calling HC in ERC mode (Genotype Likelihoods), Joint Genotyping (giving Raw Variants, SNPs & Indels); Variant Recalibration separately per variant type, Genotype Refinement, Variant Annotation, Variant Evaluation ("look good?": use in project / troubleshoot), giving Analysis-Ready Variants. Non-GATK steps are marked.]


Why care about genotypes?

•  Medical geneticists need genotypes for patients
   –  Do any patients have two copies of a loss-of-function (LOF) mutation?
   –  Are the parents of a diseased child likely to have more afflicted children?

•  Population geneticists need genotypes for association studies
   –  How does the number of copies of an allele affect the phenotype?


Variant call vs. genotype call

•  Variant call = there is variation at this site in one or more samples

•  Genotype call = these are the haplotypes present in this sample


Genotype call quality is important!

•  Some sites/samples have poor genotype calls
   –  Can be ambiguous due to low confidence
   –  Might be entirely wrong!

•  Can additional (independent) data improve genotype calls?
   –  Use high-quality data (like 1000 Genomes) as priors
   –  Use pedigree (if available)
   –  Calculate posterior genotype probabilities


Review of Bayes's Rule

Given that your coworker just walked in with an umbrella, what is the probability that it is raining?

•  Observation = umbrella
•  Θ = probability of rain

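$$P(\Theta \mid \text{umbrella}) = \frac{P(\text{umbrella} \mid \Theta)\,P(\Theta)}{P(\text{umbrella})}$$

(posterior probability = likelihood × prior, normalized by P(umbrella))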
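To make the rule concrete, here is a small Python sketch of the umbrella question. The three input probabilities are hypothetical and chosen only for illustration. They do not come from the slides.

```python
def posterior(prior, lik_if_true, lik_if_false):
    """Bayes's rule for a yes/no hypothesis.

    posterior = likelihood * prior / normalizer, where the normalizer is
    P(observation) summed over both hypotheses.
    """
    unnorm_true = lik_if_true * prior
    unnorm_false = lik_if_false * (1.0 - prior)
    evidence = unnorm_true + unnorm_false  # P(umbrella), the normalizer
    return unnorm_true / evidence


# Hypothetical inputs (not from the slides):
p_rain = 0.2               # prior P(rain)
p_umbrella_if_rain = 0.9   # likelihood P(umbrella | rain)
p_umbrella_if_dry = 0.1    # P(umbrella | no rain)

p = posterior(p_rain, p_umbrella_if_rain, p_umbrella_if_dry)
print(f"P(rain | umbrella) = {p:.3f}")  # 0.18 / (0.18 + 0.08), about 0.692
```

With these numbers, seeing the umbrella raises the probability of rain from the 20% prior to about 69%. A weaker prior, or an observation that is less diagnostic, would move it less.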


Genotype Refinement Workflow

[Figure: Recalibrated Variants go into CalculateGenotypePosteriors, which applies Population Priors and Family Priors to produce Variants with Posterior Qualities; VariantFiltration then yields High Quality Variants, and VariantAnnotator flags De Novo Variants.]


CalculateGenotypePosteriors

java -jar GenomeAnalysisTK.jar \
    -T CalculateGenotypePosteriors \
    -R reference.fasta \
    -V input.vcf \
    -ped family.ped \
    -supporting population.vcf \
    -o output.vcf


Case 1: HOM_VAR Call with Low-Frequency Priors

1) Baseline HOM_VAR call

2) Priors with low allele frequency applied

3) Posterior genotype called HET

4) In agreement with NIST and BAMs

Likelihoods × Priors = Posterior Probabilities (Phred-scaled, ordered [HOM_REF, HET, HOM_VAR]):
[895, 3, 0] × (AF = 0.002) = [868, 0, 27]

Genotype corrected; confidence improved from Q3 to Q27


Case 2: HET Call with High-Frequency Priors

1) Baseline HET call

2) Priors with high allele frequency applied

3) Posterior genotype called HOM_VAR

4) In agreement with NIST and BAMs

Likelihoods × Priors = Posterior Probabilities (Phred-scaled, ordered [HOM_REF, HET, HOM_VAR]):
[894, 0, 0] × (AF = 0.987) = [932, 16, 0]

Genotype corrected; confidence improved from Q0 to Q16
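Both cases can be reproduced with a short Python sketch. It assumes that the priors are plain Hardy-Weinberg genotype frequencies computed from the allele frequency (AF). It also assumes the values shown are Phred-scaled, so multiplying likelihoods by priors becomes adding Phred values. This is a simplified model for illustration, not the CalculateGenotypePosteriors implementation, and the function names are hypothetical. Under these assumptions it returns [868, 0, 27] (GQ 27) for Case 1 and [932, 16, 0] (GQ 16) for Case 2.

```python
import math


def hwe_prior_phred(af):
    """Phred-scaled Hardy-Weinberg genotype priors [HOM_REF, HET, HOM_VAR].

    Assumes 0 < af < 1, where af is the alt-allele frequency.
    """
    priors = [(1.0 - af) ** 2, 2.0 * af * (1.0 - af), af ** 2]
    return [-10.0 * math.log10(p) for p in priors]


def posterior_pl(pl, af):
    """Combine Phred-scaled likelihoods with population priors.

    Multiplying probabilities is adding Phred values; the result is then
    rescaled PL-style so the most likely genotype is 0, and rounded.
    """
    raw = [lik + prior for lik, prior in zip(pl, hwe_prior_phred(af))]
    best = min(raw)
    return [round(r - best) for r in raw]


def gq(pl):
    """Genotype quality: gap between the best (0) and second-best genotype."""
    return sorted(pl)[1]


for name, pl, af in [("Case 1", [895, 3, 0], 0.002),
                     ("Case 2", [894, 0, 0], 0.987)]:
    post = posterior_pl(pl, af)
    print(f"{name}: {pl} (GQ {gq(pl)}) -> {post} (GQ {gq(post)})")
```

A rare allele (AF = 0.002) makes HOM_VAR roughly Q54 less likely a priori than HOM_REF, which is enough to flip the Case 1 call to HET. A common allele (AF = 0.987) does the reverse in Case 2.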


Population priors improve genotype confidence

[Figure panels: HomRef, Het, HomVar]

Baseline HomRef calls are underconfident, but posterior calls are more accurate

Baseline HomVar calls are overconfident, but posterior calls are improved


Assessing confidence and correctness

[Figure: Posterior Genotype Quality vs. Baseline Genotype Quality, Homozygous Reference Calls]

Fitted line: intercept = 9.9612, slope = 0.9302

Average Q10 increase for correct calls with Q ≤ 30

Incorrect calls stay about the same


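Parental genotypes inform child genotypes

Mendelian-consistent trio genotypes (HR = HOM_REF, HET = heterozygous, HV = HOM_VAR), listed as Child: (Mother, Father):

Child HR:  (HR, HR), (HR, HET), (HET, HR), (HET, HET)
Child HET: (HET, HET), (HR, HET), (HET, HR), (HV, HET), (HET, HV), (HR, HV), (HV, HR)
Child HV:  (HV, HV), (HV, HET), (HET, HV), (HET, HET)

•  Child can only inherit alleles present in parents
•  Parent genotypes determine possible child genotypes (assuming no mutations)

•  HaplotypeCaller gives genotype likelihoods for each sample
•  Given trio data we can derive $P(G_c \mid G_m, G_f)$, where $G_c$, $G_m$, $G_f$ are the child, mother, and father genotypes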
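The table can be rederived by letting each parent pass on one of its two alleles with probability 1/2. The Python sketch below uses hypothetical function names. It computes $P(G_c \mid G_m, G_f)$ with no mutation and lists every configuration with nonzero probability. There are 15 in total: 4 for an HR child, 7 for HET, and 4 for HV, matching the table.

```python
from itertools import product

# Genotypes as alt-allele counts: 0 = HR (HOM_REF), 1 = HET, 2 = HV (HOM_VAR).
NAMES = {0: "HR", 1: "HET", 2: "HV"}


def transmission(genotype):
    """P(allele passed on) for one parent: each of its two alleles with prob 1/2."""
    alleles = [0] * (2 - genotype) + [1] * genotype  # 0 = ref, 1 = alt
    return {a: alleles.count(a) / 2 for a in sorted(set(alleles))}


def child_given_parents(mother, father):
    """P(G_c | G_m, G_f) assuming no new mutations."""
    dist = {}
    for (a, pa), (b, pb) in product(transmission(mother).items(),
                                    transmission(father).items()):
        dist[a + b] = dist.get(a + b, 0.0) + pa * pb
    return dist


# Every (child, mother, father) triple with nonzero probability is Mendelian-consistent.
consistent = [(c, m, f)
              for m, f in product(range(3), repeat=2)
              for c in child_given_parents(m, f)]

for child in range(3):
    pairs = [f"({NAMES[m]}, {NAMES[f]})" for c, m, f in consistent if c == child]
    print(f"Child {NAMES[child]}: {', '.join(pairs)}")
print(child_given_parents(1, 1))  # two HET parents
```

The last line shows the familiar 1:2:1 split for two HET parents. These transmission probabilities are what make some trio configurations impossible without a mutation.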


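Bayesian priors applied to trios

•  Recall Bayes's Rule (G = genotype, D = observed read data):

$$P(G \mid D) = \frac{P(D \mid G)\,P(G)}{P(D)}$$

•  Establish genotype configuration probabilities (μ = mutation prior, MV = Mendelian violation):

$$P(G_c, G_m, G_f) = \begin{cases} \mu, & 1\ \text{MV} \\ \mu^2, & 2\ \text{MVs} \\ 1 - 10\mu - 2\mu^2, & 0\ \text{MVs} \end{cases}$$

•  Apply family priors (D_c, D_m, D_f = child, mother, father read data):

$$P(G_c = g \mid D) = \frac{\sum_{G_m, G_f} P(D_c \mid G_c = g)\,P(D_m \mid G_m)\,P(D_f \mid G_f)\,P(G_c = g, G_m, G_f)}{P(D)}$$

(likelihoods × applied prior, normalized by P(D) to give the posterior)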
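The trio posterior can be sketched in Python by counting the minimum number of mutations each of the 27 configurations needs. Each configuration is weighted with the prior above, combined with the three samples' likelihoods, and then summed over the parents. Counting confirms that 15 configurations need no mutation, 10 need one, and 2 need two. Those counts match the coefficients in $1 - 10\mu - 2\mu^2$. The PL values and μ = 1e-6 are hypothetical, and the function names are illustrative, not GATK's.

```python
from itertools import product

# Genotypes as alt-allele counts: 0 = HOM_REF, 1 = HET, 2 = HOM_VAR.


def mv_count(child, mother, father):
    """Minimum number of new mutations needed to explain the child's genotype."""
    def cost(parent, allele):
        # A parent can pass on ref (0) if it carries a ref allele, alt (1) if it carries an alt.
        carries = parent < 2 if allele == 0 else parent > 0
        return 0 if carries else 1
    # The child gets allele a from the mother and b from the father, with a + b == child.
    return min(cost(mother, a) + cost(father, b)
               for a, b in product((0, 1), repeat=2) if a + b == child)


def config_prior(child, mother, father, mu):
    """Configuration prior: mu (1 MV), mu**2 (2 MVs), 1 - 10*mu - 2*mu**2 (0 MVs)."""
    weights = {0: 1.0 - 10.0 * mu - 2.0 * mu ** 2, 1: mu, 2: mu ** 2}
    return weights[mv_count(child, mother, father)]


def likelihoods(pl):
    """Phred-scaled likelihoods (PL) back to relative likelihoods."""
    return [10.0 ** (-q / 10.0) for q in pl]


def child_posterior(pl_child, pl_mother, pl_father, mu):
    """P(G_c = g | D): likelihoods x configuration prior, summed over parents, normalized."""
    lc, lm, lf = likelihoods(pl_child), likelihoods(pl_mother), likelihoods(pl_father)
    unnorm = [sum(lc[c] * lm[m] * lf[f] * config_prior(c, m, f, mu)
                  for m, f in product(range(3), repeat=2))
              for c in range(3)]
    total = sum(unnorm)  # the normalizer P(D)
    return [u / total for u in unnorm]


# Hypothetical trio: confident HOM_REF parents, child weakly called HET (GQ 20).
MU = 1e-6  # assumed mutation prior, for illustration only
post = child_posterior([20, 0, 400], [0, 60, 600], [0, 60, 600], MU)
print([round(p, 4) for p in post])
```

The child's weak HET call is pulled to HOM_REF. Explaining a HET child of two HOM_REF parents needs either a mutation (weight μ) or a parent genotype that the parent's own data makes unlikely.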


Family priors improve genotype confidence

[Figure panels: HomRef, Het, HomVar]

Baseline HomRef calls are underconfident, but posterior calls are more accurate

Posterior HomRef and HomVar calls are higher confidence


Assessing confidence and correctness

[Figure: Posterior Genotype Quality vs. Baseline Genotype Quality, Homozygous Reference Calls]

Fitted line: intercept = 12.831, slope = 1.238

Average Q13 increase for correct calls with Q ≤ 30

Incorrect calls stay about the same


Filter low-confidence GQs

•  Use VariantFiltration to filter ambiguous, low-confidence calls

•  Recommended threshold is GQ = 20
   –  GQ 20 is Phred-scaled 99% confidence

•  Restrict further analysis to high-quality data
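The link between GQ 20 and 99% confidence is the Phred scale, Q = −10 log10(P(wrong)). The short Python sketch below converts in both directions and applies the recommended threshold. The function names are hypothetical.

```python
import math


def phred_to_error(q):
    """Error probability encoded by a Phred-scaled quality: p = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def error_to_phred(p):
    """Phred-scale an error probability: Q = -10 * log10(p)."""
    return -10.0 * math.log10(p)


def passes_gq(gq, threshold=20):
    """Keep a genotype only if GQ >= threshold; the talk recommends 20."""
    return gq >= threshold


for q in (10, 20, 30):
    p_wrong = phred_to_error(q)
    print(f"GQ {q}: P(wrong) = {p_wrong:g}, confidence = {1 - p_wrong:.1%}")
```

Each additional 10 GQ points is a tenfold drop in the error probability. A posterior GQ of 16 would therefore still fall below the GQ 20 cutoff.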

java -jar GenomeAnalysisTK.jar \
    -T VariantFiltration \
    -R reference.fasta \
    -V input.vcf \
    --filterExpression "GQ < 20" \
    --filterName "lowGQ" \
    -o output.vcf


VariantAnnotator


java -jar GenomeAnalysisTK.jar \
    -T VariantAnnotator \
    -R reference.fasta \
    -V input.vcf \
    -ped family.ped \
    -A PossibleDeNovo \
    -o output.vcf


What are de novo mutations?

•  Culprits in many rare Mendelian disorders
•  ~30 de novo mutations occur per human genome

[Figure: parents are homozygous reference; child is het (one copy of the alt allele)]


Properties of sequenced de novos

•  Novelty
   –  The child is the only member of the trio carrying the alt allele; it was not inherited

•  Rarity
   –  Allele frequency across all sequenced samples is low

•  Confidence
   –  Set a GQ threshold for the parents and the child
   –  (GQ improvement tools help A LOT here!)
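The three properties can be combined into a single candidate check, sketched in Python below. This is a hypothetical illustration, not the logic of GATK's PossibleDeNovo annotation. min_gq = 20 follows the threshold recommended earlier in this talk, and max_af = 0.001 is an arbitrary cutoff chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class GenotypeCall:
    alt_count: int  # 0 = HOM_REF, 1 = HET, 2 = HOM_VAR
    gq: int         # genotype quality (Phred-scaled)


def is_candidate_de_novo(child, mother, father, cohort_af,
                         min_gq=20, max_af=0.001):
    """Hypothetical check combining novelty, rarity, and confidence.

    max_af is an illustrative cutoff, not a GATK default.
    """
    novelty = child.alt_count == 1 and mother.alt_count == 0 and father.alt_count == 0
    rarity = cohort_af <= max_af
    confidence = all(call.gq >= min_gq for call in (child, mother, father))
    return novelty and rarity and confidence


# A rare, confident candidate, then the same site with a low-GQ mother.
print(is_candidate_de_novo(GenotypeCall(1, 45), GenotypeCall(0, 60),
                           GenotypeCall(0, 55), cohort_af=0.0002))
print(is_candidate_de_novo(GenotypeCall(1, 45), GenotypeCall(0, 12),
                           GenotypeCall(0, 55), cohort_af=0.0002))
```

The confidence test is where genotype refinement pays off. Raising parental GQs with family priors lets more true de novos pass the threshold, and poorly supported ones are dropped.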


Example of a clinical case

•  Real clinical data
•  Suspected de novo mutation in offspring

417 de novos from raw GT calls

17 de novos based on posterior GTs

8 high-confidence de novos after GQ filtering


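Priors can be tuned for sensitivity

Sensitivity and specificity can be tuned as in VQSR

The mutation prior μ is a parameter in the genotype configuration probability (MV = Mendelian violation):

$$P(G_c, G_m, G_f) = \begin{cases} \mu, & 1\ \text{MV} \\ \mu^2, & 2\ \text{MVs} \\ 1 - 10\mu - 2\mu^2, & 0\ \text{MVs} \end{cases}$$

Increasing μ increases sensitivity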
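Under this configuration prior, the prior odds of a one-MV (de novo) configuration against a Mendelian-consistent one are μ / (1 − 10μ − 2μ²). The Python sketch below uses a hypothetical function name and illustrative μ values. It expresses those odds as a Phred-scaled penalty. Each factor of 100 in μ moves the penalty by about 20, so raising μ lets more de novo configurations survive, trading specificity for sensitivity.

```python
import math


def de_novo_penalty(mu):
    """Phred-scaled prior cost of a one-MV configuration relative to a zero-MV one.

    Uses the ratio of the two configuration weights: mu / (1 - 10*mu - 2*mu**2).
    """
    return -10.0 * math.log10(mu / (1.0 - 10.0 * mu - 2.0 * mu ** 2))


for mu in (1e-8, 1e-6, 1e-4):
    print(f"mu = {mu:g}: de novo configuration penalized by ~Q{de_novo_penalty(mu):.0f}")
```

The penalty is added to the de novo configuration in Phred space, just like the population priors in the earlier cases. The child's and parents' likelihoods must overcome it for the de novo configuration to win.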


Genotype refinement yields more high-quality genotypes

•  Initial genotype calls may be ambiguous or wrong

•  Applying population + family priors improves confidence

•  More high-confidence genotypes -> more data for downstream analysis!

[Figure panel: HomVar]


Further reading

http://www.broadinstitute.org/gatk/guide/

https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_CalculateGenotypePosteriors

http://www.broadinstitute.org/gatk/guide/article?id=4723

http://www.broadinstitute.org/gatk/guide/article?id=4726
