Interpreting NGS Data

Melissa Rose, FAS

PR7000-0663

What You’ll Learn Today

Variant Call Format

• Components

• Format types

Recommendations for NGS Interpretation

Publicly Available Databases

Somatic versus Germline Interpretations: Tools, Differences, Similarities

Automation

Reporting


Variant Call Format


Meta-information Lines of a VCF

https://docs.gdc.cancer.gov/Data/File_Formats/VCF_Format/#vcf-file-structure

gdcWorkflow: information on GDC pipelines that were used to generate the VCF file. GDC annotated VCF files contain two gdcWorkflow lines, one representing the variant calling process and the other representing the variant annotation process.

INDIVIDUAL: information about the study participant, including:

• NAME: Submitter ID (barcode) associated with the participant, and

• ID: GDC case UUID

SAMPLE: sample information, including:

• ID: NORMAL or TUMOR

• NAME: Submitter ID (barcode) of the aliquot

• ALIQUOT_ID: GDC aliquot UUID

• BAM_ID: BAM file UUID

INFO: format of additional information fields

• NOTE: GDC Annotated VCFs may contain multiple INFO lines. The last INFO line contains information about annotation fields generated by the Somatic Annotation Workflow (see the GDC INFO Fields section of the GDC VCF documentation linked above).

FILTER: description of filters that have been applied to the data

FORMAT: description of genotype fields

reference: the reference genome used to generate the VCF file

contig: contigs included in the VCF file

• NOTE: Annotated VCFs include contig information for autosomes, sex chromosomes, and mitochondrial DNA. Unplaced, unlocalized, human decoy, and viral genome sequences are not included.

VEP: the VEP command used by the Somatic Annotation Workflow to generate the annotated VCF file.
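All of the meta-information lines above begin with `##` and take either a simple `##key=value` form or a structured `##key=<k=v,...>` form. The following is a minimal Python sketch, assuming only those two forms, that groups the lines by key. It does not cover every edge case in the specification, and the example header lines are illustrative rather than copied from a GDC file.

```python
import re
from collections import defaultdict

# Matches key=value pairs inside <...>; a value may be double-quoted and contain commas.
_FIELD = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,]*)')


def parse_meta_lines(lines):
    """Group VCF meta-information lines (those starting with '##') by key."""
    meta = defaultdict(list)
    for line in lines:
        if not line.startswith("##"):
            continue  # skip the #CHROM column header and data lines
        key, _, value = line[2:].rstrip("\n").partition("=")
        if value.startswith("<") and value.endswith(">"):
            # Structured line, e.g. ##INFO=<ID=DP,Number=1,...>
            fields = {k: v.strip('"') for k, v in _FIELD.findall(value[1:-1])}
            meta[key].append(fields)
        else:
            # Unstructured line, e.g. ##fileformat=VCFv4.1
            meta[key].append(value)
    return dict(meta)


meta = parse_meta_lines([
    "##fileformat=VCFv4.1",
    "##reference=GRCh38.fa",  # illustrative file name
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth, all reads">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
])
print(meta["INFO"][0]["Description"])  # Total depth, all reads
```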


Column Header Line in a VCF

CHROM: chromosome

POS: position

ID: identifier

REF: reference base(s)

ALT: alternate base(s)

QUAL: quality

FILTER: filter status

INFO: additional information

FORMAT: format of sample genotype data

NORMAL: normal sample genotype data

TUMOR: tumor sample genotype data

See Variant Call Format (VCF) Version 4.1 Specification for details.
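The column header line is the single line that starts with one `#`. Its first eight columns are fixed, and FORMAT plus the per-sample columns (NORMAL and TUMOR in the GDC files described above) follow. Below is a minimal sketch that maps each column name to its position, so later code can look up fields by name instead of by a hard-coded index.

```python
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_header_line(line):
    """Return (column name -> index, list of sample names) for a '#CHROM' line."""
    if not line.startswith("#CHROM"):
        raise ValueError("not a VCF column header line")
    columns = line.lstrip("#").rstrip("\n").split("\t")
    if columns[:8] != FIXED_COLUMNS:
        raise ValueError(f"unexpected fixed columns: {columns[:8]}")
    # Index 8 is FORMAT; everything after it is a sample genotype column.
    samples = columns[9:]
    return {name: i for i, name in enumerate(columns)}, samples


index, samples = parse_header_line(
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR\n"
)
print(samples)  # ['NORMAL', 'TUMOR']
```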


Data in a VCF

The data lines record the called variants. They are tab-delimited, and each line holds one variant produced by the upstream variant-calling step.

Example of a variant call line in the VCF:

chr22 17264565 . G T 255.0 Pass DP=132;DP4=41,39,28,24;STDP4=41,39,28,24;AF1=0.39393938;AN=2;MQ=41 GT:PL:GQ 1/0:0,255,255:100
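A data line such as the example above can be split on tabs, with INFO broken into `key=value` pairs on `;` and each sample column zipped against the FORMAT keys. The following is a minimal sketch, assuming the line really is tab-delimited (the slide shows it with spaces) and that DP4 follows the samtools/bcftools convention of ref-forward, ref-reverse, alt-forward and alt-reverse read counts. For the example, the alternate reads are 28 + 24 = 52 of 132, roughly 0.394, which is consistent with the AF1 value on the same line.

```python
def parse_data_line(line):
    """Split one VCF data line into fixed fields, an INFO dict and per-sample dicts."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, filt, info = cols[:8]
    info_dict = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            # Flag-type INFO entries have no '=value'; store them as True.
            info_dict[key] = value if sep else True
    samples = []
    if len(cols) > 8:
        format_keys = cols[8].split(":")
        for sample_col in cols[9:]:
            samples.append(dict(zip(format_keys, sample_col.split(":"))))
    return {
        "CHROM": chrom, "POS": int(pos), "ID": vid, "REF": ref,
        "ALT": alt.split(","), "QUAL": None if qual == "." else float(qual),
        "FILTER": filt, "INFO": info_dict, "SAMPLES": samples,
    }


def alt_fraction_from_dp4(info):
    """Fraction of reads supporting ALT, from DP4 = ref-fwd,ref-rev,alt-fwd,alt-rev."""
    ref_f, ref_r, alt_f, alt_r = (int(x) for x in info["DP4"].split(","))
    total = ref_f + ref_r + alt_f + alt_r
    return (alt_f + alt_r) / total if total else 0.0


line = "\t".join([
    "chr22", "17264565", ".", "G", "T", "255.0", "Pass",
    "DP=132;DP4=41,39,28,24;STDP4=41,39,28,24;AF1=0.39393938;AN=2;MQ=41",
    "GT:PL:GQ", "1/0:0,255,255:100",
])
rec = parse_data_line(line)
print(rec["SAMPLES"][0]["GT"])                        # 1/0
print(round(alt_fraction_from_dp4(rec["INFO"]), 4))   # 0.3939
```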


Standards for Interpretation


Commonly Used Publicly Available Databases

1000 Genomes - >1000 participants to catalog human genetic variation

ClinVar – medically important variants and phenotypes

COSMIC – Catalogue of Somatic Mutations in Cancer

dbNSFP – database, based on Ensembl, for functional prediction and annotation of all potential non-synonymous single-nucleotide variants (nsSNVs) in the human genome

dbSNP - database of genetic variation within and across different species, developed and hosted by NCBI in collaboration with NHGRI

ESP6500 - exomes from the NHLBI Exome Sequencing Project, which studied genes and mechanisms contributing to heart, lung and blood disorders

ExAC - >60,000 unrelated individuals sequenced as part of various disease-specific and population genetic studies

HGMD - known (published) gene lesions responsible for human inherited disease

OMIM – comprehensive collection of human genes and genetic phenotypes

RefSeq - annotated and curated collection of publicly available nucleotide sequences (DNA, RNA) and their protein products.

HPO - standardized vocabulary of phenotypic abnormalities encountered in human disease

NCBI GenBank - NIH genetic sequence database


How Databases Can Be Utilized

Population Frequencies

Subpopulation / Cohort Studies

Genotype/Phenotype Correlation

Known Diseases

Known Mutations

Functional Effect Predictions

Inheritance Mode of Disease

Other
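Population frequency is often the first use of these databases: a variant that is common in a reference population is unlikely to be responsible for a rare disease. The following is a minimal sketch, assuming population allele frequencies have already been looked up (for example from 1000 Genomes or ExAC) into a dictionary keyed by (chrom, pos, ref, alt). The 1% cut-off is a hypothetical default rather than a recommendation from this presentation, and the frequencies below are made up.

```python
from typing import Dict, List, Optional, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def filter_by_population_frequency(
    variants: List[VariantKey],
    pop_af: Dict[VariantKey, float],
    max_af: float = 0.01,  # hypothetical cut-off; each lab sets its own
) -> List[VariantKey]:
    """Keep variants that are absent from the population data or rarer than max_af."""
    kept = []
    for v in variants:
        af: Optional[float] = pop_af.get(v)
        # A variant never observed in the population resource is treated as rare.
        if af is None or af < max_af:
            kept.append(v)
    return kept


# Made-up frequencies for illustration only.
pop_af = {("chr1", 1000, "A", "G"): 0.25, ("chr2", 2000, "C", "T"): 0.002}
variants = [("chr1", 1000, "A", "G"), ("chr2", 2000, "C", "T"), ("chr3", 3000, "G", "A")]
rare = filter_by_population_frequency(variants, pop_af)
print(rare)  # [('chr2', 2000, 'C', 'T'), ('chr3', 3000, 'G', 'A')]
```

Treating "not found" as rare keeps novel variants in the review list; a lab could instead flag them separately.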


Some Analysis Methods / Tools / Hypotheses

Population Frequencies

Assessing Inheritance Modes using Mendelian Law

Exonic / Protein Investigation

Quality of called variant

Zygosity

Targeted Panels

Splice Sites

Compound Variation

Public Databases and the Lab’s Own Curation
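Two of the methods above, zygosity and assessing inheritance modes with Mendelian laws, can be read directly from the GT field. The following is a minimal sketch for diploid, autosomal genotypes in a child-mother-father trio. It ignores sex chromosomes, phasing and genotype quality, and the function names are hypothetical rather than taken from any tool.

```python
def alleles(gt):
    """Split a GT value such as '0/1' or '1|0' into allele indices; None if any is missing."""
    parts = gt.replace("|", "/").split("/")
    if "." in parts:
        return None
    return tuple(int(p) for p in parts)


def zygosity(gt):
    """Classify a GT value as heterozygous, homozygous_ref, homozygous_alt or missing."""
    a = alleles(gt)
    if a is None:
        return "missing"
    if len(set(a)) > 1:
        return "heterozygous"
    return "homozygous_ref" if a[0] == 0 else "homozygous_alt"


def mendelian_consistent(child, mother, father):
    """True if the child's two alleles can be explained by one from each parent."""
    c, m, f = alleles(child), alleles(mother), alleles(father)
    if c is None or m is None or f is None:
        return True  # missing calls: inheritance cannot be assessed, so do not flag
    if not (len(c) == len(m) == len(f) == 2):
        raise ValueError("this sketch only handles diploid genotypes")
    a, b = c
    return (a in m and b in f) or (b in m and a in f)


def candidate_de_novo(child, mother, father):
    """Child carries an ALT allele while both parents are homozygous reference."""
    return (zygosity(child) in ("heterozygous", "homozygous_alt")
            and zygosity(mother) == "homozygous_ref"
            and zygosity(father) == "homozygous_ref")


print(zygosity("1/0"))  # heterozygous (the genotype in the example VCF line)
```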


How is the Information Processed/Saved?

VCF file > Bioinformatics / Excel files

Tens to thousands of lines of information per patient!

Questions/methods/tools applied to each line:

• chr22 17264565 . G T 255.0 Pass DP=132;DP4=41,39,28,24;STDP4=41,39,28,24;AF1=0.39393938;AN=2;MQ=41 GT:PL:GQ 1/0:0,255,255:100

- What is the population frequency for this?

- Is there a protein change?

- Was this inherited?

- Does either parent display phenotype?

- What is the gene relationship?
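Before the per-line questions above can be answered in a spreadsheet, each VCF data line has to become a row. The following is a minimal sketch using only the Python standard library. It writes the fixed columns plus a few INFO keys (DP and AF1, chosen here for illustration) to CSV that Excel can open. It skips meta-information and header lines and does not read compressed `.vcf.gz` input.

```python
import csv
import io


def vcf_to_csv(vcf_lines, out, info_keys=("DP", "AF1")):
    """Write one CSV row per VCF data line; selected INFO keys become extra columns."""
    writer = csv.writer(out)
    writer.writerow(["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", *info_keys])
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information lines, the column header and blank lines
        cols = line.rstrip("\n").split("\t")
        info = {}
        for entry in cols[7].split(";"):
            key, _, value = entry.partition("=")
            info[key] = value
        writer.writerow(cols[:7] + [info.get(k, "") for k in info_keys])


# Example with an in-memory buffer.
buf = io.StringIO()
vcf_to_csv([
    "##fileformat=VCFv4.1",
    "chr22\t17264565\t.\tG\tT\t255.0\tPass\tDP=132;AF1=0.39393938;MQ=41\tGT\t1/0",
], buf)
print(buf.getvalue())
```

For files on disk, open the output with `newline=""` as the `csv` module documentation recommends, and pass the open VCF file object as `vcf_lines`.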


Issues with Manual Analysis

Saving Information – Big Data!

Relying on Interpretation Method (reproducible?)

Audit Trails

Bioinformatics – not always available

No specific standard – only recommendations

Data increasing daily! Storage!


An Example Analysis Using Alissa Interpret


Patient is suspected to have lung cancer


How can we move from a suspected cancer to a report including diagnoses, clinical trials, and therapeutic options?


Begin your investigation with a fully audit traceable analysis


Each analysis is tagged with its annotation sources and versions for your reference and audit trails: what did each resource contain at the time of the analysis?
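Tagging each analysis with its annotation sources and versions can be illustrated with a small provenance record. This is a hypothetical sketch and does not reflect how Alissa Interpret stores its audit trail. It fingerprints the input VCF with SHA-256 and records the annotation source versions and a UTC timestamp as JSON. The version labels in the example are placeholders.

```python
import hashlib
import json
from datetime import datetime, timezone


def provenance_record(vcf_bytes: bytes, annotation_sources: dict) -> str:
    """Return a JSON audit record tying an input file to the annotation versions used."""
    record = {
        # Any later change to the input file changes this fingerprint.
        "input_sha256": hashlib.sha256(vcf_bytes).hexdigest(),
        "annotation_sources": dict(sorted(annotation_sources.items())),
        "analyzed_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(record, indent=2)


# Placeholder version labels for illustration only.
print(provenance_record(b"##fileformat=VCFv4.1\n",
                        {"ClinVar": "release-X", "CIViC": "release-Y"}))
```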


Variants flow through the tree and pick up labels, annotations, etc.

How?


Easy-to-use drag-and-drop method for building your SOP hypothesis

Underlying annotation sources are embedded in the software


Apply your target panels, disregard incidental findings, match your known gene lists…


Select criteria for your patient’s variant filtration with as few or as many filter patterns as you see fit
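The idea of variants flowing through a tree of filters and picking up labels can be sketched as an ordered chain of steps, where each step either drops a variant or attaches a label to it. This is a hypothetical illustration of the concept and does not reproduce Alissa Interpret's implementation. The gene panel, the field names and the 1% rarity threshold are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Variant:
    gene: str
    pop_af: float
    labels: List[str] = field(default_factory=list)


@dataclass
class Step:
    name: str
    keep: Callable[[Variant], bool]
    label: Optional[str] = None  # label attached to variants that pass this step


def run_tree(variants, steps):
    """Pass variants through each step in order; survivors collect each step's label."""
    for step in steps:
        survivors = []
        for v in variants:
            if step.keep(v):
                if step.label:
                    v.labels.append(step.label)  # labels are added in place
                survivors.append(v)
        variants = survivors
    return variants


# Hypothetical target panel and rarity threshold.
PANEL = {"EGFR", "KRAS", "ALK"}
steps = [
    Step("target panel", lambda v: v.gene in PANEL, label="in_panel"),
    Step("rare", lambda v: v.pop_af < 0.01, label="rare"),
]
result = run_tree([Variant("EGFR", 0.0), Variant("TTN", 0.0), Variant("KRAS", 0.2)], steps)
print([(v.gene, v.labels) for v in result])  # [('EGFR', ['in_panel', 'rare'])]
```

Keeping each step as data (a name, a predicate and a label) is what makes the chain easy to rearrange, which is the property a drag-and-drop SOP builder relies on.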


Filter through ClinVar and the Cancer Gene Census, selecting specific clinical significance categories
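A ClinVar clinical-significance filter can be sketched as follows, assuming variants have been annotated with the CLNSIG INFO field from ClinVar's VCF release. Its values use underscores (for example `Likely_pathogenic`) and can combine terms such as `Pathogenic/Likely_pathogenic`. The set of significances to keep is an assumption, so check the exact value vocabulary against the ClinVar release you use. The Cancer Gene Census step is not covered here.

```python
import re

# Clinical significances to keep (an assumption; adjust to your SOP).
WANTED = {"Pathogenic", "Likely_pathogenic"}


def clnsig_terms(value: str) -> set:
    """Split a CLNSIG value such as 'Pathogenic/Likely_pathogenic' into its terms."""
    return {t.strip("_") for t in re.split(r"[/|,]", value) if t.strip("_")}


def filter_clinvar(records, wanted=WANTED):
    """Keep records whose INFO CLNSIG shares at least one term with `wanted`."""
    kept = []
    for rec in records:
        value = rec.get("INFO", {}).get("CLNSIG")
        if value and clnsig_terms(value) & wanted:
            kept.append(rec)
    return kept


records = [
    {"INFO": {"CLNSIG": "Pathogenic/Likely_pathogenic"}},
    {"INFO": {"CLNSIG": "Benign"}},
    {"INFO": {}},  # not in ClinVar
]
print(len(filter_clinvar(records)))  # 1
```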


Filter Using Clinical Interpretations of Variants in Cancer (CIViC) for Precision Medicine Applications


Your list of 150 variants has now been reduced to 3 for final review

150 variants → 3 variants


Moving from the triage tab to the review tab, your pathologist can view all annotations, labels and more on one screen


Your historical curated information is also readily accessible – is this related to the disease? Is it actionable?


Categorize your variant’s Clinical Relevance as Uncertain, Low, Moderate, High or Very High


If you haven’t filtered against this in your SOP tree, CIViC annotations are available in final review


CIViC – access evidence, clinical significance, available drugs and more


Report your actionable variants in your final report

Win time by sending out these templated reports from automated analyses!

Reduce errors with these auto-populated report templates!
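Auto-populated templates reduce transcription errors because every value is filled in from the reviewed variant data instead of being retyped. The following is a minimal sketch using Python's `string.Template`, with the clinical relevance categories from the review step ordered so that the most relevant variants are listed first. The template wording, the field names, the HIGH reporting threshold and the example genes are hypothetical.

```python
from enum import IntEnum
from string import Template


class Relevance(IntEnum):
    """Clinical relevance categories, ordered from least to most relevant."""
    UNCERTAIN = 0
    LOW = 1
    MODERATE = 2
    HIGH = 3
    VERY_HIGH = 4


LINE = Template("$gene $hgvs: relevance $relevance")
REPORT = Template("Patient: $patient\nActionable variants:\n$lines\n")


def render_report(patient, variants, min_relevance=Relevance.HIGH):
    """Fill the report template with variants at or above min_relevance, highest first."""
    selected = sorted(
        (v for v in variants if v["relevance"] >= min_relevance),
        key=lambda v: v["relevance"], reverse=True,
    )
    lines = "\n".join(
        LINE.substitute(gene=v["gene"], hgvs=v["hgvs"], relevance=v["relevance"].name)
        for v in selected
    ) or "None identified"
    return REPORT.substitute(patient=patient, lines=lines)


# Hypothetical variants for illustration only.
report = render_report("TEST-001", [
    {"gene": "GENE_A", "hgvs": "c.1A>T", "relevance": Relevance.HIGH},
    {"gene": "GENE_B", "hgvs": "c.2G>C", "relevance": Relevance.VERY_HIGH},
    {"gene": "GENE_C", "hgvs": "c.3C>G", "relevance": Relevance.LOW},
])
print(report)
```

`Template.substitute` raises `KeyError` if a placeholder is left unfilled, so an incomplete report fails loudly instead of being sent out with a blank field.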


Stay tuned for the next webinar in this series.

To register for the next webinar in the series, please look for the follow-up email about this webinar. It will contain a link to a recording of today’s webinar and the first webinar in this series.