Genome Informatics I (2015 Spring)
MES7594-01 Genome Informatics I
- Lecture VIII. Interpreting variants
Sangwoo Kim, Ph.D., Assistant Professor,
Severance Biomedical Research Institute, Yonsei University College of Medicine
Overview
• Goal of this lecture
– You will learn how to interpret discovered variants, filtering and prioritizing them for an associated phenotype (e.g. disease), and practice doing so
• Predicting functional impact of variants
– Utilizing sequence features
– Utilizing protein features
• Popular methods and practice
– PolyPhen-2
– MutationAssessor
– SeattleSeq
FUNCTIONAL IMPACT OF VARIANTS
We usually have too many variants
Saksena et al., “Developing Algorithms to Discover Novel Cancer Genes: A look at the challenges and approaches”
We want to narrow down the number of “called” variants as far as possible
Simple mutation calling does not give you the final answer
mutation calling (NGS)
A lot of candidate variants
some from sequencing error
some from polymorphisms
some from mapping error
some are passengers
A few real pathogenic variants
Gold mining
Bunch of candidate variants
Many variants
A few variants
Strategy I: Do they really exist?
- Any mistakes in sequencing and variant calling?
- Any non-disease-causing polymorphisms?
Strategy II: Are they functional?
- Are they damaging? Pathogenic?
- Are they related to phenotypes?
Five ways to narrow down
1. Include control data
– eliminate germline variants
2. Use a stricter variant quality threshold
– work on only confident variants
3. Filter out polymorphisms
– remove non-damaging polymorphisms
4. Predict functional impacts
– find damaging levels
5. Use disease specific knowledge
– to acquire final candidates
Strategy I: steps 1~3
Strategy II: steps 4~5
1. Include control data
[Figure: typical variant counts: germline 100,000~500,000; somatic 100~1000]
We should eliminate unwanted germline variants
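The idea of using a matched control can be sketched with plain set arithmetic. This is a deliberately naive illustration, not how a somatic caller works internally: the variant tuples are made-up toy data, and real callers also model read depth and sample purity.

```python
# Toy calls keyed by (chrom, pos, ref, alt); all values are hypothetical.
tumor_calls = {
    ("chr22", 1000, "A", "G"),
    ("chr22", 2000, "C", "T"),
    ("chr22", 3000, "G", "A"),
}
normal_calls = {
    ("chr22", 1000, "A", "G"),  # also seen in the control, so germline
    ("chr22", 4000, "T", "C"),
}

# Variants shared with the control are treated as germline and removed.
germline = tumor_calls & normal_calls
somatic_candidates = tumor_calls - normal_calls

print(f"germline: {len(germline)}, somatic candidates: {len(somatic_candidates)}")
```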
When controls are unavailable
• Single nucleotide polymorphism rate = 1/100~1/1000
• Whole Genome Sequencing
– Total DNA length = 3 billion
– Expected SNP numbers = 3~30 million
• Whole Exome Sequencing
– Total DNA length = 50 million
– Expected SNP numbers = 50~500 thousand
• Targeted Sequencing (Panel)
– Total DNA length = 100~1000 thousand
– Expected SNP numbers = 100~10,000
• Hotspot Panel (only for very well known variants)
– Controls can be omitted
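The expected SNP numbers follow from multiplying the target length by the SNP rate. A minimal sketch of that arithmetic, assuming the 1/1000 to 1/100 rate quoted in the list; the function name and the dictionary of targets are illustrative only.

```python
# SNP rate expressed as "one SNP per N bases" to keep the arithmetic exact.
BASES_PER_SNP_LOW = 1000   # rate 1/1000 gives the lower bound
BASES_PER_SNP_HIGH = 100   # rate 1/100 gives the upper bound

def expected_snps(target_bp):
    """Return (low, high) expected SNP counts for a target of target_bp bases."""
    return target_bp // BASES_PER_SNP_LOW, target_bp // BASES_PER_SNP_HIGH

targets = {
    "Whole genome": 3_000_000_000,
    "Whole exome": 50_000_000,
    "Panel, 100 kb": 100_000,
    "Panel, 1000 kb": 1_000_000,
}
for name, bp in targets.items():
    low, high = expected_snps(bp)
    print(f"{name}: {low:,} to {high:,} expected SNPs")
```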
2. Use a stricter quality threshold
• Variant quality
Low variant quality
- This variant (although it has been called) can be false
Cause of low quality
- Low read depth (insufficient observation)
- Bad basecall/mapping quality
- Low allele frequency
• Possible actions
– Cut out variants based on
• Variant quality (e.g. QUAL<10)
• Total read depth (e.g. <20)
• Alt-allele read depth (e.g. <5)
• Allele frequency (e.g. <0.1)
– Prioritize variants
• Sort by variant quality and inspect from the top
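The two actions above (cut out, then prioritize) can be sketched as follows. The thresholds are the example values from the list; the `Call` record and the sample calls are hypothetical, and allele frequency is assumed here to be alt depth divided by total depth.

```python
from dataclasses import dataclass

@dataclass
class Call:
    name: str
    qual: float      # variant quality (QUAL)
    depth: int       # total read depth
    alt_depth: int   # reads supporting the alternate allele

    @property
    def allele_freq(self):
        # Assumption: AF is estimated as alt_depth / depth.
        return self.alt_depth / self.depth if self.depth else 0.0

def passes(call, min_qual=10, min_depth=20, min_alt=5, min_af=0.1):
    """Keep a call only if it clears all four example thresholds."""
    return (call.qual >= min_qual and call.depth >= min_depth
            and call.alt_depth >= min_alt and call.allele_freq >= min_af)

calls = [
    Call("v1", 55.0, 80, 30),
    Call("v2", 8.0, 60, 20),     # fails: QUAL < 10
    Call("v3", 40.0, 12, 6),     # fails: depth < 20
    Call("v4", 90.0, 100, 4),    # fails: alt depth < 5
    Call("v5", 30.0, 200, 10),   # fails: allele frequency 0.05 < 0.1
    Call("v6", 70.0, 50, 25),
]

# Filter, then sort by quality so inspection starts from the most confident call.
kept = sorted((c for c in calls if passes(c)), key=lambda c: c.qual, reverse=True)
print([c.name for c in kept])
```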
3. Filter out polymorphisms
• When you had no control data (panel)
– Check if the variants have been reported as polymorphisms
• When you had control data
– You may not have polymorphisms
• Because somatic mutation callers remove germline calls
– However, in some cases polymorphisms can still be reported (as somatic mutations)
• For example, low read depth in the control sample
[Figure: control sample with low depth / bad region: variant undetected; disease sample: variant detected]
dbSNP
• Database of SNPs
chr7:11584142 A>T
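Checking candidates against reported polymorphisms amounts to a lookup by position and alleles. A minimal sketch with an in-memory set standing in for dbSNP: the set contents are toy data, and placing chr7:11584142 A>T in it is an assumption made for illustration, not a statement about the real database.

```python
# Stand-in for a dbSNP lookup table, keyed by (chrom, pos, ref, alt). Toy data.
known_polymorphisms = {
    ("chr7", 11584142, "A", "T"),
    ("chr1", 1000, "G", "C"),
}

candidates = [
    ("chr7", 11584142, "A", "T"),
    ("chr22", 2000, "C", "G"),
]

def split_by_known(variants, known):
    """Separate variants already reported as polymorphisms from the rest."""
    reported = [v for v in variants if v in known]
    unreported = [v for v in variants if v not in known]
    return reported, unreported

reported, unreported = split_by_known(candidates, known_polymorphisms)
print("filtered out:", reported)
print("kept:", unreported)
```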
4. Predict functional impacts
• Types of point mutations
– Coding mutations
• Synonymous (silent): amino acid unchanged
• Missense: amino acid changed
• Nonsense: stop codon gained
• Readthrough: stop codon lost
– Non-coding mutations
• Intron
• Splice variants
• Variants in regulatory elements
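The four coding categories can be told apart by translating the reference and altered codons. A minimal sketch using the standard genetic code; the function name is illustrative, and it handles only a single in-frame codon, not transcripts or splice sites.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_coding_change(ref_codon, alt_codon):
    """Classify a codon substitution by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"      # stop codon gained
    if ref_aa == "*":
        return "readthrough"   # stop codon lost
    return "missense"

for ref, alt in [("GAA", "GAG"), ("GAA", "GTA"), ("TAC", "TAA"), ("TAA", "TAC")]:
    print(ref, "->", alt, classify_coding_change(ref, alt))
```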
Functional impacts
• Types of indels
– Inframe
• Insertion or deletion in a multiple of 3 base pairs
– Frameshift
• Insertion or deletion not in a multiple of 3 base pairs
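Whether an indel keeps the reading frame depends only on its length. A small sketch, assuming VCF-style REF and ALT allele strings; the function name is illustrative.

```python
def classify_indel(ref, alt):
    """Return 'inframe' if the length change is a multiple of 3, else 'frameshift'."""
    length_change = abs(len(alt) - len(ref))
    if length_change == 0:
        raise ValueError("not an indel: REF and ALT have the same length")
    return "inframe" if length_change % 3 == 0 else "frameshift"

print(classify_indel("A", "ATGC"))  # 3 bases inserted
print(classify_indel("ATG", "A"))   # 2 bases deleted
```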
General classification (priority)
[Figure: variant classes ordered by priority; labels: high-impact, low-incidence, low-confidence; high incidence]
Functional impact prediction of missense mutations
• How critical is an AA change to its protein function?
– Amino acid conservation
• If the AA is essential, it would be conserved through evolution
– Amino acid in protein conformation
• Substitution of an AA in the active site would be more damaging
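Amino acid conservation can be illustrated by scoring one column of a multiple sequence alignment. This is a simplified sketch, not the scoring used by PolyPhen-2 or MutationAssessor: the score is just the fraction of sequences sharing the most common residue, and the aligned sequences are made up.

```python
from collections import Counter

def column_conservation(alignment, col):
    """Fraction of non-gap residues in a column that match the most common residue."""
    residues = [seq[col] for seq in alignment if seq[col] != "-"]
    if not residues:
        return 0.0
    most_common_count = Counter(residues).most_common(1)[0][1]
    return most_common_count / len(residues)

# Hypothetical aligned protein fragments from four species.
alignment = ["MKHLV", "MKHIV", "MRHLV", "MKHAV"]

for col in range(len(alignment[0])):
    print(col, column_conservation(alignment, col))
```

A substitution at a column scoring near 1.0 would be ranked as more likely damaging than one at a variable column.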
Amino acid conservation
Protein Structure
5. Use disease specific knowledge
• Your knowledge about the disease
– e.g. cancer
– “Has it been reported in other previous samples?”
– Search for it in COSMIC; if you find it is recurrent, it is likely to be functional
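The recurrence question can be sketched as a count across previously reported samples. This does not query COSMIC; the dictionary below is a hypothetical stand-in for such a catalogue, and all variants in it are toy data.

```python
# Hypothetical catalogue: sample name -> set of (chrom, pos, ref, alt) variants.
previous_samples = {
    "sample_A": {("chr22", 100, "G", "A"), ("chr22", 250, "C", "T")},
    "sample_B": {("chr22", 100, "G", "A")},
    "sample_C": {("chr22", 100, "G", "A"), ("chr22", 900, "T", "G")},
}

def recurrence(variant, samples):
    """Count how many earlier samples carry exactly this variant."""
    return sum(variant in variants for variants in samples.values())

candidates = [("chr22", 100, "G", "A"), ("chr22", 250, "C", "T"), ("chr22", 555, "A", "C")]

# Rank candidates so the most recurrent ones are inspected first.
ranked = sorted(candidates, key=lambda v: recurrence(v, previous_samples), reverse=True)
for v in ranked:
    print(v, recurrence(v, previous_samples))
```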
Five ways to narrow down
Many, uncertain variants
A few, reliable variants
Functional study, Mechanism study
SUMMARY OF PART I
- Connect to Linux cluster, job script writing and submission
- NGS technologies, NGS data
- Short read alignment
- Variant calling, CNV, SV calling
- Interpretation of discovered variants
In the remaining classes
• Genomic data to expression data
– Gene → mRNA → Protein → Pathways and Networks → Phenotype
• Use high throughput data for your study
• Don’t forget your project
PRACTICE - FUNCTIONAL VARIANT ANNOTATION WITH SEATTLESEQ
Today’s data
• Somatic variants in chr22 of an anonymous cancer sample, called with Virmid
• Data location
– /scratch/2015_GenomeInformatics/{yourdir}/virmidoutput
– If you did not complete the somatic calling practice, copy it from /scratch/2015_GenomeInformatics/public
data download to local PC
① move to your virmid out directory
② check your virmid output
③ click FTP
④ double click
SeattleSeq
search then click here!!!
① write your email
② input your VCF file
③ check!!
④ check!!
① click file > open..
② select ‘all file’
③ select annotated file
Filtering phase
• accession (column H)
– for filtering curated isoforms
• NM: mRNA
• XM: predicted mRNA model (filter out)
• functionGVS (column I)
– for filtering damaging mutation type
• missense, missense-near-splice
• stop-gain, stop-loss
• splice-donor, splice-acceptor
• The others (filter out)
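The same two filters can be applied in a script instead of a spreadsheet. A minimal sketch, assuming the annotated file is tab-delimited with `accession` and `functionGVS` columns as named above; the other columns, the accession IDs, and the rows are hypothetical, and a real SeattleSeq file may need its header line adjusted before parsing.

```python
import csv
import io

# Mutation types kept as potentially damaging, taken from the list above.
KEEP_FUNCTIONS = {
    "missense", "missense-near-splice",
    "stop-gain", "stop-loss",
    "splice-donor", "splice-acceptor",
}

def keep_row(row):
    """Keep curated (NM) isoforms with a damaging functionGVS category."""
    return row["accession"].startswith("NM_") and row["functionGVS"] in KEEP_FUNCTIONS

# Toy annotated table; the accession IDs are made up.
annotated = (
    "chromosome\tposition\taccession\tfunctionGVS\n"
    "22\t100\tNM_000001.1\tmissense\n"
    "22\t200\tXM_000002.1\tmissense\n"
    "22\t300\tNM_000003.1\tsynonymous\n"
    "22\t400\tNM_000004.1\tstop-gain\n"
    "22\t500\tNM_000005.1\tintron\n"
)

rows = list(csv.DictReader(io.StringIO(annotated), delimiter="\t"))
kept = [r for r in rows if keep_row(r)]
for r in kept:
    print(r["chromosome"], r["position"], r["accession"], r["functionGVS"])
```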
IGV download
search then click here!!!
download then double click!!
IGV view
① input disease BAM file
② input normal BAM file
③ input VCF file