1
EXOME SEQUENCING AND COMPLEX DISEASE: practical aspects of rare variant association studies
Alice Bouchoms
Amaury Vanvinckenroye
Maxime Legrand
2
WHAT IS EXOME SEQUENCING?
• Exon: coding sequence of the DNA
• Exome sequencing: aims to sequence the coding part of the DNA, i.e. the exons
3
INTRODUCTION
• GWAS: helped discover common coding variants
• Exome sequencing:
  • Also finds rare coding variants
  • Faster and better; large samples (> 10,000 individuals)
  • Before 2010: only a few publications on PubMed
  • Now: more than 2,000 publications on PubMed
[Figure: chart with year labels 2011, 2012 and 2013]
4
KEY QUESTIONS TO ASK YOURSELF
5
STUDY DESIGN
• State objectives
• Focus on extreme outcomes
  • Unusual phenotypes or traits
  • But be careful: de novo mutations
• Geographical restrictions?
6
STUDY DESIGN
Sequencing strategy?
• Quality of the sample: 20x or greater level of coverage
• Depth of sequencing per person: 60x or greater
• Non-coding regions: can still be useful
• Determine ancestries or estimate genotypes: 0.2x to 2x
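The coverage figures above can be related to the number of reads with the usual back-of-the-envelope formula, mean depth = reads × read length / target size. The sketch below assumes that every read maps on target and that there are no duplicates, so it is optimistic. The function names and the example numbers (a 50 Mb target, 100 bp reads) are illustrative assumptions, not values from the slides.

```python
import math

def mean_depth(n_reads: int, read_length: int, target_size: int) -> float:
    """Mean depth of coverage, assuming all reads map uniquely on target."""
    return n_reads * read_length / target_size

def reads_needed(depth: float, read_length: int, target_size: int) -> int:
    """Smallest number of reads that reaches the requested mean depth."""
    return math.ceil(depth * target_size / read_length)

# Hypothetical example: 50 Mb target, 100 bp reads.
TARGET = 50_000_000
READ_LEN = 100
print(mean_depth(10_000_000, READ_LEN, TARGET))   # mean depth for 10 million reads
print(reads_needed(60, READ_LEN, TARGET))         # reads required for 60x
```

In practice the required read count is higher, because some reads fall off target or are removed as duplicates.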
7
VARIANT CALLING
• Goal: obtain high-quality genotypes
• Several steps:
  • DNA contamination, DNA fingerprints, good follow-up?
  • Alignment with the reference genome, calibration of base quality scores, removal of duplicate reads
8
VARIANT CALLING
After read mapping:
• Sample quality metrics (spotting of outlier properties)
• Variant calling: look for differences where overlaps appear in the alignment with the reference genome
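The idea of looking for differences where aligned reads overlap the reference can be sketched with a toy pileup. This is a deliberately naive illustration, not a real variant caller: it assumes ungapped reads given as (start, sequence) pairs, ignores base qualities, and uses hypothetical thresholds (`min_depth`, `min_alt_fraction`) that are not taken from the slides.

```python
from collections import Counter, defaultdict

def call_variants(reference, reads, min_depth=3, min_alt_fraction=0.2):
    """Return (position, ref_base, alt_base, alt_count, depth) tuples.

    Positions are 0-based. Reads are (start, sequence) pairs, ungapped.
    """
    # Pile up the bases observed at each reference position.
    pileup = defaultdict(Counter)
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos < len(reference):
                pileup[pos][base] += 1

    calls = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few overlapping reads to say anything
        ref_base = reference[pos]
        for alt, n in counts.most_common():
            if alt != ref_base and n / depth >= min_alt_fraction:
                calls.append((pos, ref_base, alt, n, depth))
    return calls

reference = "ACGTACGT"
reads = [(0, "ACGTAC"), (1, "CGAACG"), (2, "GAACGT"), (2, "GTACGT")]
print(call_variants(reference, reads))
```

Here two of the four reads covering position 3 carry an A where the reference has a T, so that site is reported; real callers additionally model base quality, mapping quality and genotype likelihoods.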
9
VARIANT CALLING
• Machine-learning-based classifier:
  • Polymorphic variants / artifacts
  • Evaluate metrics: true / false positives
• Quality metrics on samples
• Recommendation: minimum depth of coverage of 20x
• Development of standards for storing sequence data and variant calls
10
ASSOCIATION ANALYSIS
• Goal: find functional effects of variants
• Score: indicates the effect on the protein function
• Separation between highly damaging variants and the others
• If multiple annotations, 3 ways:
  • Focus on the longest transcript
  • Focus on the most deleterious effect
  • Focus on the canonical transcript
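The three ways of resolving multiple annotations can be illustrated with a small sketch. The `Annotation` record, the transcript names and the `SEVERITY` ranking are hypothetical and chosen only for illustration; a real pipeline would take these from its annotation tool.

```python
from dataclasses import dataclass

# Hypothetical severity ranking (higher means more damaging); an assumption.
SEVERITY = {"synonymous": 0, "missense": 1, "splice": 2, "stop_gained": 3}

@dataclass(frozen=True)
class Annotation:
    transcript: str
    length: int       # transcript length in bases
    effect: str       # key of SEVERITY
    canonical: bool

def pick_longest(annotations):
    """Strategy 1: keep the annotation on the longest transcript."""
    return max(annotations, key=lambda a: a.length)

def pick_most_deleterious(annotations):
    """Strategy 2: keep the annotation with the most damaging effect."""
    return max(annotations, key=lambda a: SEVERITY[a.effect])

def pick_canonical(annotations):
    """Strategy 3: keep the annotation on the canonical transcript, if any."""
    for a in annotations:
        if a.canonical:
            return a
    return None

# One variant annotated on three made-up transcripts.
annotations = [
    Annotation("T1", 2400, "missense", False),
    Annotation("T2", 1800, "stop_gained", False),
    Annotation("T3", 2100, "synonymous", True),
]
for pick in (pick_longest, pick_most_deleterious, pick_canonical):
    print(pick.__name__, pick(annotations).transcript)
```

The same variant ends up with a different effect label under each strategy, which is why the choice should be fixed before the association analysis.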
11
ASSOCIATION ANALYSIS
• Single-variant association test
• Check of data quality
• Usual way of processing rare variants: group the variants acting on the same gene and analyse them together
12
ASSOCIATION ANALYSIS
2 methods for processing groups:
• Comparison of the number of variants between cases and controls
• Comparison with chance expectations
Recommendation: run at least one test of each category, with different thresholds
If no threshold, use a variety of frequency cut-offs
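The first category (comparing the number of variants between cases and controls) can be sketched as a simple collapsing test: count the individuals carrying at least one rare variant in the gene, then compare cases and controls. Fisher's exact test is used here as one possible choice, not as the method prescribed by the slides; the genotype matrices, frequencies and cut-offs are made-up assumptions.

```python
from math import comb

def carriers(genotypes, freqs, max_freq):
    """Count individuals with >= 1 variant whose frequency is <= max_freq.

    genotypes: one row per individual, one allele count per variant.
    freqs: allele frequency of each variant (same order as the columns).
    """
    keep = [j for j, f in enumerate(freqs) if f <= max_freq]
    return sum(1 for row in genotypes if any(row[j] > 0 for j in keep))

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total
    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# Hypothetical gene with three variants; rows are individuals.
freqs = [0.004, 0.2, 0.008]
cases = [[1, 0, 0], [0, 0, 1], [0, 0, 0], [0, 1, 0]]
controls = [[0, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0]]

for cutoff in (0.01, 0.05, 0.5):  # a variety of frequency cut-offs
    a = carriers(cases, freqs, cutoff)
    c = carriers(controls, freqs, cutoff)
    p = fisher_exact_two_sided(a, len(cases) - a, c, len(controls) - c)
    print(cutoff, a, c, round(p, 4))
```

Looping over several cut-offs mirrors the recommendation to try a variety of frequency thresholds, at the price of additional multiple testing.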
13
ASSOCIATION ANALYSIS
Packages are available to perform the tests on subsets of the data
Example:
1. missense, splice and stop-altering variants
2. subset of deleterious variants
3. splice and stop-altering variants
14
ASSOCIATION ANALYSIS
• No optimal choice for the analysis, because variants and their characteristics vary between genes
• Permutation-based approaches
Statistical significance
• If no permutation-based threshold, p values ≤ 5 × 10⁻⁷
• QQ plots to summarize the results
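A permutation-based approach can be sketched as follows: shuffle the case/control labels many times and count how often the shuffled statistic is at least as extreme as the observed one. The statistic used here (absolute difference in mean per-person variant count) and the input values are illustrative assumptions, not the specific procedure of the slides.

```python
import random

def permutation_p_value(case_scores, control_scores, n_perm=10_000, seed=0):
    """Empirical p value for the difference in mean score between groups."""
    rng = random.Random(seed)          # fixed seed so the run is reproducible
    pooled = list(case_scores) + list(control_scores)
    n_cases = len(case_scores)

    def stat(values):
        cases, controls = values[:n_cases], values[n_cases:]
        return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

    observed = stat(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)            # permute the case/control labels
        if stat(pooled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical per-person counts of rare variants in one gene.
cases = [2, 1, 3, 2, 2, 1, 3, 2]
controls = [0, 1, 0, 1, 0, 0, 1, 0]
print(permutation_p_value(cases, controls, n_perm=2000))
```

With this estimator the smallest attainable p value is 1 / (n_perm + 1), so reaching a threshold such as 5 × 10⁻⁷ would require about two million permutations per test.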
15
APPROACHES FOR FOLLOW-UP
To demonstrate association based on the analysed samples, additional samples are needed.
16
APPROACHES FOR FOLLOW-UP
Exome chip experiments examine most of the variants, but are not very sensitive for non-European populations.
17
APPROACHES FOR FOLLOW-UP
Statistical imputation
• Take the base that has the highest correlation with the missing one, and assume it is the same allele as T (i.e. minor or major)
• But again, often not possible for mixed populations
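The correlation idea above can be shown with a toy example. Real imputation software relies on haplotype models over many markers; this sketch only follows the simplified description on the slide. It assumes a small reference panel of haplotypes coded 0 (major allele) and 1 (minor allele), and all names and data are hypothetical.

```python
def correlation(x, y):
    """Pearson correlation of two equal-length 0/1 lists (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def column(panel, site):
    return [hap[site] for hap in panel]

def best_tag(panel, target, typed_sites):
    """Typed site most strongly correlated with the untyped target site."""
    target_col = column(panel, target)
    return max(typed_sites,
               key=lambda s: abs(correlation(column(panel, s), target_col)))

def impute(panel, target, typed_sites, observed):
    """Guess the allele at `target` from the allele seen at the best tag."""
    tag = best_tag(panel, target, typed_sites)
    r = correlation(column(panel, tag), column(panel, target))
    allele = observed[tag]
    # Positive correlation: same allele type; negative: the opposite one.
    return allele if r >= 0 else 1 - allele

# Reference panel: rows are haplotypes, columns are sites 0, 1, 2.
panel = [
    [0, 0, 0],
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
]
# The study sample was typed at sites 0 and 1 only; site 2 is missing.
print(impute(panel, target=2, typed_sites=[0, 1], observed={0: 1, 1: 0}))
```

The guess is only as good as the panel: if the study population differs from the reference panel, the correlations may not hold, which matches the caveat about mixed populations.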
18
ROLE OF FUNCTIONAL ASSAYS
Study the changes in the proteins due to coding variants
Study why these changes result in diverse diseases.
19
FORWARD GENETICS
• Other approach to study functional variants
• First look at which proteins show changes
• Then search in the DNA sequence for the variant(s)
20
DISCUSSION
In other articles:
• More care about the sample quality
• Gain of sensitivity in variant calls if they are made across several samples
• Indels in variant calls are the major source of false positives; an alignment algorithm that allows gapped alignment is needed
• Check association results in databases
21
DISCUSSION
Because of costs, exome sequencing studies focus on the coding part of the genome. They are thus not suitable for non-exonic sequence (structural variants, chromosomal rearrangements).
These problems will be partially solved by the fall in sequencing costs.