Exome sequencing and complex disease: practical aspects of rare variant association studies

Alice Bouchoms, Amaury Vanvinckenroye, Maxime Legrand

What is exome sequencing?
- Exon: coding sequence of the DNA
- Exome sequencing: aims to sequence the coding part of the DNA, i.e. the exons

Introduction
- GWAS: helped discover common coding variants
- Exome sequencing:
  - Also finds rare coding variants
  - Faster, and better for large samples (> 10 000 individuals)
- Before 2010: only a few publications on PubMed
- Now: more than 2000 publications on PubMed

Key questions to ask yourself

Study design
- State objectives
- Focus on extreme outcomes
- Unusual phenotypes or traits
- BUT, CAREFUL: de novo mutations
- Geographical restrictions?

Study design
- Sequencing strategy?
- Quality of the sample: 20x or greater level of coverage
- Depth of sequencing per person: 60x or greater
- Non-coding regions: can still be useful
  - Determine ancestries or estimate genotypes
  - 0.2x to 2x

Variant calling
- Goal: obtain high-quality genotypes
- Several steps:
  - DNA contamination, DNA fingerprints, good follow-up?
  - Alignment with the reference genome, calibration of base quality scores, removal of duplicate reads

Variant calling
- After read mapping:
  - Sample quality metrics (spotting of outlier properties)
- Variant calling:
  - Look for differences where overlaps appear in the alignment with the reference genome

Variant calling
- Machine-learning-based classifier:
  - Polymorphic variants / artifacts
  - Evaluate metrics: true / false positives
- Quality metrics on samples
- Recommendation: minimum depth of coverage of 20x
- Development of standards for storing sequence data and variant calls
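The 20x recommendation can be turned into a simple sample-level check. The sketch below is only an illustration: it assumes the recommendation applies to the mean depth per sample, and the function name and the input layout (a dictionary of per-site depths for each sample) are hypothetical, not taken from any particular pipeline.

```python
from statistics import mean
from typing import Dict, List, Sequence


def flag_low_coverage_samples(
    depth_by_sample: Dict[str, Sequence[float]],
    min_mean_depth: float = 20.0,  # assumed reading of the 20x recommendation
) -> List[str]:
    """Return the sorted IDs of samples whose mean depth is below the threshold.

    A sample with no depth values at all is also flagged, since its
    coverage cannot be verified.
    """
    flagged = []
    for sample_id, depths in depth_by_sample.items():
        if not depths or mean(depths) < min_mean_depth:
            flagged.append(sample_id)
    return sorted(flagged)


if __name__ == "__main__":
    # Toy per-site depths for three hypothetical samples.
    example = {"S1": [30, 25, 22], "S2": [10, 15, 30], "S3": [20, 20, 20]}
    print(flag_low_coverage_samples(example))
```

Flagged samples would then be candidates for re-sequencing or exclusion before the association analysis.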

Association analysis
- Goal: find functional effects of variants
- Score: indicates the effect on the protein function
- Separation between variants with high damage and the others
- If multiple annotations, 3 ways:
  - Focus on the longest transcript
  - Focus on the most deleterious effect
  - Focus on the canonical transcript
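The three ways of resolving multiple annotations can be sketched as three small selection functions. This is a minimal illustration under stated assumptions: the `Annotation` record, its fields and the `SEVERITY` ranking are hypothetical and are not the output format of any real annotation tool.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical severity ranking (higher = more damaging), for illustration only.
SEVERITY = {"synonymous": 0, "missense": 1, "splice": 2, "stop_gained": 3}


@dataclass
class Annotation:
    transcript: str
    length: int        # transcript length in bases
    consequence: str   # key of SEVERITY
    canonical: bool    # True if this is the gene's canonical transcript


def pick_longest(annotations: List[Annotation]) -> Annotation:
    """Strategy 1: keep the annotation on the longest transcript."""
    return max(annotations, key=lambda a: a.length)


def pick_most_deleterious(annotations: List[Annotation]) -> Annotation:
    """Strategy 2: keep the annotation with the most damaging consequence."""
    return max(annotations, key=lambda a: SEVERITY[a.consequence])


def pick_canonical(annotations: List[Annotation]) -> Optional[Annotation]:
    """Strategy 3: keep the annotation on the canonical transcript, if any."""
    for annotation in annotations:
        if annotation.canonical:
            return annotation
    return None
```

The same variant can be classified differently depending on the strategy, so the choice should be fixed before the association tests are run.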

Association analysis
- Single variant association test
- Check of data quality
- Usual way of processing rare variants: gather them in groups acting on the same gene to do the analysis

Association analysis
- 2 methods for processing groups:
  - Comparison of the number of variants between cases and controls
  - Comparison with chance expectations
- Recommendation: at least one test of each category, with different thresholds
- If no threshold, use a variety of frequency cut-offs
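The first kind of test, comparing the number of variants between cases and controls over several frequency cut-offs, can be sketched as below. This is a simplified stand-in and not the method of any specific package: it counts carriers of at least one rare allele in a gene and applies a one-sided Fisher exact test. All function names, the genotype coding (minor-allele counts per sample) and the toy data are assumptions made for the example.

```python
from math import comb
from typing import Dict, List, Sequence


def rare_variants(allele_freq: Dict[str, float], max_freq: float) -> List[str]:
    """Variants whose minor allele frequency is at or below the cut-off."""
    return [v for v, f in allele_freq.items() if f <= max_freq]


def count_carriers(genotypes: Dict[str, Sequence[int]],
                   variants: Sequence[str],
                   samples: Sequence[int]) -> int:
    """Number of listed samples carrying a minor allele at any listed variant."""
    return sum(1 for i in samples if any(genotypes[v][i] > 0 for v in variants))


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """P(carrier cases >= a) for the table [[a, b], [c, d]].

    a/b = carrier/non-carrier cases, c/d = carrier/non-carrier controls.
    """
    n_cases, n_carriers, total = a + b, a + c, a + b + c + d
    upper = min(n_cases, n_carriers)
    tail = sum(comb(n_carriers, k) * comb(total - n_carriers, n_cases - k)
               for k in range(a, upper + 1))
    return tail / comb(total, n_cases)


def burden_pvalue(genotypes, allele_freq, cases, controls, max_freq) -> float:
    """Carrier-count burden test for one gene at one frequency cut-off."""
    variants = rare_variants(allele_freq, max_freq)
    a = count_carriers(genotypes, variants, cases)
    c = count_carriers(genotypes, variants, controls)
    return fisher_one_sided(a, len(cases) - a, c, len(controls) - c)


if __name__ == "__main__":
    # Toy gene with three variants and six samples (0-2 cases, 3-5 controls).
    freq = {"v1": 0.001, "v2": 0.004, "v3": 0.03}
    geno = {"v1": [1, 0, 0, 0, 0, 0],
            "v2": [0, 1, 1, 0, 0, 0],
            "v3": [0, 0, 0, 1, 0, 1]}
    for cutoff in (0.001, 0.005, 0.05):
        print(cutoff, burden_pvalue(geno, freq, [0, 1, 2], [3, 4, 5], cutoff))
```

Running the test at several cut-offs, as in the loop, shows how sensitive the result is to the frequency threshold; the extra tests then need to be accounted for when judging significance.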

Association analysis
- Packages are available to perform the tests with subsets of the data
- Example:
  1. missense, splice, stop-altering variants
  2. subset of deleterious variants
  3. splice, stop-altering variants

Association analysis
- No optimal choice for the analysis, because variants and their characteristics vary between genes
- Permutation-based approaches
- Statistical significance
- If no permutation-based threshold is available: p-value threshold of 5 × 10^-7
- QQ plots to summarize the results
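A permutation-based approach can be sketched as follows. This is a minimal illustration, assuming a simple test statistic (the number of rare-variant carriers among the cases) and hypothetical function and parameter names; real analyses would permute whichever statistic the chosen test uses.

```python
import random
from typing import Sequence


def permutation_pvalue(carrier: Sequence[int],
                       is_case: Sequence[int],
                       n_perm: int = 10_000,
                       seed: int = 0) -> float:
    """Empirical one-sided p-value for an excess of carriers among cases.

    carrier[i] is 1 if sample i carries a rare variant in the gene, else 0.
    is_case[i] is 1 for cases and 0 for controls.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = sum(c for c, y in zip(carrier, is_case) if y)

    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between genotype and phenotype
        permuted = sum(c for c, y in zip(carrier, labels) if y)
        if permuted >= observed:
            hits += 1

    # Add-one correction: the estimate can never be exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: all three carriers are cases.
    print(permutation_pvalue([1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0]))
```

With this estimator the smallest attainable p-value is 1 / (n_perm + 1), so resolving p-values as small as 5 × 10^-7 would take roughly two million permutations per test.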

Approaches for follow-up
- To demonstrate association based on the analysed samples, additional samples are needed.

Approaches for follow-up
- Exome chip experiments examine most of the variants, but are not very sensitive for non-European populations.

Approaches for follow-up
- Statistical imputation

Take the base that has the highest correlation with the missing one, and assume the missing base is the same kind of allele as T (i.e. minor or major). Again, this is often not possible for mixed populations.
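The idea of copying the allele of the best-correlated base can be sketched as below. This is a deliberately simplified illustration and not how production imputation software works (such tools rely on haplotype models). The allele coding (0 = major, 1 = minor), the reference-panel layout and all names are assumptions made for the example.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has no variation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def best_tag(reference: Dict[str, Sequence[int]], target: str) -> Tuple[str, float]:
    """Find the variant most strongly correlated with the target.

    reference maps variant ID -> alleles (0 = major, 1 = minor) observed on
    each reference haplotype. Returns (variant ID, correlation).
    """
    correlations = {v: pearson(alleles, reference[target])
                    for v, alleles in reference.items() if v != target}
    tag = max(correlations, key=lambda v: abs(correlations[v]))
    return tag, correlations[tag]


def impute_allele(tag_allele: int, r: float) -> int:
    """Copy the tag allele if positively correlated, otherwise flip it."""
    return tag_allele if r >= 0 else 1 - tag_allele


if __name__ == "__main__":
    panel = {"target": [0, 0, 1, 1, 0, 1],
             "snpA": [0, 0, 1, 1, 0, 1],
             "snpB": [1, 0, 1, 0, 0, 1]}
    tag, r = best_tag(panel, "target")
    print(tag, r, impute_allele(1, r))
```

The correlations are estimated from a reference panel, which is why the approach degrades for mixed populations: the panel's correlation structure may not match that of the study samples.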

Role of functional assays
- Study the changes in the proteins due to coding variants
- Study why these changes result in diverse diseases

Forward genetics
- Another approach to studying functional variants
- First look at which proteins show changes
- Then search the DNA sequence for the variant(s)

Discussion
- In other articles:
  - More care is taken over sample quality
  - Sensitivity of variant calls increases if calling is done across several samples
  - Indels in variant calls are the major source of false positives; an alignment algorithm that allows gapped alignment is needed
  - Check association results against databases

Discussion
- Because of costs, exome sequencing studies focus on the coding part of the genome. They are thus not suitable for non-exonic sequence (structural variants, chromosomal rearrangements).
- These problems will be partially solved by the fall in sequencing costs.




