Association Tests for Rare Variants Using Sequence Data
Guimin Gao, Wenan Chen, & Xi Gao
Department of Biostatistics, VCU

Slide 2
Introduction to association tests: two hypotheses
  • Common variant-common disease
      • Common variant: minor allele frequency (MAF) >= 5%
      • Association is detected using linkage disequilibrium (LD)
  • Rare variant-common disease
      • Rare variant: MAF < 1% (or 5%)
      • High allelic heterogeneity: disease is caused collectively by multiple rare variants with moderate to high penetrance
      • Association through LD would not be suitable

Slide 3
Association tests for common variants
  • Test a single marker each time
  • Cochran-Armitage trend test (CATT), assuming an additive (ADD) model
      • Power: high under additive (ADD) or multiplicative (MUL) models; low under recessive (REC) or dominant (DOM) models
  • Genotype association test (GAT) using a chi-square statistic
      • Power: a little lower than CATT under ADD, higher under REC
  • MAX3 = maximum of three trend test statistics across the REC, ADD, and DOM models (Freidlin et al. 2002, Hum Hered)
      • Power: lower than CATT under ADD; higher than CATT & GAT under REC

Slide 4
Association tests for common variants
  • Test for a single marker (CATT, GAT, & MAX3)
  • Low power when MAF

Weighted Sum Method (Madsen & Browning 09)
  • Power comparison: simulations assuming genotypic relative risk is proportional to MAF at disease loci (Madsen & Browning 09)
      • Weighted Sum Method (WSM) > CMC > CAST
      • WSM > CMC may not be true in other situations
  • Can be applied to rare variants & common variants
  • Disadvantage: gives very high weights to very rare alleles (singletons) and very low weights to common variants.
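The disadvantage noted for the weighted sum method can be seen in a small numerical sketch. It assumes the weight definition published in Madsen & Browning (2009): q_i = (m_i + 1) / (2 n_U + 2), where m_i is the minor-allele count of variant i among the n_U controls, and w_i = sqrt(n q_i (1 - q_i)), where n is the total sample size; each individual's score is the sum of I_ij / w_i. The function names and the toy genotype matrix are hypothetical, and the rank-sum statistic and permutation step of the full method are left out.

```python
import numpy as np

def madsen_browning_weights(genotypes, is_case):
    """Per-variant weights w_i, assuming the Madsen & Browning (2009) definition.

    genotypes: (n_individuals, n_variants) array of minor-allele counts (0, 1, 2).
    is_case:   boolean array of length n_individuals.
    """
    genotypes = np.asarray(genotypes)
    is_case = np.asarray(is_case, dtype=bool)
    controls = genotypes[~is_case]
    n_total = genotypes.shape[0]
    n_controls = controls.shape[0]
    # Smoothed minor-allele frequency estimated in controls only.
    q = (controls.sum(axis=0) + 1) / (2 * n_controls + 2)
    return np.sqrt(n_total * q * (1 - q))

def genetic_scores(genotypes, weights):
    """Score per individual: sum over variants of allele count / w_i."""
    return (np.asarray(genotypes) / weights).sum(axis=1)

# Toy data (hypothetical): 5 cases followed by 5 controls, 2 variants.
# Variant 0 is a singleton seen in one case; variant 1 is common.
genotypes = np.array([
    [1, 1], [0, 2], [0, 1], [0, 0], [0, 1],   # cases
    [0, 1], [0, 1], [0, 1], [0, 1], [0, 1],   # controls
])
is_case = np.array([True] * 5 + [False] * 5)

weights = madsen_browning_weights(genotypes, is_case)
scores = genetic_scores(genotypes, weights)
print("w_i:", weights)
print("contribution per allele (1 / w_i):", 1 / weights)
print("scores:", scores)
```

Because the score divides by w_i, the singleton contributes more per allele than the common variant, which is the behaviour the slide describes as a disadvantage.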
Slide 13
An evaluation of the CMC method and the weighted sum method using GAW 17 data
  • Both methods are powerful (based on the authors' simulations)
  • Our evaluation is based on simulated datasets from GAW 17
  • GAW 17 data: a subset of genes with real sequence data available in the 1000 Genomes Project
      • Simulated phenotypes
      • Unrelated individuals, families
  • Dataset of 697 unrelated individuals
      • 24487 SNPs in 3205 genes from 22 autosomal chromosomes
      • Only the 2196 genes with non-synonymous SNPs are tested

Slide 14
GAW 17 dataset of unrelated individuals
  • Four phenotypes: Q1, Q2, Q4, and disease status
  • Q1, Q2, and Q4 are quantitative traits
      • Q1 is associated with 39 SNPs in 9 genes
      • Q2 is associated with 72 SNPs in 13 genes
      • Q4 is not related to any genes
  • Disease status is a binary trait (affected or unaffected), associated with 37 genes
  • 200 simulated phenotype replicates
  • Only one replicate of genotype data (original data)

Slide 15
Methods: case-control design
  • Transform Q1, Q2, and Q4 into binary traits
  • Split each distribution at its top 30%
  • Transforming phenotypes

Slide 16
Criteria for evaluation of tests
  • Familywise error rate (FWER)
      • 2196 genes with non-synonymous SNPs, 2196 tests
      • 2196 null hypotheses H_j0: gene j is not associated with the trait
      • Q1 is associated with 9 genes, so 9 null hypotheses are not true and (2196 - 9) null hypotheses are true
      • FWER = Pr(reject at least one true null hypothesis) = N_f / 200
      • N_f: number of replicates in which at least one true null hypothesis is rejected
  • Average power
      • Mean of the power over all the 9 genes that affect the phenotype
  • Evaluating power: Q1, Q2, disease
  • Evaluating FWER: Q4

Slide 17
Distribution of MAF in the GAW 17 dataset
Figure 1. Distribution of MAF of 24487 SNPs in GAW 17

Slide 18
Figure 1.
Group SNPs based on MAFs for CMC: 0 - 0.01, 0.01 - 0.1, >= 0.1
Similar to Madsen & Browning (2009)

Slide 19
Table 1: Average power

  Trait      CMC method   Weighted sum method
  Q1         0.144        0.112
  Q2         0.00615      0.00308
  Disease    0.00444      0.00500

Slide 20
Table 2: FWER (nominal = 0.05)

  Trait      CMC method   Weighted sum method
  Q4         0.115        0.0100

  • CMC has FWER inflation
      • Population stratification or admixture: samples from Asia, Europe
      • Relatedness among samples
  • Similar results for power and FWER were reported at GAW 17

Slide 21
Variable-threshold approach (Price et al. 2010)
  • Given a threshold T, calculate a score for indiv j
  • I_ij = 0, 1, 2, the count of the minor allele of indiv i at locus j
  • Calculate the sum of scores for cases:
  • Calculate Z(T) = V(T)/Var(V(T))
  • Find T to maximize Z(T): Z_max = max(Z(T))
  • Use permutation to estimate the p-value for Z_max
  • Power: > CMC; extended to quantitative traits

Slide 22
A weighted approach (Price et al. 2010)
  • Calculate a weighted score for indiv j
  • I_ij = 0, 1, 2
  • Calculate the sum of scores for cases
  • Possible weight
  • Power: similar to the weighted sum method (Madsen & Browning 09)

Slide 23
A weighted approach (Price et al. 2010)
  • Calculate the sum of scores for cases
  • I_ij = 0, 1, 2
  • Calculate weights from the prediction of functional effects
  • PolyPhen-2 is used to predict damaging effects of missense mutations with probabilistic scores.
  • Probabilistic scores as weights may reduce the noise from non-functional variants.
  • Higher power than other methods

Slide 24
A data-adaptive sum test (Han & Pan 2010, Hum Hered)
  • Logistic model
  • x_ij = 0, 1, 2, the count of the minor allele of indiv i at locus j
  • Effects in opposite directions
  • If j