Statistical methods for genetic association studies
http://www.stats.gla.ac.uk/~paulj/assoc_study_stats.ppt
A tutorial on statistical methods for population association studies
David Balding
Nature Reviews Genetics (2006) 7:781-791
[Figure: Genetics, Environment and G×E interaction as possible determinants of Health outcome]
Recombination
[Figure: recombination between gametophytes (gamete-producing cells) and gametes, with loci A/a, B/b and X/x]
X/x: unobserved causative mutation
A/a: distant marker
B/b: linked marker
Approaches to finding disease genes
• Population-based association study
  – “unrelated” subjects
• Family-based association study
  – nuclear families
• Admixture mapping
  – recently admixed population
• Linkage mapping
  – large pedigrees
Darvasi & Shifman (2005) Nature Genetics
Types of population association study
• Candidate causative polymorphism
  – SNP (single nucleotide polymorphism), deletion, duplication
• Candidate causative gene (5-50 marker SNPs)
  – evidence from linkage study or function
• Candidate causative region (100s of marker SNPs)
  – evidence from linkage study
• Genome-wide (>300,000 marker SNPs)
  – no prior evidence required
Common disease common variant (CDCV) hypothesis
• Assuming mating is random and the population is large, Hardy-Weinberg equilibrium (HWE) genotype frequencies will apply
• Allele frequencies:
  P(X) = p
  P(x) = q
• HWE genotype frequencies:
  P(XX) = p^2
  P(Xx) = 2pq
  P(xx) = q^2
• Useful data quality check:
  – chi-squared or exact test
  – log QQ plot
• But excluding SNPs that fail the check can discard causative mutations
        p       q
p       p^2     pq
q       pq      q^2
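The chi-squared version of the HWE check can be sketched as follows. This is a minimal illustration, not taken from the slides: the function name `hwe_chi2` and the example counts are hypothetical, and it assumes both alleles are observed, so that no expected count is zero.

```python
import math

def hwe_chi2(n_XX, n_Xx, n_xx):
    """Chi-squared test (1 df) of genotype counts against HWE expectations.

    Assumes both alleles are present in the sample (no zero expected counts).
    """
    n = n_XX + n_Xx + n_xx
    p = (2 * n_XX + n_Xx) / (2 * n)  # estimated frequency of allele X
    q = 1 - p                        # estimated frequency of allele x
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_XX, n_Xx, n_xx)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-squared distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts that match HWE exactly (p = q = 0.5)
print(hwe_chi2(25, 50, 25))
```

For small samples or rare alleles the exact test mentioned above is usually preferred to this approximation.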
Preliminary analysis: data quality
Log QQ plot
Preliminary analysis: dealing with missing data
• Imputation
  – various methods: maximum likelihood; probabilistic; ‘hot-deck’; regression modelling
  – test for independence of ‘missingness’ and case-control status
Choice of inheritance model
[Figure: Dominant vs additive inheritance. x-axis: number of trait alleles inherited (0, 1, 2); y-axis: trait value (0%, 50%, 100%); series: Dominant, Additive]
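The two inheritance models in the figure amount to different numeric codings of the genotype. A minimal sketch, assuming the trait value is scaled from 0 to 1 as on the figure's axis (the function name `encode_genotype` is hypothetical):

```python
def encode_genotype(n_trait_alleles, model):
    """Return the relative trait value (0 to 1) for 0, 1 or 2 trait alleles."""
    if n_trait_alleles not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2 trait alleles")
    if model == "additive":
        # Each trait allele contributes half of the full effect
        return n_trait_alleles / 2
    if model == "dominant":
        # One trait allele is enough for the full effect
        return 1.0 if n_trait_alleles >= 1 else 0.0
    raise ValueError("model must be 'additive' or 'dominant'")

for g in (0, 1, 2):
    print(g, encode_genotype(g, "additive"), encode_genotype(g, "dominant"))
```

The models differ only for heterozygotes, which is why the choice matters most when heterozygotes are common.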
Tests of association: single SNP
• Case-control
  – Treat genotype as factor with 3 levels, perform 2x3 goodness-of-fit test. Loses power if effect is additive
  – Count alleles rather than individuals, perform 2x2 goodness-of-fit test. Out of favour because
    • sensitive to deviation from HWE
    • risk estimates not interpretable
[Table: 2x3 genotype counts. Rows: Case, Control. Columns: Major allele homozygote (0), Heterozygote (1), Minor allele homozygote (2)]
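Both tests above can be computed from the same 2x3 table of genotype counts. The sketch below uses the Pearson chi-squared statistic with hypothetical counts. The function names are illustrative, and every row and column is assumed to have a non-zero total.

```python
import math

def pearson_chi2(table):
    """Pearson chi-squared statistic for a table of counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def genotype_test(cases, controls):
    """2x3 test on genotype counts (2 df); p-value is exp(-chi2 / 2)."""
    chi2 = pearson_chi2([list(cases), list(controls)])
    return chi2, math.exp(-chi2 / 2)

def allele_test(cases, controls):
    """2x2 test on allele counts (1 df); each individual contributes 2 alleles."""
    def alleles(g):
        return [2 * g[0] + g[1], g[1] + 2 * g[2]]  # [major, minor]
    chi2 = pearson_chi2([alleles(cases), alleles(controls)])
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts ordered as (genotype 0, genotype 1, genotype 2)
cases, controls = (20, 50, 30), (30, 50, 20)
print(genotype_test(cases, controls))
print(allele_test(cases, controls))
```

With these counts both statistics equal 4, but the genotype test spends 2 degrees of freedom and so gives the larger p-value. This is the loss of power under an additive effect noted above.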
Tests of association: single SNP
• Case-control
  – Cochran-Armitage test
    • loses power if additivity assumption is wrong
Cochran-Armitage test
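A minimal sketch of the Cochran-Armitage trend statistic, assuming the usual additive scores (0, 1, 2) for the three genotypes. The function name and the counts are hypothetical. The variance is written in a compact form that is algebraically equal to the textbook expression.

```python
import math

def cochran_armitage(cases, controls, weights=(0, 1, 2)):
    """Trend test (1 df) on case and control genotype counts.

    cases, controls: counts for genotypes 0, 1, 2 (number of minor alleles).
    weights: genotype scores; (0, 1, 2) corresponds to an additive model.
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    col = [r + s for r, s in zip(cases, controls)]  # genotype totals
    # Trend statistic: weighted difference between cases and controls
    T = sum(t * (r * S - s * R) for t, r, s in zip(weights, cases, controls))
    sum_t_n = sum(t * n for t, n in zip(weights, col))
    sum_t2_n = sum(t * t * n for t, n in zip(weights, col))
    # Variance of T under the null hypothesis of no association
    var_T = (R * S / N) * (N * sum_t2_n - sum_t_n ** 2)
    chi2 = T * T / var_T
    return chi2, math.erfc(math.sqrt(chi2 / 2))

print(cochran_armitage((20, 50, 30), (30, 50, 20)))
```

Changing `weights` to (0, 1, 1) would give a test tuned to a dominant model instead.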
Tests of association: single SNP
• Case-control
  – Armitage or goodness-of-fit? Depends on:
    • Prior knowledge of inheritance (additive, dominant, etc)
    • Genotype frequencies, e.g. use Armitage test when minor allele is rare, goodness-of-fit test otherwise
Tests of association: single SNP
• Case-control
  – Logistic regression
    • Easily incorporates inheritance model (additive, dominant, etc)
    • But assumes phenotype is outcome variable not genotype, so easier to justify for prospective studies
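To illustrate how the inheritance model enters a logistic regression, here is a dependency-free sketch that fits an intercept and one genotype covariate by Newton-Raphson. The function name, the fixed iteration count and the data are assumptions for illustration. In practice a statistics package would be used, and the fit fails if cases and controls are perfectly separated by genotype.

```python
import math

def fit_logistic(x, y, n_iter=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson.

    x: genotype codes (e.g. 0/1 for dominant, 0/1/2 for additive)
    y: phenotype, 1 = case, 0 = control
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p           # gradient w.r.t. intercept
            g1 += (yi - p) * xi    # gradient w.r.t. slope
            h00 += w               # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data, dominant coding: x = 1 if at least one minor allele
x = [1] * 40 + [0] * 60
y = [1] * 30 + [0] * 10 + [1] * 20 + [0] * 40
b0, b1 = fit_logistic(x, y)
print("odds ratio for carriers:", math.exp(b1))
```

With a binary covariate the fitted slope is the log odds ratio from the corresponding 2x2 table, here log((30 × 40) / (10 × 20)) = log 6. Switching to an additive model only requires coding `x` as 0, 1 or 2.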
Tests of association: single SNP
• Continuous outcome
  – Linear regression
• Ordered categorical outcomes
  – Multinomial regression
Problems: population stratification
Correcting for population stratification
• Genomic control
  – Genotype null SNPs and use to calculate background inflation in test statistic due to population stratification
  – Limited to simple single-SNP analyses
  – Can over- or under-correct
• Other approaches using null SNPs
  – Regression, principal components analysis, model underlying demography
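The genomic control idea can be sketched in a few lines. This assumes the common convention, not stated on the slide, of estimating the inflation factor as the median of the null-SNP chi-squared statistics divided by the median of a 1-df chi-squared distribution (about 0.455), and of never deflating below 1. Function names and values are hypothetical.

```python
import statistics

# Median of the chi-squared distribution with 1 degree of freedom
CHI2_1DF_MEDIAN = 0.4549364

def inflation_factor(null_chi2):
    """Estimate lambda from 1-df test statistics at null SNPs (floored at 1)."""
    return max(1.0, statistics.median(null_chi2) / CHI2_1DF_MEDIAN)

def genomic_control(chi2, lam):
    """Deflate a candidate SNP's 1-df test statistic by the inflation factor."""
    return chi2 / lam

# Hypothetical null-SNP statistics and one candidate SNP
null_stats = [0.3, 0.7, 0.9, 1.1, 2.5]
lam = inflation_factor(null_stats)
print(lam, genomic_control(8.0, lam))
```

A single scaling factor is applied to every SNP, which is one reason the method can over- or under-correct at individual loci.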
Problems: multiple testing
• Bonferroni correction
  – conservative when SNPs are linked
• Permutation
  – computationally demanding
• False discovery rate
• Bayesian approaches
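Two of these corrections are simple enough to sketch directly. The false discovery rate procedure shown is the Benjamini-Hochberg step-up method, which is an assumption since the slide does not name a specific procedure. The p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a test only if its p-value is at most alpha / number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    # Find the largest rank k with p_(k) <= k * alpha / m
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected

p_values = [0.001, 0.02, 0.03, 0.5]
print(bonferroni(p_values))          # [True, False, False, False]
print(benjamini_hochberg(p_values))  # [True, True, True, False]
```

Bonferroni treats the tests as independent, which is why it becomes conservative when the SNPs are linked.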
Tests of association: multiple SNPs
• Advantages
  – Many SNPs may be linked to a gene, but individually may not have a significant effect
  – Interactions between SNPs can be modelled
  – ‘Tag’ SNPs can reduce testing of redundant linked SNPs
• Methods
  – Linear regression, logistic regression
  – Armitage test
• Haplotype-based methods
  – Natural interpretation
  – But power reduced due to multiple alleles
Haplotypes
Nature Genetics 37, 915 - 916 (2005)
Inferring haplotype phase
Methods & software
• PHASE, FASTPHASE
• EH+
• FBAT
• HAPLOTYPER
• EM-DECODER
• PLEM
• HAP
• HAPLORE
• Haplo.stat
• SNPEM
• PEDPHASE
• SNPHAP
• TDTHAP
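Some of these tools are based on the expectation-maximisation (EM) algorithm. The sketch below is not the method of any listed program. It is a toy EM for two biallelic SNPs, where the only phase-ambiguous individuals are double heterozygotes. The function name, the fixed iteration count and the counts are hypothetical.

```python
# Alleles carried on the two chromosomes for genotype 0, 1 or 2
PAIR = {0: (0, 0), 1: (0, 1), 2: (1, 1)}

def em_two_snp_haplotypes(counts, n_iter=100):
    """Estimate haplotype frequencies for two SNPs by EM.

    counts[i][j]: individuals with i minor alleles at SNP 1 and j at SNP 2.
    Returns a dict keyed by (allele at SNP 1, allele at SNP 2).
    """
    n_hap = 2 * sum(sum(row) for row in counts)
    base = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}
    # Count haplotypes whose phase is unambiguous
    for i in range(3):
        for j in range(3):
            if i == 1 and j == 1:
                continue  # double heterozygotes are handled in the E-step
            (a, b), (c, d) = PAIR[i], PAIR[j]
            base[(a, c)] += counts[i][j]
            base[(b, d)] += counts[i][j]
    n_dh = counts[1][1]
    freq = {h: 0.25 for h in base}
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote is 00/11 not 01/10
        cis = freq[(0, 0)] * freq[(1, 1)]
        trans = freq[(0, 1)] * freq[(1, 0)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: update frequencies from expected haplotype counts
        est = dict(base)
        est[(0, 0)] += n_dh * w
        est[(1, 1)] += n_dh * w
        est[(0, 1)] += n_dh * (1 - w)
        est[(1, 0)] += n_dh * (1 - w)
        freq = {h: c / n_hap for h, c in est.items()}
    return freq

counts = [[10, 0, 0], [0, 5, 0], [0, 0, 10]]
print(em_two_snp_haplotypes(counts))
```

In this example every unambiguous individual carries only the 00 or 11 haplotype, so the EM assigns the double heterozygotes to the 00/11 phase.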
Inferring haplotype phase
• Phase cases and controls separately or pooled?
  – Separating can give inflated type I error
  – Pooling can reduce power