Statistical methods for genetic association studies http://www.stats.gla.ac.uk/~paulj/assoc_study_stats.ppt



A tutorial on statistical methods for population association studies

David Balding

Nature Reviews Genetics (2006) 7:781-791

[Figure: does a health outcome depend on genetics, the environment, or a gene-environment (G×E) interaction?]

Recombination

[Figure: recombination in gamete-producing cells. Parental haplotypes A X and a x give rise to recombinant gametes a X and A x; the linked marker alleles B and b, lying close to X/x, are rarely separated from it.]

X/x: unobserved causative mutation
A/a: distant marker
B/b: linked marker
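The figure's point can be made quantitative. In a large randomly mating population, the standard result is that the linkage disequilibrium coefficient D between a marker and the causative site shrinks by a factor (1 - r) each generation, where r is the recombination fraction, so $D_t = D_0 (1 - r)^t$. A minimal Python sketch, assuming no selection, mutation or drift; the function name and the example values of r are illustrative only:

```python
def ld_after(d0: float, r: float, generations: int) -> float:
    """Linkage disequilibrium D remaining after `generations` of random mating.

    Assumes an infinite, randomly mating population and a recombination
    fraction r (0 <= r <= 0.5) between the marker and the causative site.
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    return d0 * (1.0 - r) ** generations


if __name__ == "__main__":
    d0 = 0.25  # maximal D when both loci have allele frequencies of 0.5
    for label, r in [("linked marker B/b", 0.001), ("distant marker A/a", 0.5)]:
        print(label, [round(ld_after(d0, r, t), 4) for t in (0, 10, 100)])
```

With r = 0.5 (effectively unlinked, like A/a far from X/x) the association is essentially gone within ten generations, whereas a tightly linked marker such as B/b retains most of it. This is why association studies pick up markers close to the causative mutation.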

Approaches to finding disease genes

• Population-based association study
  – “unrelated” subjects

• Family-based association study
  – nuclear families

• Admixture mapping
  – recently admixed population

• Linkage mapping
  – large pedigrees

Darvasi & Shifman (2005) Nature Genetics

Types of population association study

• Candidate causative polymorphism
  – SNP (single nucleotide polymorphism), deletion, duplication

• Candidate causative gene (5-50 marker SNPs)
  – evidence from a linkage study or from gene function

• Candidate causative region (100s of marker SNPs)
  – evidence from a linkage study

• Genome-wide (>300,000 marker SNPs)
  – no prior evidence required


Common disease common variant (CDCV) hypothesis

Preliminary analysis: data quality

• Assuming random mating in a large population (and no selection, migration or mutation), genotype frequencies follow Hardy-Weinberg equilibrium (HWE)

• Allele frequencies: $P(X) = p$, $P(x) = q = 1 - p$

• HWE genotype frequencies: $P(XX) = p^2$, $P(Xx) = 2pq$, $P(xx) = q^2$

          X (p)    x (q)
  X (p)   p^2      pq
  x (q)   pq       q^2

• Useful data quality check:
  – chi-squared or exact test
  – log QQ plot

• But excluding SNPs that deviate from HWE can discard causative mutations, since a genuine association can itself distort genotype frequencies in cases
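The chi-squared version of the HWE check can be sketched in a few lines of Python. This is an illustrative implementation, not taken from any particular package, and the function name is hypothetical; for rare alleles or small samples the exact test is preferable because expected counts become small.

```python
import math


def hwe_chisq(n_XX, n_Xx, n_xx):
    """Pearson chi-squared test of Hardy-Weinberg proportions at one SNP.

    Takes observed genotype counts and returns (statistic, p-value). One degree
    of freedom: 3 genotype classes, minus 1, minus 1 estimated allele frequency.
    """
    n = n_XX + n_Xx + n_xx
    p = (2 * n_XX + n_Xx) / (2 * n)  # estimated frequency of allele X
    q = 1.0 - p
    observed = (n_XX, n_Xx, n_xx)
    expected = (n * p * p, 2 * n * p * q, n * q * q)  # n p^2, 2n pq, n q^2
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of chi-squared with 1 df: P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

In practice the check is often applied to controls only, since a true association can distort genotype frequencies in cases.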


Log QQ plot
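A log QQ plot compares the ordered observed $-\log_{10} p$ values with those expected if every SNP were null (uniformly distributed p-values). The sketch below computes only the plot coordinates, leaving the drawing to any plotting tool; the plotting positions $(i - 0.5)/m$ are one common convention, and the function name is illustrative.

```python
import math


def log_qq_points(p_values):
    """Return (expected, observed) -log10 p pairs for a log QQ plot.

    All p-values must be strictly positive. Pairs are ordered from the
    smallest p-value (largest -log10 p) to the largest.
    """
    m = len(p_values)
    ordered = sorted(p_values)  # smallest p first
    return [
        (-math.log10((i - 0.5) / m), -math.log10(p))
        for i, p in enumerate(ordered, start=1)
    ]
```

Points on the diagonal are consistent with the null. A general lift of the whole curve suggests systematic inflation (for example from population stratification or genotyping artefacts), whereas departures confined to the smallest p-values are what true associations would produce.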

Preliminary analysis: dealing with missing data

• Imputation
  – various methods: maximum likelihood; probabilistic; ‘hot-deck’; regression modelling
  – test for independence of ‘missingness’ and case-control status
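One way to test whether ‘missingness’ is independent of case-control status is a Pearson chi-squared test on the 2x2 table of missing versus called genotypes in cases and controls. A minimal sketch, with a hypothetical function name and no continuity correction; the 1-df p-value uses $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$:

```python
import math


def missingness_test(miss_cases, called_cases, miss_controls, called_controls):
    """Pearson chi-squared test (1 df) of missingness versus case-control status.

    Returns (statistic, p-value) for the 2x2 table
    [[missing in cases, called in cases], [missing in controls, called in controls]].
    """
    table = [[miss_cases, called_cases], [miss_controls, called_controls]]
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

A small p-value suggests genotype calls fail more often in one group, which can bias single-SNP tests and should be resolved before imputation.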

Choice of inheritance model

[Figure: dominant vs additive inheritance. Trait value (0%, 50%, 100%) plotted against the number of trait alleles inherited (0, 1, 2) for dominant and additive models.]
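In regression-based tests the inheritance model enters through how the genotype is coded as a covariate. A small sketch of the usual codings by number of trait (minor) alleles; the recessive coding is added for completeness and is not on the slide, and the function name is hypothetical:

```python
def code_genotype(n_minor: int, model: str) -> int:
    """Numeric covariate for a genotype carrying n_minor trait alleles.

    additive: 0, 1, 2   dominant: 0, 1, 1   recessive: 0, 0, 1
    """
    if n_minor not in (0, 1, 2):
        raise ValueError("genotype must carry 0, 1 or 2 trait alleles")
    if model == "additive":
        return n_minor
    if model == "dominant":
        return int(n_minor >= 1)
    if model == "recessive":
        return int(n_minor == 2)
    raise ValueError(f"unknown inheritance model: {model}")
```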


Tests of association: single SNP

• Case-control
  – Treat genotype as a factor with 3 levels and perform a 2x3 goodness-of-fit test. Loses power relative to a trend test if the effect is additive
  – Count alleles rather than individuals and perform a 2x2 goodness-of-fit test. Out of favour because
    • sensitive to deviation from HWE
    • risk estimates refer to alleles rather than individuals, so are hard to interpret

           Major allele homozygote (0)   Heterozygote (1)   Minor allele homozygote (2)
Case
Control
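Both tests on this slide are Pearson chi-squared tests on contingency tables. The sketch below, using only the Python standard library, computes the 2x3 genotype test (2 df, where the chi-squared upper tail is exactly $e^{-x/2}$) and the 2x2 allelic test (1 df) from the case and control rows of the table. Function names are illustrative, and no small-count corrections are applied.

```python
import math


def pearson_chisq(table):
    """Pearson chi-squared statistic for an r x c table of counts."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(len(rows))
        for j in range(len(cols))
        if rows[i] * cols[j] > 0
    )


def genotype_test(cases, controls):
    """2x3 test on genotype counts (0, 1, 2 minor alleles); 2 df."""
    stat = pearson_chisq([list(cases), list(controls)])
    return stat, math.exp(-stat / 2.0)  # exact upper tail for 2 df


def allelic_test(cases, controls):
    """2x2 test on allele counts; each person contributes two alleles; 1 df."""
    def alleles(g):  # g = (n0, n1, n2) genotype counts -> [major, minor]
        return [2 * g[0] + g[1], g[1] + 2 * g[2]]

    stat = pearson_chisq([alleles(cases), alleles(controls)])
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

The allelic version doubles the sample size by treating each person's two alleles as independent observations; that independence holds only under HWE, which is why the test is sensitive to HWE deviations.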

Tests of association: single SNP

• Case-control
  – Cochran-Armitage trend test
    • loses power if the additivity assumption is wrong
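The Cochran-Armitage trend test weights the genotype columns by scores $t_i$ (0, 1, 2 for an additive model). With case counts $r_i$, control counts $s_i$, column totals $n_i = r_i + s_i$, $R = \sum_i r_i$, $S = \sum_i s_i$ and $N = R + S$, one standard form is $T = \sum_i t_i (r_i S - s_i R)$ with $\operatorname{Var}(T) = \frac{RS}{N}\left[N \sum_i t_i^2 n_i - \left(\sum_i t_i n_i\right)^2\right]$, and $T^2 / \operatorname{Var}(T)$ is compared with $\chi^2_1$. A sketch under those definitions; the function name and interface are illustrative:

```python
import math


def cochran_armitage(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x k table of genotype counts.

    Returns (chi-squared statistic with 1 df, p-value). Weights (0, 1, 2)
    target an additive effect; (0, 1, 1) would target a dominant one.
    """
    n_i = [r + s for r, s in zip(cases, controls)]
    R, S = sum(cases), sum(controls)
    N = R + S
    # T = sum_i t_i (r_i S - s_i R)
    t_stat = sum(t * (r * S - s * R) for t, r, s in zip(weights, cases, controls))
    sum_t2n = sum(t * t * n for t, n in zip(weights, n_i))
    sum_tn = sum(t * n for t, n in zip(weights, n_i))
    # Var(T) = (R S / N) [N sum_i t_i^2 n_i - (sum_i t_i n_i)^2]
    variance = R * S / N * (N * sum_t2n - sum_tn ** 2)
    if variance == 0:
        raise ValueError("trend statistic undefined: no variation in scores")
    stat = t_stat ** 2 / variance
    return stat, math.erfc(math.sqrt(stat / 2.0))
```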

Tests of association: single SNP

• Case-control
  – Armitage or goodness-of-fit? Depends on:
    • prior knowledge of inheritance (additive, dominant, etc.)
    • genotype frequencies, e.g. use the Armitage test when the minor allele is rare (so the minor homozygote count is small), the goodness-of-fit test otherwise

Tests of association: single SNP

• Case-control
  – Logistic regression
    • easily incorporates the inheritance model (additive, dominant, etc.)
    • but treats phenotype, not genotype, as the outcome variable, so it is easiest to justify for prospective studies
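Logistic regression can be fitted by Newton-Raphson maximisation of the log-likelihood. The sketch below fits an intercept and a single genotype score in plain Python. It is illustrative only: the function name, the fixed iteration count and the lack of standard errors are simplifications, and real analyses would typically use a statistics package.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson (maximum likelihood).

    x: genotype scores coded under the chosen inheritance model;
    y: 1 for cases, 0 for controls. Returns (b0, b1); exp(b1) is the odds
    ratio per unit of x. Assumes x is not constant and the data are not
    perfectly separated (otherwise the maximum likelihood estimate does not exist).
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p  # score vector (gradient of the log-likelihood)
            g1 += (yi - p) * xi
            h00 += w  # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += I^{-1} * score, with the 2x2 inverse written out
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

With dominant coding (0 = no trait allele, 1 = at least one), exp(b1) is the odds ratio for carriers versus non-carriers.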

Tests of association: single SNP

• Continuous outcome
  – Linear regression

• Ordered categorical outcomes
  – Multinomial regression
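For a continuous outcome, the additive-model test is a test of the slope in a least-squares regression of the trait on the genotype score. A minimal sketch returning the intercept, slope and the slope's standard error; the function name is illustrative.

```python
import math


def fit_linear(x, y):
    """Least-squares fit of y = a + b * x.

    Returns (a, b, se_b), where se_b is the standard error of the slope;
    b / se_b is the usual t statistic with n - 2 degrees of freedom.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least three observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(rss / (n - 2) / sxx)
    return a, b, se_b
```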

Problems: population stratification
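The problem can be reproduced with a small hypothetical example: two subpopulations that differ in both allele frequency and disease prevalence, with no association within either. The numbers below are invented for illustration, and the code uses expected allele counts rather than simulated data.

```python
def allelic_odds_ratio(case_minor, case_major, ctrl_minor, ctrl_major):
    """Odds ratio for carrying the minor allele, cases versus controls."""
    return (case_minor * ctrl_major) / (case_major * ctrl_minor)


def allele_counts(n_people, minor_freq):
    """Expected (minor, major) allele counts among n_people diploid subjects."""
    return 2 * n_people * minor_freq, 2 * n_people * (1.0 - minor_freq)


# Hypothetical subpopulations. Within each one, cases and controls have the
# same minor allele frequency, so there is no genuine association.
SUBPOPULATIONS = [
    {"cases": 500, "controls": 100, "minor_freq": 0.5},  # high disease rate
    {"cases": 100, "controls": 500, "minor_freq": 0.1},  # low disease rate
]

within = []
pooled = [0.0, 0.0, 0.0, 0.0]  # case minor, case major, control minor, control major
for sp in SUBPOPULATIONS:
    counts = allele_counts(sp["cases"], sp["minor_freq"]) + allele_counts(
        sp["controls"], sp["minor_freq"]
    )
    within.append(allelic_odds_ratio(*counts))
    pooled = [total + c for total, c in zip(pooled, counts)]

pooled_or = allelic_odds_ratio(*pooled)
print("odds ratio within each subpopulation:", [round(v, 3) for v in within])
print("odds ratio after pooling:", round(pooled_or, 3))
```

Each within-subpopulation odds ratio is 1, yet pooling gives an allelic odds ratio of about 3.8: a spurious association created entirely by the different case-control mix in the two subpopulations.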

Correcting for population stratification

• Genomic control
  – Genotype null SNPs and use them to estimate the background inflation of the test statistic due to population stratification
  – Limited to simple single-SNP analyses
  – Can over- or under-correct

• Other approaches using null SNPs
  – Regression, principal components analysis, modelling the underlying demography
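Genomic control estimates an inflation factor $\lambda$ as the median of the 1-df association statistics at null SNPs divided by the median of the $\chi^2_1$ distribution (about 0.4549), then divides the statistic at each tested SNP by $\lambda$. A sketch with illustrative function names; truncating $\lambda$ at 1 is a common convention, assumed here rather than taken from the slide.

```python
import statistics

# Median of the chi-squared distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = 0.4549364


def estimate_lambda(null_stats):
    """Inflation factor from 1-df chi-squared statistics at null SNPs."""
    lam = statistics.median(null_stats) / CHI2_1DF_MEDIAN
    return max(lam, 1.0)  # assumed convention: never inflate statistics


def apply_genomic_control(stats, lam):
    """Divide each 1-df statistic by lambda to remove the estimated inflation."""
    return [s / lam for s in stats]
```

Because a single factor is applied to every SNP, the correction is too strong for SNPs whose allele frequencies differ little between subpopulations and too weak for those that differ a lot, which is one way it can over- or under-correct.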

Problems: multiple testing

• Bonferroni correction
  – conservative when SNPs are in linkage disequilibrium, because their tests are correlated

• Permutation
  – computationally demanding

• False discovery rate

• Bayesian approaches
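Bonferroni correction and false discovery rate control are both short to implement. The sketch below uses the Benjamini-Hochberg step-up procedure as one standard FDR method (the slide does not name a specific procedure); function names and default thresholds are illustrative.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 where p <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k * q / m; reject ranks 1..k.
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject
```

For p-values [0.02, 0.024, 0.5, 0.6], Benjamini-Hochberg at q = 0.05 rejects the first two, while Bonferroni (threshold 0.0125) rejects none.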

Tests of association: multiple SNPs

• Advantages
  – Many SNPs may be linked to a gene, but individually may not have a significant effect
  – Interactions between SNPs can be modelled
  – ‘Tag’ SNPs can reduce testing of redundant linked SNPs

• Methods
  – Linear regression, logistic regression
  – Armitage test

• Haplotype-based methods
  – Natural interpretation
  – But power is reduced because each haplotype acts as a separate allele, adding degrees of freedom
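Tag SNP selection can be sketched as a greedy set cover: repeatedly choose the SNP that captures, at a chosen $r^2$ threshold, the most SNPs not yet tagged. This is a simplified, hypothetical illustration. It uses the squared correlation of genotype codes as a stand-in for haplotype-based LD $r^2$, and dedicated tagging tools use more refined criteria.

```python
def r_squared(g1, g2):
    """Squared Pearson correlation between two genotype vectors (0/1/2 codes).

    Used here as a rough proxy for haplotype-based LD r^2.
    """
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return 0.0  # monomorphic SNP: no correlation defined
    return cov * cov / (v1 * v2)


def greedy_tags(genotypes, threshold=0.8):
    """Choose tag SNPs so every SNP has r^2 >= threshold with at least one tag.

    genotypes: list of per-SNP genotype vectors over the same individuals.
    Returns the indices of the chosen tag SNPs.
    """
    m = len(genotypes)
    covers = [
        {j for j in range(m)
         if i == j or r_squared(genotypes[i], genotypes[j]) >= threshold}
        for i in range(m)
    ]
    untagged = set(range(m))
    tags = []
    while untagged:
        # Ties go to the lowest index, since max() returns the first maximum.
        best = max(range(m), key=lambda i: len(covers[i] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags
```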

Haplotypes

Nature Genetics 37, 915-916 (2005)


Inferring haplotype phase


Methods & software
• PHASE, FASTPHASE
• EH+
• FBAT
• HAPLOTYPER
• EM-DECODER
• PLEM
• HAP
• HAPLORE
• haplo.stats
• SNPEM
• PEDPHASE
• SNPHAP
• TDTHAP
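Several methods for phase inference are based on the expectation-maximisation (EM) algorithm. For two SNPs, only double heterozygotes have ambiguous phase. EM alternates between splitting each individual's genotype across its compatible haplotype pairs in proportion to the current frequency estimates (E-step) and re-estimating frequencies from those expected counts (M-step). A minimal sketch assuming HWE; it is not the algorithm of any specific program listed, and the function name is hypothetical.

```python
from itertools import product

HAPLOTYPES = list(product((0, 1), repeat=2))  # (allele at SNP 1, allele at SNP 2)


def em_haplotype_freqs(genotypes, iterations=200, tol=1e-10):
    """EM estimate of two-SNP haplotype frequencies from unphased genotypes.

    genotypes: list of (g1, g2), each the number of '1' alleles (0, 1 or 2)
    an individual carries at SNP 1 and SNP 2. Assumes HWE, so an ordered
    haplotype pair (h, k) has probability f[h] * f[k].
    """
    # Ordered haplotype pairs compatible with each individual's genotype.
    compatible = [
        [(h, k) for h in HAPLOTYPES for k in HAPLOTYPES
         if h[0] + k[0] == g1 and h[1] + k[1] == g2]
        for g1, g2 in genotypes
    ]
    freqs = {h: 1.0 / len(HAPLOTYPES) for h in HAPLOTYPES}
    for _ in range(iterations):
        counts = dict.fromkeys(HAPLOTYPES, 0.0)
        for pairs in compatible:
            # E-step: posterior probability of each phase resolution.
            weights = [freqs[h] * freqs[k] for h, k in pairs]
            total = sum(weights)
            for (h, k), w in zip(pairs, weights):
                counts[h] += w / total
                counts[k] += w / total
        # M-step: expected haplotype counts over 2n chromosomes.
        new = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        converged = max(abs(new[h] - freqs[h]) for h in HAPLOTYPES) < tol
        freqs = new
        if converged:
            break
    return freqs
```

When no double heterozygotes are present, the estimate reduces to simple haplotype counting.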

• Phase cases and controls separately or pooled?
  – Separating can give an inflated type I error rate
  – Pooling can reduce power