
Genome-wide association mapping

Introduction to theory and methodology

Aaron Lorenz

Department of Agronomy and Horticulture

GWAS: Genome-wide Association Study
• Big subject
• Lots of methods and software packages
• Lots of considerations for handling data
• We have some data to analyze
• 75 minutes


Slide credit: Mike Gore


Goal

Find genes contributing to variation in phenotypes of interest


Approaches to mapping genes

Yu and Buckler, 2006


Germplasm

Biometris

Germplasm
• Any genetically diverse natural or artificial population can be used
  – Examples
    • 71 elite European maize inbred lines (Andersen et al., 2005)
    • Diverse panel of 288 maize lines (Harjes et al., 2008)
    • Diverse panel of 191 Arabidopsis lines (stock center accessions and individuals sampled from the wild; Atwell et al., 2010)
    • 915 dogs from 80 domestic breeds, 83 wild canids, 10 outbred African shelter dogs

Linkage disequilibrium (LD)

$$D = p_{AB} - p_A p_B$$

$$r^2 = \frac{D^2}{p_A \, p_a \, p_B \, p_b}$$

$r^2$: common statistic to quantify LD. Normalized value of $D$.

• The non-random association of alleles between loci.
• Extent of LD over physical distance determines marker density needed.
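The two LD statistics on this slide can be computed directly from allele and haplotype frequencies. The sketch below is only an illustration: the function name `ld_stats` and the example frequencies are made up for this note, not taken from the slides.

```r
# Hypothetical helper: D and r^2 from haplotype and allele frequencies.
# p_AB = frequency of the AB haplotype, p_A and p_B = allele frequencies.
ld_stats <- function(p_AB, p_A, p_B) {
  D  <- p_AB - p_A * p_B
  r2 <- D^2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
  c(D = D, r2 = r2)
}

# Made-up example frequencies
ld <- ld_stats(p_AB = 0.4, p_A = 0.5, p_B = 0.5)
ld  # D = 0.15, r2 = 0.36
```

Because $r^2$ divides $D^2$ by the product of all four allele frequencies, it stays between 0 and 1 whatever the allele frequencies are.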


LD decay in bi-parental linkage mapping populations

Slide credit: Peter Bradbury

Plots of LD across the maize d3 gene (Remington et al., 2001).

Gaut B. S., Long A. D. Plant Cell 2003;15:1502-1506

Copyright © 2003. American Society of Plant Biologists. All rights reserved.

$r^2$ above diagonal, $D'$ below diagonal. Axes in bp.

Note that LD drops to nearly 0 within 500 base pairs.


Extensive LD in barley of the Upper Midwest

Toy example

• 500 random individuals from a population phenotyped and genotyped
  – Genotypes were scored for one marker linked to a candidate gene
  – Individuals scored as A1A1 = 0, A1A2 = 1, A2A2 = 2.

[Figure: phenotypic value plotted against marker genotype (0, 1, 2)]

$$y = \mu + bw + e$$

$$H_0: b = 0 \qquad H_A: b \neq 0$$

R: lm function
• Fits a linear model with normal errors and constant variance; generally this is used for regression analysis using continuous explanatory variables.
• Simple linear regression
  – lm(y ~ x)
• See riceGwasEmma.r
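A minimal sketch of the toy example with `lm`, using simulated data rather than the data in riceGwasEmma.r. The sample size and 0/1/2 coding follow the toy example. The allele frequency, mean, effect size, and variable names are assumptions made for illustration.

```r
set.seed(42)
n <- 500

# Simulated marker scores: A1A1 = 0, A1A2 = 1, A2A2 = 2 (assumed allele freq 0.5)
w <- rbinom(n, size = 2, prob = 0.5)

# Simulated phenotype: assumed mean 10, assumed marker effect b = 0.5
y <- 10 + 0.5 * w + rnorm(n)

# Single-marker regression; the test of H0: b = 0 is the t-test on the slope
fit <- lm(y ~ w)
marker_row <- summary(fit)$coefficients["w", ]
marker_row  # estimate, standard error, t value, p-value
```

In a genome scan, the same regression is repeated once per marker and the p-value for the slope is stored each time.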

Population structure
• Nearly always present in association mapping panels
• Causes spurious associations if not accounted for.

Extreme example

[Figure: two subpopulations, one made up of AABB individuals and the other of aabb individuals]

Within each of these subpopulations, the Ab and aB gametes never occur. When the subpopulations are combined into one population in equal proportions and LD is calculated, D = freq(AB) - freq(A) × freq(B) = 0.5 - 0.5 × 0.5 = 0.25, so the two loci are in complete LD regardless of their physical linkage.
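The extreme example can be checked numerically. This is a small sketch with made-up subpopulation sizes (10 gametes each). `hap_D` is a hypothetical helper, with alleles coded 1 for the capital allele and 0 for the lowercase allele.

```r
# Hypothetical helper: D from two vectors of gametic alleles (1 = A or B, 0 = a or b)
hap_D <- function(a, b) mean(a == 1 & b == 1) - mean(a) * mean(b)

# Subpopulation 1 carries only AB gametes, subpopulation 2 only ab gametes
pop1 <- data.frame(A = rep(1, 10), B = rep(1, 10))
pop2 <- data.frame(A = rep(0, 10), B = rep(0, 10))
pooled <- rbind(pop1, pop2)

D_pop1   <- hap_D(pop1$A, pop1$B)      # 0: no variation within the subpopulation
D_pop2   <- hap_D(pop2$A, pop2$B)      # 0
D_pooled <- hap_D(pooled$A, pooled$B)  # 0.25

# r^2 in the pooled sample
pA <- mean(pooled$A)
pB <- mean(pooled$B)
r2_pooled <- D_pooled^2 / (pA * (1 - pA) * pB * (1 - pB))  # 1
```

The LD appears only after pooling, which is why a marker can look associated with a trait that simply differs between subpopulations.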

Model population structure

$$y = \mu + vq + bw + e$$

• $vq$: subpop membership ($q$) and effect ($v$)
• $bw$: marker allele dosage ($w$) and effect ($b$)

Matrix notation:

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{Q}\mathbf{v} + \mathbf{W}\mathbf{b} + \mathbf{e}$$

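Illustration: 3 subpopulations, 2 markers, 10 individuals

$$
\begin{bmatrix} 4.4 \\ 4.6 \\ 5.3 \\ 5.0 \\ 5.8 \\ 5.7 \\ 4.3 \\ 4.6 \\ 4.4 \\ 4.8 \end{bmatrix}
=
\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \mu
+
\begin{bmatrix}
0.75 & 0.25 & 0.00 \\
0.65 & 0.30 & 0.05 \\
0.50 & 0.40 & 0.10 \\
0.75 & 0.05 & 0.20 \\
0.80 & 0.00 & 0.20 \\
0.20 & 0.60 & 0.20 \\
0.20 & 0.80 & 0.00 \\
0.30 & 0.70 & 0.00 \\
0.10 & 0.00 & 0.90 \\
0.10 & 0.00 & 0.90
\end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}
+
\begin{bmatrix}
0 & 0 \\
0 & 1 \\
1 & 1 \\
1 & 1 \\
0 & 1 \\
1 & 0 \\
1 & 0 \\
0 & 1 \\
0 & 0 \\
1 & 1
\end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \end{bmatrix}
+
\begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \\ e_5 \\ e_6 \\ e_7 \\ e_8 \\ e_9 \\ e_{10} \end{bmatrix}
$$

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{Q}\mathbf{v} + \mathbf{W}\mathbf{b} + \mathbf{e}$$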
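The illustration can be fitted with `lm` as a fixed-effects model. This is a sketch, with the numbers typed in as read from the illustration and with column names (`q1`..`q3`, `w1`, `w2`) invented here. Each row of Q sums to 1, so Q is collinear with the intercept. The sketch therefore drops one Q column, and that is a choice made here, not something the slide states.

```r
# Values as read from the illustration: 10 individuals, 3 subpopulations, 2 markers
d <- data.frame(
  y  = c(4.4, 4.6, 5.3, 5.0, 5.8, 5.7, 4.3, 4.6, 4.4, 4.8),
  q1 = c(0.75, 0.65, 0.50, 0.75, 0.80, 0.20, 0.20, 0.30, 0.10, 0.10),
  q2 = c(0.25, 0.30, 0.40, 0.05, 0.00, 0.60, 0.80, 0.70, 0.00, 0.00),
  q3 = c(0.00, 0.05, 0.10, 0.20, 0.20, 0.20, 0.00, 0.00, 0.90, 0.90),
  w1 = c(0, 0, 1, 1, 0, 1, 1, 0, 0, 1),
  w2 = c(0, 1, 1, 1, 1, 0, 0, 1, 0, 1)
)

# Membership proportions sum to 1 for every individual
q_sums <- rowSums(d[, c("q1", "q2", "q3")])

# Test one marker at a time, with subpopulation membership as covariates.
# q3 is left out because it equals 1 - q1 - q2.
fit_w1 <- lm(y ~ q1 + q2 + w1, data = d)
fit_w2 <- lm(y ~ q1 + q2 + w2, data = d)

# Including all three Q columns leaves one coefficient inestimable (NA)
fit_full <- lm(y ~ q1 + q2 + q3 + w1, data = d)
coef(fit_full)
```

With only 10 individuals the estimates are not meaningful. The point is the shape of the design matrices, not the result.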


Population structure and differential relatedness (or family structure)

Yu and Buckler, 2006

Mixed-linear model to account for family structure

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{Q}\mathbf{v} + \mathbf{W}\mathbf{b} + \mathbf{Z}\mathbf{u} + \mathbf{e}$$

$$\mathbf{u} \sim MVN(\mathbf{0}, \sigma_u^2 \mathbf{K})$$

• $\mathbf{u}$: polygenic effect (random)
• $\mathbf{K}$ = kinship matrix. Normally calculated with genome-wide markers

Efficient Mixed-Model Association (EMMA)

• Uses eigenvalue decomposition to solve the mixed-model equations more efficiently
• Taking the direct inverse of the covariance matrix is computationally intensive, so we want to avoid it in GWAS.
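To show why the eigendecomposition helps, here is a simplified sketch in base R. It is not the EMMA package's code and it uses maximum likelihood rather than REML. The data are simulated, Z is assumed to be the identity (one record per individual), Q is left out, and all function names are hypothetical. K is decomposed once. After rotating y and X by the eigenvectors, the covariance is diagonal, so each marker test needs only weighted least squares and a one-dimensional search over delta = sigma_e^2 / sigma_u^2.

```r
set.seed(1)
n <- 60   # individuals
m <- 200  # markers

# Simulated 0/1/2 marker dosages (hypothetical data, not the rice data set)
W <- matrix(rbinom(n * m, size = 2, prob = 0.3), nrow = n, ncol = m)
y <- as.vector(0.8 * W[, 1] + rnorm(n))  # marker 1 is given a true effect

# Simple marker-based kinship (assumed form: centred cross-product)
Wc <- scale(W, center = TRUE, scale = FALSE)
K <- tcrossprod(Wc) / m

# One eigendecomposition of K, reused for every marker
eig <- eigen(K, symmetric = TRUE)
U <- eig$vectors
s <- pmax(eig$values, 0)  # guard against tiny negative eigenvalues

# Weighted least squares in the rotated space, where Var(U'y) is
# diagonal with entries proportional to s + delta
wls <- function(delta, y_rot, X_rot, s) {
  w <- 1 / (s + delta)
  XtWX <- crossprod(X_rot, X_rot * w)
  beta <- solve(XtWX, crossprod(X_rot, y_rot * w))
  rss <- sum(w * (y_rot - X_rot %*% beta)^2)
  list(beta = drop(beta), XtWX = XtWX, rss = rss)
}

# Profile ML log-likelihood as a function of log(delta)
profile_loglik <- function(log_delta, y_rot, X_rot, s) {
  delta <- exp(log_delta)
  n_obs <- length(y_rot)
  fit <- wls(delta, y_rot, X_rot, s)
  -0.5 * (n_obs * log(2 * pi * fit$rss / n_obs) + sum(log(s + delta)) + n_obs)
}

# Test one marker: estimate delta, then an approximate Wald t-test on the marker effect
test_marker <- function(x, y, U, s) {
  X_rot <- crossprod(U, cbind(1, x))  # U'X
  y_rot <- drop(crossprod(U, y))      # U'y
  opt <- optimize(profile_loglik, interval = c(-10, 10),
                  y_rot = y_rot, X_rot = X_rot, s = s, maximum = TRUE)
  delta <- exp(opt$maximum)
  fit <- wls(delta, y_rot, X_rot, s)
  df <- length(y) - ncol(X_rot)
  b  <- unname(fit$beta[2])
  se <- unname(sqrt(fit$rss / df * diag(solve(fit$XtWX)))[2])
  c(effect = b, se = se, p = 2 * pt(-abs(b / se), df), delta = delta)
}

results <- t(sapply(1:5, function(j) test_marker(W[, j], y, U, s)))
results
```

No n × n matrix is inverted inside the marker loop. The only matrix solved per marker is the small X'WX, which is 2 × 2 here.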

Options for modeling structure and kinship [see Price et al. (2010)]

Inferring and modeling structure
• Use knowledge on subpop membership directly
• Subpopulation clustering (explicitly infer ancestry)
  – STRUCTURE
  – ADMIXTURE
• Principal component analysis
  – Use top PCs as covariates to correct for pop structure
  – Related approach is multi-dimensional scaling (MDS)

Inferring kinship
• Marker similarity matrix
• Realized genomic additive relationship matrix
• Pedigree additive relationship matrix
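Two of these options are easy to sketch in base R: top principal components as structure covariates, and a realized genomic relationship matrix as kinship. The data are simulated. The scaling of G shown here (dividing by 2 × sum of p(1 - p)) is one common choice and is an assumption, since the slide does not say which estimator is meant.

```r
set.seed(2)
n <- 30   # individuals
m <- 500  # markers

# Simulated 0/1/2 marker dosages (hypothetical data)
W <- matrix(rbinom(n * m, size = 2, prob = 0.4), nrow = n, ncol = m)

# Principal component analysis: keep the top 3 PCs as covariates
pca <- prcomp(W, center = TRUE, scale. = FALSE)
Q_pc <- pca$x[, 1:3]

# Realized genomic additive relationship matrix (assumed scaling)
p <- colMeans(W) / 2              # allele frequency of the counted allele
Z <- sweep(W, 2, 2 * p)           # centre each marker by 2p
G <- tcrossprod(Z) / (2 * sum(p * (1 - p)))

dim(Q_pc)  # 30 x 3, could stand in for Q in the model
dim(G)     # 30 x 30, could stand in for K in the mixed model
```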


Efficient Mixed-Model Association (EMMA)

See riceGwasEmma.r


Manhattan plot

See riceGwasEmma.r
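A Manhattan plot can be drawn with base graphics. This sketch does not reproduce riceGwasEmma.r. It uses made-up chromosomes, positions, and uniform random p-values, and the column names (`chr`, `pos`, `p`) are assumptions.

```r
set.seed(3)

# Hypothetical GWAS results: 3 chromosomes, 100 markers each
gwas <- data.frame(
  chr = rep(1:3, each = 100),
  pos = rep(seq(1e5, 1e7, length.out = 100), times = 3),
  p   = runif(300)
)

# Place chromosomes end to end along the x axis
chr_len <- tapply(gwas$pos, gwas$chr, max)
offset  <- c(0, cumsum(chr_len)[-length(chr_len)])
gwas$x  <- gwas$pos + offset[gwas$chr]

# -log10(p) against genomic position, alternating colours by chromosome
plot(gwas$x, -log10(gwas$p),
     col = c("grey30", "steelblue")[gwas$chr %% 2 + 1], pch = 20,
     xlab = "Genomic position", ylab = expression(-log[10](p)))

# Dashed line at a Bonferroni threshold for an experiment-wise alpha of 0.05
abline(h = -log10(0.05 / nrow(gwas)), lty = 2)
```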

Statistical threshold: Correcting for multiple testing

[Figure: two candidate significance thresholds, each marked "Here?"]

Statistical threshold: Correcting for multiple testing
• Bonferroni correction
  – $\alpha_C \approx \alpha_E / (\text{number of tests})$, with $\alpha_C$ the comparison-wise and $\alpha_E$ the experiment-wise error rate
  – Assumes independent tests
  – Too conservative when tests are correlated, as markers in LD are
• Permutation testing
  – Good for linkage mapping
  – Generally not valid for GWAS because family structure is not preserved
• False-discovery rate (Benjamini and Hochberg, 1995)
  – Calculate expected proportion of declared QTL that are false positives.
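Both the Bonferroni and the Benjamini-Hochberg corrections are available in base R through `p.adjust`. The six p-values below are made up for illustration.

```r
# Made-up p-values from six marker tests
p <- c(0.0001, 0.004, 0.019, 0.03, 0.2, 0.6)
alpha_e <- 0.05  # experiment-wise error rate

# Bonferroni: comparison-wise threshold is alpha_e / number of tests
alpha_c <- alpha_e / length(p)
n_bonf <- sum(p < alpha_c)  # 2

# Same decision via adjusted p-values
p_bonf <- p.adjust(p, method = "bonferroni")

# Benjamini-Hochberg false-discovery rate
p_bh <- p.adjust(p, method = "BH")
n_bh <- sum(p_bh < alpha_e)  # 4
```

On these numbers, FDR control at 0.05 declares four markers and Bonferroni declares two, which illustrates how much more conservative Bonferroni is.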


Calculate effective number of tests
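The slide does not say which method is meant. One published approach (Li and Ji, 2005) estimates the effective number of tests from the eigenvalues of the marker correlation matrix and then applies a Bonferroni-type correction with that number. The sketch below assumes that approach. The data are simulated and the function names are hypothetical.

```r
set.seed(4)
n <- 100

# Simulated markers: columns 1-3 are noisy copies of one marker (in LD),
# columns 4-6 are independent
base_marker <- rbinom(n, size = 2, prob = 0.5)
noisy_copy <- function(x) {
  ifelse(runif(length(x)) < 0.1, rbinom(length(x), size = 2, prob = 0.5), x)
}
W <- cbind(noisy_copy(base_marker), noisy_copy(base_marker), noisy_copy(base_marker),
           rbinom(n, 2, 0.5), rbinom(n, 2, 0.5), rbinom(n, 2, 0.5))

# Assumed method: each eigenvalue contributes 1 if it is >= 1,
# plus its fractional part
meff_li_ji <- function(W) {
  lambda <- abs(eigen(cor(W), symmetric = TRUE, only.values = TRUE)$values)
  sum((lambda >= 1) + (lambda - floor(lambda)))
}

m_eff <- meff_li_ji(W)
alpha_c <- 0.05 / m_eff  # less strict than 0.05 / ncol(W) when markers are in LD
```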