34
Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis using 30-10-2014

Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

Genome-Wide Association Studies

Caitlin Collins, Thibaut Jombart

MRC Centre for Outbreak Analysis and Modelling

Imperial College London

Genetic data analysis using

30-10-2014

Page 2: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

Outline

• Introduction to GWAS

• Study design o GWAS design

o Issues and considerations in GWAS

• Testing for association o Univariate methods

o Multivariate methods

• Penalized regression methods

• Factorial methods

2

Page 3: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

Genomics & GWAS

3

Page 4: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

The genomics revolution

• Sequencing technology o 1977 – Sanger

o 1995 – 1st bacterial genomes

• < 10,000

bases per day per machine

o 2003 – 1st human genome

• > 10,000,000,000,000

bases per day per machine

• GWAS publications o 2005 – 1st GWAS

o Age-related macular degeneration

o 2014 – 1,991 publications

o 14,342 associations

Genomics & GWAS 4

Page 5: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

A few GWAS discoveries…

Genomics & GWAS 5

Page 6: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

• Genome Wide Association Study o Looking for SNPs…

associated with a phenotype.

• Purpose: o Explain

• Understanding

• Mechanisms

• Therapeutics

o Predict • Intervention

• Prevention

• Understanding not required

So what is GWAS?

Genomics & GWAS 6

Page 7: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

• Definition o Any relationship

between two measured quantities that renders them statistically dependent.

• Heritability o The proportion of

variance explained by genetics

o P = G + E + G*E • Heritability > 0

Association

Genomics & GWAS 7

p SNPs

n individuals

Cases

Controls

Page 8: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

Genomics & GWAS 8

Page 9: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

Why? • Environment, Gene-Environment interactions

• Complex traits, small effects, rare variants

• Gene expression levels

• GWAS methodology?

Genomics & GWAS 9

Page 10: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

Study Design

10

Page 11: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

• Case-Control o Well-defined “case”

o Known heritability

• Variations o Quantitative phenotypic data

• Eg. Height, biomarker concentrations

o Explicit models

• Eg. Dominant or recessive

GWAS design

Study Design 11

Page 12: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

• Data quality o 1% rule

• Controlling for confounding o Sex, age, health profile

o Correlation with other variables

• Population stratification*

• Linkage disequilibrium*

Issues & Considerations

Study Design 12

*

*

Page 13: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

• Problem o Violates assumed population homogeneity, independent

observations

• Confounding, spurious associations

o Case population more likely to be related than Control population

• Over-estimation of significance of associations

Population stratification

Study Design 13

• Definition o “Population stratification” =

population structure

o Systematic difference in allele frequencies btw. sub-populations…

• … possibly due to different ancestry

Page 14: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

• Solutions

o Visualise

• Phylogenetics

• PCA

o Correct

• Genomic Control

• Regression on Principal

Components of PCA

Population stratification II

Study Design 14

Page 15: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

• Definition o Alleles at separate loci are NOT

independent of each other

• Problem? o Too much LD is a problem

• noise >> signal

o Some (predictable) LD can be beneficial

• enables use of “marker” SNPs

Linkage disequilibrium (LD)

Study Design 15

Page 16: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

Testing for Association

16

Page 17: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

• Standard GWAS

o Univariate methods

• Incorporating interactions

o Multivariate methods

• Penalized regression methods (LASSO)

• Factorial methods (DAPC-based FS)

Methods for association testing

Testing for Association 17

Page 18: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

• Variations o Testing

• Fisher’s exact test, Cochran-Armitage trend test, Chi-squared test, ANOVA

• Gold Standard—Fischer’s exact test

o Correcting

• Bonferroni

• Gold Standard—FDR

Univariate methods

Testing for Association 18

p SNPs

n individuals

Cases

Controls

• Approach o Individual test statistics

o Correction for multiple

testing

Page 19: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

Strengths Weaknesses

• Straightforward

• Computationally fast

• Conservative

• Easy to interpret

• Multivariate system,

univariate framework

• Effect size of individual

SNPs may be too small

• Marginal effects of

individual SNPs ≠

combined effects

Univariate – Strengths & weaknesses

Testing for Association 19

Page 20: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

What about interactions?

Testing for Association 20

Page 21: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

• Epistasis o “Deviation from linearity under a

general linear model”

• With p predictors, there are:

• 𝑝𝑘

=𝑝𝑘

𝑘! k-way interactions

• p = 10,000,000 5 x 1011

o That’s 500 BILLION possible pair-wise interactions!

• Need some way to limit the number of pairwise interactions considered…

Interactions

Testing for Association 21

𝑌𝑖 = 𝑤0 + 𝑤1𝐴𝑖 + 𝑤2𝐵𝑖 +𝒘𝟑𝑨𝒊𝑩𝒊

Page 22: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

Multivariate methods

Testing for Association 22

LASSO penalized regression

Ridge regression

Neural Networks Penalized Regression

Bayesian Approaches

Factorial Methods

Bayesian Epistasis

Association Mapping

Logic Trees

Modified Logic

Regression-Gene

Expression Programming Genetic Programming for

Association Studies

Logic feature selection Monte Carlo

Logic Regression Logic regression

Supervised-PCA

Sparse-PCA

DAPC-based FS

(snpzip)

Bayesian partitioning

The elastic net

Bayesian Logistic

Regression with

Stochastic Search

Variable Selection

Odds-ratio-

based MDR

Multi-factor

dimensionality reduction

method

Genetic programming

optimized neural

networks

Parametric

decreasing method

Restricted

partitioning method

Combinatorial

partitioning method

Random forests

Set association

approach

Non-parametric Methods

Page 23: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

• Penalized regression methods o LASSO penalized regression

• Factorial methods o DAPC-based

feature selection

Multivariate methods (ii)

Testing for Association 23

Page 24: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

• Approach o Regression models multivariate association

o Shrinkage estimation feature selection

• Variations o LASSO, Ridge, Elastic net, Logic regression

• Gold Standard—LASSO penalized regression

Penalized regression methods

Testing for Association 24

Page 25: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

• Regression o Generalized linear model (“glm”)

• Penalization o L1 norm

o Coefficients 0

o Feature selection!

LASSO penalized regression

Testing for Association 25

Page 26: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

Strengths Weaknesses

• Stability

• Interpretability

• Likely to accurately

select the most

influential predictors

• Sparsity

• Multicollinearity

• Not designed for high-p

• Computationally intensive

• Calibration of penalty parameters o User-defined variability

• Sparsity

• NO p-values!

LASSO – Strengths & weaknesses

Testing for Association 26

Page 27: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

• Approach o Place all variables (SNPs) in a multivariate space

o Identify discriminant axis best separation

o Select variables with the highest contributions to that axis

• Variations o Supervised-PCA, Sparse-PCA, DA, DAPC-based FS

o Our focus—DAPC with feature selection (snpzip)

Factorial methods

Testing for Association 27

Page 28: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

Discriminant axis

De

nsi

ty o

f in

div

idu

als

Discriminant axis

De

nsi

ty o

f in

div

idu

als

DAPC-based feature selection

a

b

c d

e

Alleles

Individuals

0

0.1

0.2

0.3

0.4

0.5

a b c d e

Contribution to

Discriminant Axis

Healthy (“controls”)

Diseased (“cases”)

Testing for Association 28

Discriminant Axis Discriminant Axis

Page 29: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

a b c d e

Contribution to Discriminant Axis

?

Discriminant axis

De

nsi

ty o

f in

div

idu

als

DAPC-based feature selection • Where should we draw the line?

o Hierarchical clustering

Page 30: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

Hierarchical clustering (FS)

Testing for Association 30

0

0.1

0.2

0.3

0.4

0.5

a b c d e

Contribution to

Discriminant Axis

Hooray!

Page 31: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

Strengths Weaknesses

• More likely to catch all

relevant SNPs (signal)

• Computationally quick

• Good exploratory tool

• Redundancy > sparsity

• Sensitive to n.pca

• N.snps.selected varies

• No “p-value”

• Redundancy > sparsity

DAPC – Strengths & weaknesses

Testing for Association 31

Page 32: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

Conclusions

• Study design o GWAS design

o Issues and considerations in GWAS

• Testing for association o Univariate methods

o Multivariate methods

• Penalized regression methods

• Factorial methods

32

Page 33: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

Thanks for listening!

33

Page 34: Genome-Wide Association Studiesadegenet.r-forge.r-project.org/files/Glasgow2015... · 2015-08-06 · Genome-Wide Association Studies Caitlin Collins, Thibaut Jombart ... •Controlling

Questions?

34