43
BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27 - DGM 328 CH-1005 Lausanne Switzerland work: ++41-21-692-5452 cell: ++41-78-663-4980 http://serverdgm.unil.ch/bergmann

Overview

  • Upload
    teague

  • View
    33

  • Download
    4

Embed Size (px)

DESCRIPTION

BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27 - DGM 328  CH-1005 Lausanne  Switzerland work: ++41-21-692-5452 cell: ++41-78-663-4980 http://serverdgm.unil.ch/bergmann. Overview. - PowerPoint PPT Presentation

Citation preview

Page 1: Overview

BSc Course:

"Experimental design“

Genome-wide Association Studies

Sven BergmannDepartment of Medical Genetics

University of LausanneRue de Bugnon 27 - DGM 328 

CH-1005 Lausanne Switzerland

 

work: ++41-21-692-5452cell: ++41-78-663-4980

http://serverdgm.unil.ch/bergmann

Page 2: Overview

Overview• Population stratification

• Associations: Basics

• Whole genome associations

• Genotype imputation

• Uncertain genotypes

• New Methods

Page 3: Overview

Overview• Population stratification

• Associations: Basics

• Whole genome associations

• Genotype imputation

• Uncertain genotypes

• New Methods

Page 4: Overview

6’18

9 in

divi

dual

s

Phenotypes

159 measurement

144 questions

Genotypes

500.000 SNPs

CoLaus = Cohort Lausanne

Collaboration with:Vincent Mooser (GSK), Peter Vollenweider & Gerard Waeber (CHUV)

Page 5: Overview

ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…

ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…

ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…

ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…

ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…

Genetic variation in SNPs (Single Nucleotide Polymorphisms)

Page 6: Overview

Analysis of Genotypes only

Principle Component Analysis reveals SNP-vectors explaining largest variation in the data

Page 7: Overview

Example: 2PCs for 3d-data

http://ordination.okstate.edu/PCA.htm

Raw data points: {a, …, z}

Page 8: Overview

Example: 2PCs for 3d-data

http://ordination.okstate.edu/PCA.htm

Normalized data points: zero mean (& unit std)!

Page 9: Overview

Example: 2PCs for 3d-data

http://ordination.okstate.edu/PCA.htm

Identification of axes with the most variance

Most variance is along PCA1

The direction of most variance

perpendicular to PCA1 defines

PCA2

Page 10: Overview

Ethnic groups cluster according to geographic distances

PC1 PC1

PC

2P

C2

Page 11: Overview

PCA of POPRES cohort

Page 12: Overview

Overview• Population stratification

• Associations: Basics

• Whole genome associations

• Genotype imputation

• Uncertain genotypes

• New Methods

Page 13: Overview

Phenotypic variation:

Page 14: Overview

0

0.2

0.4

0.6

0.8

1

1.2

-6 -4 -2 0 2 4 6

What is association?chromosomeSNPs trait variant

Genetic variation yields phenotypic variation

Population with ‘ ’ allele Population with ‘ ’ allele

Distributions of “trait”

Page 15: Overview

Quantifying Significance

Page 16: Overview

T-test

t-value (significance) can be translated into p-value (probability)

Page 17: Overview

Association using regression

genotype Coded genotype

phen

otyp

e

Page 18: Overview

Regression analysis

X

Y

“response”

“feature(s)”

“intercept”

“coefficients”

“residuals”

Page 19: Overview

Regression formalism

(monotonic)transformation

phenotype(response variable)of individual i

effect size(regression coefficient)

coded genotype(feature) of individual i

p(β=0)error(residual)

Goal: Find effect size that explains best all (potentially transformed) phenotypes as a linear function of the genotypes and estimate the probability (p-value) for the data being consistent with the null hypothesis (i.e. no effect)

Page 20: Overview

Overview• Population stratification

• Associations: Basics

• Whole genome associations

• Genotype imputation

• Uncertain genotypes

• New Methods

Page 21: Overview

Whole Genome Association

Page 22: Overview

Whole Genome AssociationCurrent microarrays probe ~1M SNPs!

Standard approach: Evaluate significance for association of each SNP independently:

sig

nif

ican

ce

Page 23: Overview

Whole Genome Associationsi

gn

ific

ance

Manhattan plot

ob

serv

edsi

gn

ific

ance

Expected significance

Quantile-quantile plot

Chromosome & position

GWA screens include large number of statistical tests!• Huge burden of correcting for multiple testing!• Can detect only highly significant associations (p < α / #(tests) ~ 10-7)

Page 24: Overview

GWAS: >20 publications in 2006/2007

Massive!

Page 25: Overview
Page 26: Overview
Page 27: Overview
Page 28: Overview
Page 29: Overview

Current insights from GWAS:

• Well-powered (meta-)studies with (ten-)thousands of samples have identified a few (dozen) candidate loci with highly significant associations

• Many of these associations have been replicated in independent studies

Page 30: Overview

Current insights from GWAS:

• Each locus explains but a tiny (<1%) fraction of the phenotypic variance

• All significant loci together explain only a small (<10%) of the variance

David Goldstein:

“~93,000 SNPs would be required to explain 80% of the population variation in height.”

Common Genetic Variation and Human Traits, NEJM 360;17

Page 31: Overview

1. Other variants like Copy Number Variations or epigenetics may play an important role

2. Interactions between genetic variants (GxG) or with the environment (GxE)

3. Many causal variants may be rare and/or poorly tagged by the measured SNPs

4. Many causal variants may have very small effect sizes

5. Overestimation of heritabilities from twin-studies?

So what do we miss?

Page 32: Overview

Overview• Population stratification

• Associations: Basics

• Whole genome associations

• Genotype imputation

• Uncertain genotypes

• New Methods

Page 33: Overview

Intensity of Allele G

Inte

nsi

ty o

f A

llele

A

Genotypes are called with varying uncertainty

Page 34: Overview

Some Genotypes are missing at all …

Page 35: Overview

… but are imputed with different uncertainties

Page 36: Overview

… using Linkage Disequilibrium!

Markers close together on chromosomes are often transmitted together, yielding a non-zero correlation between the alleles.

Marker 1 2 3 n

LD

D

Page 37: Overview

Conclusion

• Genotypic markers are always measured or inferred with some degree of uncertainty

• Association methods should take into account this uncertainty

Page 38: Overview

Two easy ways dealing with uncertain genotypes

1. Genotype Calling: Choose the most likely genotype and continue as if it is true(p11=10%, p12=20% p22=70% => G=2)

2. Mean genotype: Use the weighted average genotype(p11=10%, p12=20% p22=70% => G=1.6)

Page 39: Overview

Overview• Associations: Basics

• Whole genome associations

• Population stratification

• Genotype imputation

• Uncertain genotypes

• New Methods

Page 40: Overview

New Methodbased on a mixture model both for

phenotypes and uncertain genotypes

Better control of false positives More power

Obs

erve

d si

gnifi

canc

e

Expected significance

Page 41: Overview

Modular Approach for Integrative Analysis of Genotypes and Phenotypes

Individuals

Genotypes

Phenotypes

Me

as

ure

me

nts

SN

Ps/H

ap

lotyp

es

Modular links

Page 42: Overview

Network Approaches for Integrative Association

Analysis

Using knowledge on physical gene-interactions or pathways to prioritize the search for functional interactions

Page 43: Overview

Overview• Associations: Basics

• Whole genome associations

• Population stratification

• Genotype imputation

• Uncertain genotypes

• New Methods