HapMap:
application in the design and interpretation of association studies
Mark J. Daly, PhD on behalf of
The International HapMap Consortium
Goals of this segment
• Briefly summarize HapMap design and current status
• Discuss the application of HapMap to all aspects of association study design, analysis and interpretation
HapMap Project
High-density SNP genotyping across the genome provides information about
– SNP validation, frequency, assay conditions
– correlation structure of alleles in the genome
A freely-available public resource to increase the power and efficiency of genetic association studies to medical traits
All data is freely available on the web for application in study design and analyses as researchers see fit
HapMap Samples
• 90 Yoruba individuals (30 parent-parent-offspring trios) from Ibadan, Nigeria (YRI)
• 90 individuals (30 trios) of European descent from Utah (CEU)
• 45 Han Chinese individuals from Beijing (CHB)
• 45 Japanese individuals from Tokyo (JPT)
HapMap progress
PHASE I – completed, described in Nature paper
* 1,000,000 SNPs successfully typed in all 270 HapMap samples
* ENCODE variation reference resource available
PHASE II – data generation complete, data released this past Monday
* >3,500,000 SNPs typed in total !!!
ENCODE-HAPMAP variation project
• Ten “typical” 500kb regions
• 48 samples sequenced
• All discovered SNPs (and any others in dbSNP) typed in all 270 HapMap samples
• Current data set – 1 SNP every 279 bp
A much more complete variation resource by which the genome-wide map can be evaluated
Completeness of dbSNP
Vast majority of common SNPs are contained in or highly correlated with a SNP in dbSNP
Recombination hotspots are widespread and account for LD structure
[Figure: example region, 7q21]
Utility of LD in association study
• “If I’m a causal variant, what is relevant to my detection in association studies is how well correlated I am with one of the SNPs or haplotypes examined in the study.”
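The correlation in question is usually measured as r2 between two SNPs. A minimal sketch of that calculation on phased haplotypes follows. The function name and the toy haplotypes are illustrative assumptions, not HapMap data.

```python
def r_squared(hap_a, hap_b):
    """r2 between two biallelic SNPs.

    hap_a and hap_b hold 0/1 allele codes for the same phased
    chromosomes, listed in the same order.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("need two non-empty allele lists of equal length")
    p_a = sum(hap_a) / n                                  # freq. of allele 1 at SNP A
    p_b = sum(hap_b) / n                                  # freq. of allele 1 at SNP B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n   # both carry allele 1
    d = p_ab - p_a * p_b                                  # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom           # monomorphic SNP gives 0


# Hypothetical toy data: 8 chromosomes, one causal variant, two typed SNPs.
causal = [1, 1, 1, 0, 0, 0, 0, 0]
good_proxy = [1, 1, 1, 0, 0, 0, 0, 0]   # always travels with the causal allele
weak_proxy = [1, 0, 1, 0, 1, 0, 0, 0]   # only partly correlated

r2_good = r_squared(causal, good_proxy)
r2_weak = r_squared(causal, weak_proxy)
```

A causal variant with r2 near 1 to a typed SNP is tested almost as well as if it had been typed directly. To a first approximation, the sample size needed to detect it indirectly scales as 1/r2.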
Coverage of Phase II HapMap (estimated from ENCODE data)
From Table 6 – “A Haplotype Map of the Human Genome”, Nature
Panel      %r2 > 0.8   max r2
YRI        81          0.90
CEU        94          0.97
CHB+JPT    94          0.97
%r2 > 0.8: percentage of deeply ascertained common variants highly correlated with a HapMap SNP
max r2: average maximum correlation between a deeply ascertained variant and a neighboring HapMap SNP
Vast majority of common variation (MAF > .05) captured by Phase II HapMap
Applying the HapMap
• Study design - tagging
• Study coverage evaluation
• Study analysis - improving association testing
• Study interpretation
– Comparison of multiple studies
– Connection to genes/genomic features
– Integration with expression and other functional data
• Other uses of HapMap data
– Admixture, LOH, selection
Tagging from HapMap
• Since HapMap describes the majority of common variation in the genome, choosing non-redundant sets of SNPs from HapMap offers considerable efficiency without power loss in association studies
Pairwise tagging
Tags: SNP 1, SNP 3, SNP 6 (3 in total)
Test for association: SNP 1, SNP 3, SNP 6
[Figure: haplotypes across six SNPs (SNP 1 A/T, SNP 2 G/A, SNP 3 G/C, SNP 4 T/C, SNP 5 G/C, SNP 6 A/C), with three groups of SNPs in high r2]
After Carlson et al. (2004) AJHG 74:106
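A minimal sketch of pairwise tagging as a greedy procedure, in the spirit of the approach cited on this slide. It is not the Tagger or Haploview implementation. The function names and the toy haplotypes are assumptions made for illustration; the toy data only mimic a region with three groups of correlated SNPs.

```python
def r_squared(hap_a, hap_b):
    """r2 between two biallelic SNPs coded 0/1 on the same phased chromosomes."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom


def greedy_pairwise_tags(haplotypes, threshold=0.8):
    """Pick tags until every SNP has r2 >= threshold with some tag.

    haplotypes maps a SNP name to its list of 0/1 alleles.
    """
    names = list(haplotypes)
    # For each SNP, the set of SNPs it would capture if chosen as a tag.
    captures = {
        s: {t for t in names
            if t == s or r_squared(haplotypes[s], haplotypes[t]) >= threshold}
        for s in names
    }
    untagged = set(names)
    tags = []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs;
        # ties go to the SNP listed first.
        best = max(names, key=lambda s: len(captures[s] & untagged))
        tags.append(best)
        untagged -= captures[best]
    return tags


# Hypothetical toy region: 6 SNPs on 8 chromosomes, three perfectly correlated pairs.
toy = {
    "SNP1": [1, 1, 1, 1, 0, 0, 0, 0],
    "SNP2": [1, 1, 1, 1, 0, 0, 0, 0],
    "SNP3": [1, 1, 0, 0, 1, 1, 0, 0],
    "SNP4": [1, 1, 0, 0, 0, 0, 0, 0],
    "SNP5": [1, 1, 0, 0, 1, 1, 0, 0],
    "SNP6": [1, 1, 0, 0, 0, 0, 0, 0],
}
tags = greedy_pairwise_tags(toy, threshold=0.8)   # one tag per correlated pair
```

Which member of a perfectly correlated pair becomes the tag is arbitrary; in practice the choice can favor SNPs with better assay properties or functional annotation.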
Pairwise Tagging Efficiency
Table 7: Number of selected tag SNPs to capture all observed common SNPs in the Phase I HapMap for the three analysis panels using pairwise tagging at different r2 thresholds
Pairwise threshold   YRI       CEU       CHB+JPT
r2 ≥ 0.5             324,865   178,501   159,029
r2 ≥ 0.8             474,409   293,835   259,779
r2 = 1               604,886   447,579   434,476
Tag SNPs were picked to capture common SNPs in release 16c.1 for every 7,000 SNP bin using Haploview.
Tagging Phase I HapMap offers 2-5x gains in efficiency
Use of haplotypes can improve genotyping efficiency
Tags: SNP 1, SNP 3 (2 in total)
Test for association:
SNP 1 captures SNPs 1+2
SNP 3 captures SNPs 3+5
“AG” haplotype captures SNPs 4+6
[Figure: haplotypes across six SNPs (SNP 1 A/T, SNP 2 G/A, SNP 3 G/C, SNP 4 T/C, SNP 5 G/C, SNP 6 A/C)]
Tags in a multi-marker test should be conditional on significance of LD in order to avoid overfitting
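A minimal sketch of how a two-SNP haplotype can stand in for an untyped SNP. The helper names and the toy haplotypes are assumptions for illustration; they are not the haplotypes drawn on the slide.

```python
def r_squared(hap_a, hap_b):
    """r2 between two 0/1 vectors observed on the same phased chromosomes."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom


def carries_haplotype(hap_x, hap_y, allele_x, allele_y):
    """1 where a chromosome carries allele_x at SNP X and allele_y at SNP Y, else 0."""
    return [int(x == allele_x and y == allele_y) for x, y in zip(hap_x, hap_y)]


# Hypothetical toy data: two typed tags and one untyped SNP on 8 chromosomes.
snp1 = [1, 1, 1, 1, 0, 0, 0, 0]
snp3 = [1, 1, 0, 0, 1, 1, 0, 0]
snp4 = [1, 1, 0, 0, 0, 0, 0, 0]   # untyped; present only on the 1-1 haplotype

# Best any single tag can do for SNP 4.
single_marker_r2 = max(r_squared(snp1, snp4), r_squared(snp3, snp4))

# The specific two-tag haplotype treated as one predictor.
two_marker_r2 = r_squared(carries_haplotype(snp1, snp3, 1, 1), snp4)
```

With only 8 chromosomes a perfect haplotype predictor could arise by chance, which is the overfitting concern raised on the slide: a multi-marker test is only worth adding when the underlying LD is statistically well supported.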
Efficiency and power
[Figure: relative power (%) versus average marker density (per kb), for tag SNPs and random SNPs]
P.I.W. de Bakker et al. (2005) Nat Genet Advance Online Publication 23 Oct 2005
~300,000 tag SNPs needed to cover common variation in whole genome in CEU
How to pick tag SNPs?
• What is the genetic hypothesis? Which variants do you want to test for a role in disease?
– functional annotation (coding SNPs)
– allele frequency (HapMap ascertainment)
– previously implicated associations
• Go to http://www.hapmap.org – DCC supported interactive tagging
• Export HapMap data into tools such as Tagger, Haploview (www.broad.mit.edu/mpg)
Will tag SNPs picked from HapMap apply to other population samples?
Population differences add very little inefficiency
Platform presentation: Paul de Bakker (#223: Sat 9.30)
[Figure: panels comparing CEU (Utah residents with European ancestry, CEPH) with whites from Los Angeles, CA and with Botnia, Finland]
Genome-wide association coverage
• If genome-wide products are typed on the HapMap sample panel, the HapMap SNPs not included in the product provide an evaluation of the product's coverage
– ENCODE (deep ascertainment)
– Phase II (dense, genome-wide)
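A minimal sketch of this coverage evaluation: for every reference SNP that is not on the product, find its best r2 to any product SNP, then summarize. The function names, the SNP names and the toy haplotypes are assumptions for illustration only.

```python
def r_squared(hap_a, hap_b):
    """r2 between two biallelic SNPs coded 0/1 on the same phased chromosomes."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom


def max_r2_to_product(haplotypes, product_snps):
    """Best r2 to any product SNP, for each reference SNP not on the product."""
    typed = [s for s in haplotypes if s in product_snps]
    return {
        s: max((r_squared(hap, haplotypes[t]) for t in typed), default=0.0)
        for s, hap in haplotypes.items()
        if s not in product_snps
    }


def coverage_summary(max_r2, cutoff=0.8):
    """Return (fraction of SNPs with max r2 >= cutoff, mean max r2)."""
    values = list(max_r2.values())
    if not values:
        raise ValueError("no untyped reference SNPs to evaluate")
    fraction = sum(v >= cutoff for v in values) / len(values)
    return fraction, sum(values) / len(values)


# Hypothetical reference panel (6 SNPs, 8 chromosomes) and a 2-SNP product.
reference = {
    "SNP1": [1, 1, 1, 1, 0, 0, 0, 0],
    "SNP2": [1, 1, 1, 1, 0, 0, 0, 0],
    "SNP3": [1, 1, 0, 0, 1, 1, 0, 0],
    "SNP4": [1, 1, 0, 0, 0, 0, 0, 0],
    "SNP5": [1, 1, 0, 0, 1, 1, 0, 0],
    "SNP6": [1, 1, 0, 0, 0, 0, 0, 0],
}
product = {"SNP1", "SNP3"}

best = max_r2_to_product(reference, product)
fraction_covered, mean_max_r2 = coverage_summary(best, cutoff=0.8)
```

Repeating `coverage_summary` over a range of cutoffs gives a curve of fraction of SNPs captured against r2 cutoff.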
Association tests with fixed markers
Tests of association: SNP 1, SNP 3
[Figure: haplotypes across six SNPs (SNP 1 A/T, SNP 2 G/A, SNP 3 G/C, SNP 4 T/C, SNP 5 G/C, SNP 6 A/C); marked SNPs are those on the whole-genome product (~1 - 5% of common variation directly assayed)]
SNPs actually tested: SNP 1, SNP 3, SNP 2, SNP 5 (SNP 2 and SNP 5 through high r2 with the typed SNPs)
Genome-wide products can capture most common variation
[Figure: fraction of SNPs (0% to 100%) versus R2 cutoff (0 to 1), for CEU and YRI]
Example: 500K data generated by Affymetrix and recently submitted to HapMap DCC
• Platform presentations tomorrow morning 8 AM sharp:
– Peer
– Jorgenson
– Lazarus
– As well as several detailed posters!
Can incorporating tests of haplotypes of SNPs on the genome-wide product improve this coverage?
Improving association power using data from HapMap
Tests of association: SNP 1, SNP 3
SNPs actually tested: SNP 1, SNP 3, SNP 2, SNP 5
[Figure: haplotypes across the six example SNPs]
Improving association power using data from HapMap
Tests of association: SNP 1, SNP 3, “AG haplotype”
SNPs actually tested: SNP 1, SNP 3, SNP 2, SNP 5, SNP 4, SNP 6
[Figure: haplotypes across the six example SNPs]
Haplotypes increase coverage
[Figure: fraction of SNPs (0% to 100%) versus R2 cutoff (0 to 1), for single marker predictors, 2-marker predictors and 3-marker predictors]
Integration with genomic features
• Positive association to a SNP on HapMap enables detailed interpretation:
– How many other SNPs are in LD with this SNP?
– What genes are in LD with this SNP?
– What coding variants and putative functional variants are in LD with this SNP?
Potential to improve power by modifying Bayesian priors of each association test based on this information
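One way to read the note on priors is through the standard relation posterior odds = Bayes factor × prior odds. The sketch below only illustrates that arithmetic. The Bayes factor and both prior values are hypothetical numbers chosen for the example; the talk does not specify how priors should be set.

```python
def posterior_probability(bayes_factor, prior):
    """Posterior probability of true association from a Bayes factor and a prior."""
    if not 0 < prior < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1 - prior)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1 + posterior_odds)


# Same (hypothetical) strength of evidence at two SNPs.
bayes_factor = 1000.0

# Hypothetical priors: a SNP with no annotated variant in LD, and a SNP
# in LD with a coding variant, given a ten-fold higher prior.
baseline = posterior_probability(bayes_factor, prior=1e-4)
in_ld_with_coding = posterior_probability(bayes_factor, prior=1e-3)
```

The same evidence yields a higher posterior for the SNP whose LD neighborhood contains plausible functional variation, which is how annotation from HapMap LD could shift the ranking of association results.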
Example: Complement Factor H - AMD
• Original SNP hit in Affy 100K experiment – rs380390
• Extent and structure of LD from HapMap aids in the fine mapping phase of project
Klein et al Science 2005
Meta-analysis of association studies
• When different marker sets are used to study association (candidate gene or genome-wide), results can be readily integrated when all markers are typed on HapMap samples
Example: DTNBP1 and schizophrenia
• Multiple studies have described modest association to schizophrenia
• Most studies have examined small numbers of non-overlapping sets of SNPs
• HapMap data can be used to determine whether these association findings are consistent with one another
Derek Morris, Mousumi Mutsuddi (WCPG meeting)
Extensive LD across DTNBP1
Phase II HapMap: 186 SNPs, 180 kb
Phylogeny of DTNBP1 tag SNPs
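[Figure: phylogeny of haplotypes over six tag SNPs (2, 4, 5, 3, 10, 7). Branch labels: 4 (GA), 5 (CT); 2 (AG); 7 (CT); 10 (AT); 3 (GA). Haplotypes shown: AGGCCT, GGATCA, AGGCCA, AGATTA, AAGCCT. Ancestral haplotype: AGGCCA. Haplotype frequencies: 6%, 33%, 42%, 8%, 11%]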
Associated alleles reported
[Figure: associated alleles reported by each study, marked on the tag SNP haplotypes (tag SNPs 2, 4, 5, 3, 10, 7)]
Studies: Straub 2002; Van den Oord 2003; Van den Bogaert 2003; Funke 2004; Schwab 2003; Williams 2004; Bray 2005; Kirov 2004
Inconsistent findings
• No consistently associated SNP/haplotype pattern across studies
• All studies (European-derived populations) had allele/haplotype frequencies compatible with HapMap-CEU sample
• HapMap can successfully relate associations from diverse marker sets
Other Applications – Structural Variation
• 3 papers coming out in the next month describe use of HapMap data to identify large, common deletion polymorphisms
• LD around these polymorphisms permits their assessment with tag SNPs/haplotypes in genome-wide association studies
Other Applications – Admixture Scanning
• HapMap data provides a rich source of highly differentiated SNPs for design of admixture panels
• Fine mapping of admixture signals can be focused on the full set of highly differentiated alleles in any region of the genome
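A minimal sketch of picking highly differentiated SNPs by absolute allele frequency difference between two panels. The SNP names, the frequencies and the 0.5 cutoff are hypothetical; real panel design would also consider spacing along the genome and other criteria.

```python
def most_differentiated(freq_pop1, freq_pop2, min_delta=0.5):
    """Rank SNPs typed in both panels by absolute allele frequency difference.

    freq_pop1 and freq_pop2 map SNP name to the frequency of the same
    reference allele in each panel. Returns (snp, delta) pairs with
    delta >= min_delta, largest difference first.
    """
    shared = [s for s in freq_pop1 if s in freq_pop2]
    deltas = [(s, abs(freq_pop1[s] - freq_pop2[s])) for s in shared]
    kept = [(s, d) for s, d in deltas if d >= min_delta]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical allele frequencies in two panels.
panel_a = {"snpA": 0.90, "snpB": 0.50, "snpC": 0.10, "snpD": 0.30}
panel_b = {"snpA": 0.20, "snpB": 0.45, "snpC": 0.85}

candidates = most_differentiated(panel_a, panel_b, min_delta=0.5)
candidate_names = [snp for snp, _ in candidates]
```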
Other Applications – LOH
• HapMap identifies
– Regions of extended LD that may manifest themselves as unusually long stretches of homozygosity in individual samples
– The catalog of large deletion variants on the HapMap will differentiate between LOH that is potentially de novo and causal, and that which is simply commonly segregating in the population
LOH analysis cognizant of HapMap patterns under development
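As a starting point only, a naive scan for long homozygous stretches in one sample might look like the sketch below. It does not use HapMap LD patterns or the deletion catalog, which is the refinement described above; the function name, thresholds and genotypes are assumptions for illustration.

```python
def homozygous_runs(positions, genotypes, min_snps=3, min_span_bp=1000):
    """Find runs of consecutive homozygous genotype calls.

    positions: base-pair coordinates, sorted ascending.
    genotypes: two-character calls such as "AA" or "AG", same order.
    Returns (start_bp, end_bp, n_snps) for runs meeting both thresholds.
    """
    runs = []
    start = None
    for i, call in enumerate(genotypes):
        if call[0] == call[1]:          # homozygous call extends or opens a run
            if start is None:
                start = i
        elif start is not None:         # heterozygous call closes the open run
            runs.append((start, i - 1))
            start = None
    if start is not None:               # run reaching the last SNP
        runs.append((start, len(genotypes) - 1))
    return [
        (positions[a], positions[b], b - a + 1)
        for a, b in runs
        if b - a + 1 >= min_snps and positions[b] - positions[a] >= min_span_bp
    ]


# Hypothetical sample: 8 SNPs with one long homozygous stretch.
positions = [100, 600, 1200, 2500, 3000, 3400, 5200, 6000]
genotypes = ["AG", "AA", "CC", "TT", "GG", "CT", "AA", "GG"]

long_runs = homozygous_runs(positions, genotypes)
```

A HapMap-aware version would compare each run against the length expected from local LD and against known common deletions before treating it as a candidate de novo event.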
Early results encouraging
• At this meeting
– Arking and colleagues describe identification of variant altering QT-interval
– Herbert and colleagues describe a novel gene for obesity
– Wijmenga and colleagues describe a novel gene for celiac disease