8
E-Mail [email protected] Original Paper Hum Hered 2013;75:44–51 DOI: 10.1159/000350964 The Use of Haplotypes in the Identification of Interaction between SNPs Gie Ken-Dror Steve E. Humphries Fotios Drenos Centre for Cardiovascular Genetics, BHF Laboratories, Department of Medicine, Royal Free and University College Medical School, London, UK Introduction Genome-wide association studies (GWAS) have had great success in identifying single nucleotide polymor- phisms (SNPs) associated with a number of common complex diseases [1, 2]. Although most of these newly identified SNPs are non-functional (i.e. they do not alter the function or expression of a protein), they can be used as surrogate markers for the unobserved functional vari- ant due to allelic association (linkage disequilibrium, LD) between adjacent SNPs. The extent of LD across a ge- nomic region dictates the density of SNP markers neces- sary to ensure the capture of the association between a marker and the causative allele sought. The average dis- tance that LD extends away from a specific SNP varies from 5 to 60 kb, with an upper range able to extend up to hundreds of kb [3–6], usually being larger in populations of European decent, but smaller (e.g. only a few kb) in individuals from African populations [4–7]. Commonly, we see a pattern of high LD across long stretches of DNA punctuated by recombination hot-spots [3, 5, 7]. The tra- ditional GWAS analysis tests the association between the phenotype and SNPs at a locus one at a time [3, 8–10], thus discounting information from their joint distribu- tion. If any interactions between SNPs exist, they need to have a strong marginal effect to be detected through the univariate analysis, something that we cannot a priori as- sume. This led many to believe that it is much more in- Key Words Haplotype · Interaction · Single nucleotide polymorphisms Abstract Although haplotypes can provide great insight into the com- plex relationships between functional polymorphisms at a locus, their use in modern association studies has been lim- ited. This is due to our inability to directly observe haplo- types in studies of unrelated individuals, but also to the extra complexity involved in their analysis and the difficulty in identifying which is the truly informative haplotype. Using a series of simulations, we tested a number of different models of a haplotype carrying two functional single nucleotide polymorphisms (SNPs) to assess the ability of haplotypic analysis to identify functional interactions between SNPs at the same locus. We found that, when phase is known, analy- sis of the haplotype is more powerful than analysis of the individual SNPs. The difference between the two approach- es becomes less either as an increasing number of non-infor- mative SNPs are included, or when the haplotypic phase is unknown, while in both cases the SNP association becomes progressively better at identifying the association. Our re- sults suggest that when novel genotyping and bioinformat- ics methods are available to reconstruct haplotypic phase, this will permit the emergence of a new wave of haplotypic analysis able to consider interactions between SNPs with in- creased statistical power. Copyright © 2013 S. Karger AG, Basel Received: January 9, 2013 Accepted after revision: March 26, 2013 Published online: May 7, 2013 Fotios Drenos Centre for Cardiovascular Genetics, BHF Laboratories, The Rayne Building Department of Medicine, Royal Free and University College Medical School 5 University Street, London WC1E 6JF (UK) E-Mail giekd  @  013.net © 2013 S. Karger AG, Basel 0001–5652/13/0751–0044$38.00/0 www.karger.com/hhe Downloaded by: Norris Medical Library 68.181.176.15 - 4/1/2014 10:57:38 AM

The Use of Haplotypes in the Identification of Interaction between SNPs

  • Upload
    fotios

  • View
    214

  • Download
    1

Embed Size (px)

Citation preview

Page 1: The Use of Haplotypes in the Identification of Interaction between SNPs

E-Mail [email protected]

Original Paper

Hum Hered 2013;75:44–51 DOI: 10.1159/000350964

The Use of Haplotypes in the Identification of Interaction between SNPs

Gie Ken-Dror Steve E. Humphries Fotios Drenos

Centre for Cardiovascular Genetics, BHF Laboratories, Department of Medicine, Royal Free and University College Medical School, London , UK

Introduction

Genome-wide association studies (GWAS) have had great success in identifying single nucleotide polymor-phisms (SNPs) associated with a number of common complex diseases [1, 2] . Although most of these newly identified SNPs are non-functional (i.e. they do not alter the function or expression of a protein), they can be used as surrogate markers for the unobserved functional vari-ant due to allelic association (linkage disequilibrium, LD) between adjacent SNPs. The extent of LD across a ge-nomic region dictates the density of SNP markers neces-sary to ensure the capture of the association between a marker and the causative allele sought. The average dis-tance that LD extends away from a specific SNP varies from 5 to 60 kb, with an upper range able to extend up to hundreds of kb [3–6] , usually being larger in populations of European decent, but smaller (e.g. only a few kb) in individuals from African populations [4–7] . Commonly, we see a pattern of high LD across long stretches of DNA punctuated by recombination hot-spots [3, 5, 7] . The tra-ditional GWAS analysis tests the association between the phenotype and SNPs at a locus one at a time [3, 8–10] , thus discounting information from their joint distribu-tion. If any interactions between SNPs exist, they need to have a strong marginal effect to be detected through the univariate analysis, something that we cannot a priori as-sume. This led many to believe that it is much more in-

Key Words

Haplotype · Interaction · Single nucleotide polymorphisms

Abstract

Although haplotypes can provide great insight into the com-plex relationships between functional polymorphisms at a locus, their use in modern association studies has been lim-ited. This is due to our inability to directly observe haplo-types in studies of unrelated individuals, but also to the extra complexity involved in their analysis and the difficulty in identifying which is the truly informative haplotype. Using a series of simulations, we tested a number of different models of a haplotype carrying two functional single nucleotide polymorphisms (SNPs) to assess the ability of haplotypic analysis to identify functional interactions between SNPs at the same locus. We found that, when phase is known, analy-sis of the haplotype is more powerful than analysis of the individual SNPs. The difference between the two approach-es becomes less either as an increasing number of non-infor-mative SNPs are included, or when the haplotypic phase is unknown, while in both cases the SNP association becomes progressively better at identifying the association. Our re-sults suggest that when novel genotyping and bioinformat-ics methods are available to reconstruct haplotypic phase, this will permit the emergence of a new wave of haplotypic analysis able to consider interactions between SNPs with in-creased statistical power. Copyright © 2013 S. Karger AG, Basel

Received: January 9, 2013 Accepted after revision: March 26, 2013 Published online: May 7, 2013

Fotios Drenos Centre for Cardiovascular Genetics, BHF Laboratories, The Rayne Building Department of Medicine, Royal Free and University College Medical School 5 University Street, London WC1E 6JF (UK) E-Mail giekd   @   013.net

© 2013 S. Karger AG, Basel 0001–5652/13/0751–0044$38.00/0

www.karger.com/hhe

Dow

nloa

ded

by:

Nor

ris M

edic

al L

ibra

ry

68.1

81.1

76.1

5 -

4/1/

2014

10:

57:3

8 A

M

Page 2: The Use of Haplotypes in the Identification of Interaction between SNPs

The Use of Haplotypes in the Identification of Interaction between SNPs

Hum Hered 2013;75:44–51DOI: 10.1159/000350964

45

formative to analyse multiple markers in a region of inter-est simultaneously. This is especially true when multiple polymorphisms are involved in the phenotype, and more so when there is interaction between the functional SNPs [11, 12] .

A biologically sound way to combine the information of multiple SNPs at a locus is with the use of haplotypes, groups of alleles from the same gamete, usually but not necessarily statistically associated with each other [13, 14] . Haplotypes reflect the chromosomal organisation of alleles and hence their pattern of inheritance over evolu-tion [13] . Because of shared inheritance, the total number of actual haplotypes formed by n diallelic SNPs is gener-ally much less than the 2 n that are possible [15, 16] . For a typical gene, the number of common haplotypes, even with variable numbers of SNPs, is of the order of 10–15 [16] . It has been suggested that the analysis of haplotypes can be more powerful, informative and effective than that based on single SNPs [13, 17, 18] . Haplotypes, in contrast to single SNP associations, take local LD structure into account and decrease both the dimensions and number of tests required, which under some circumstances might increase the power of the statistical tests used and help with the problem of multiple testing [15, 19] . Moreover, haplotypes can potentially capture cis-interactions be-tween two or more causal SNPs and define functional units of multiple variants; for example, the Apolipopro-tein E protein has three common allelic variants whose different functions strongly influence plasma lipid levels, these are caused by the presence of two coding SNPs which form three common haplotypes [13, 19–21] .

An important problem with the use of haplotypes in association tests is that they are not usually observed di-rectly, and precise inference of them from observed geno-types requires genetic information from both parents. Most large-scale association studies include unrelated in-dividuals, making the identification of the true haplo-types difficult. A number of statistical approaches for the reconstruction of the haplotypes are available, either based on the EM-algorithm [22] or alternative methods [23] , but their main drawback is the remaining uncer-tainty of the imputed phase.

The statistical power of haplotypes acting as the func-tional unit is under-studied. The aim of the present study is to evaluate how the statistical power to identify a func-tional haplotype is influenced by allele frequency, LD be-tween the SNPs and the effects of non-informative SNPs when considered as part of the functional haplotype. We also examine the efficiency of the classical statistical inter-action method to detect haplotypic effects between SNPs.

Methods

Genotype and Haplotype Data Using STATA 11, we developed a simulation able to create ran-

dom diallelic SNPs (major allele, A; minor allele, B) for 3,000 indi-viduals with varying minor allele frequencies (MAF) ranging from 5 to 50%. For the initial phase of our simulation, no LD between the SNPs was modelled. In this case, the haplotype was considered as a random selection from the SNPs for each individual. We sub-sequently added pair-wise LD, simulating the r 2 measure, with val-ues ranging from a low (r 2 = 0.1) to a high (r 2 = 0.9) correlation between the SNPs. To better represent association studies of unre-lated individuals, we also assumed that the haplotypic phase was unknown, and the PHASE algorithm [23, 24] was used to infer the haplotypes. The results obtained after the use of PHASE were com-pared to results based on the known phase of the haplotypes.

Phenotype Data We assumed that the functional unit is not the set of SNPs but

their combination in the haplotype, i.e. the SNPs show interaction. If we consider two SNPs, these can give rise to four possible haplo-types: A 1 A 2 , A 1 B 2 , B 1 A 2 , and B 1 B 2 , with A and B being the two al-leles for SNP 1 and 2, respectively. There are a number of ways in which the effect of a haplotype can be modelled. The additive mod-el assumes that each change from a major to a minor allele increas-es the phenotype by one unit, resulting in A 1 A 2 = 0, A 1 B 2 = 1,B 1 A 2 = 1, and B 1 B 2 = 2, which is similar to having the two SNPs working independently. The cis-recessive scheme considers that only the haplotype with the minor allele in both SNPs has an effect such that A 1 A 2 = 0, A 1 B 2 = 0, B 1 A 2 = 0, and B 1 B 2 = 1; while the cis-dominant scheme is the opposite with A 1 A 2 = 0, A 1 B 2 = 1, B 1 A 2 = 1,and B 1 B 2 = 1, where any haplotype with a minor allele influences the phenotype. The trans model assumes that only when the major allele of one SNP is combined with the minor allele of the other is an effect evident, and is coded as A 1 A 2 = 0, A 1 B 2 = 1, B 1 A 2 = 1, and B 1 B 2 = 0. Finally, a more complex haplotypic model codes the four haplotypes as A 1 A 2 = 0, A 1 B 2 = 1, B 1 A 2 = 2, and B 1 B 2 = 3.

We simulated a continuous dependent variable Y using the lin-ear regression model Y = βX + ε , where β represents the vector of coefficient effects of haplotypes on the average trait value, X is the matrix of the haplotypes and ε is the error term with a normal dis-tribution of variance σ and a mean of 0. The variance σ of the error term was set so that the test achieved a statistical power of 90% with a nominal type I error rate of 5% in our reference category for each group of comparisons.

Statistical Analysis The test of association between each genotype and the depen-

dent variable was done using a general linear model in STATA with no dominance between the two haplotypes. The test of asso-ciation for the haplotypes used indicator variables for each haplo-typic category assuming additive effects between the two haplo-types in the same individual throughout.

Results

The statistical properties of a single SNP test are well known. Generally, the statistical power of an association test between a SNP and a trait increases with increasing

Dow

nloa

ded

by:

Nor

ris M

edic

al L

ibra

ry

68.1

81.1

76.1

5 -

4/1/

2014

10:

57:3

8 A

M

Page 3: The Use of Haplotypes in the Identification of Interaction between SNPs

Ken-Dror/Humphries/Drenos

Hum Hered 2013;75:44–51DOI: 10.1159/000350964

46

MAF. This is true for almost all of the different genetic models commonly used, i.e. additive, recessive for the mi-nor allele, and the hypothesis-free model ( fig. 1 ; 5,000 rep-lications). The exception is the dominant model which reaches a plateau around a MAF of 25%. As expected, both the dominant and recessive models follow very different trajectories to the same point when the MAF is 50% and the distinction between minor and major allele is lost. It is also evident that when the additive and the hypothesis-free model are applied on the same data, the extra degree of freedom required in the general model causes a pro-nounced drop of statistical power. This difference be-tween the models is smaller when the MAF is <10% and remains almost constant for higher frequencies.

In the simplest scenario when no LD is modelled be-tween the two SNPs making up the functional haplotype, with the first SNP kept at a MAF of 30% throughout, the statistical power to identify an association between the haplotype and the trait increases with increasing MAF of the second SNP for the additive, complex, cis-recessive and trans models. As in the single SNP case, the cis-dom-inant model does not follow this pattern, with the power of the test increasing initially until a MAF of 25–30% and then decreasing as the MAF reaches 50%. In all models and with all frequencies tested, the statistical power to identify the association was higher when the haplotype was analysed compared to either of the SNPs. The first SNP with a MAF of 30% in all models remains at the same levels of statistical power for the additive and com-plex models irrespective of the MAF of the second SNP. In the cis-dominant and trans models, an increase of the MAF of the second SNP, and thus changes in the fre-quencies of the different haplotypic classes, causes a de-

crease of power for the univariate test of the first SNP, while in the cis-recessive model there is a slight increase. The power to detect the effect of the second SNP increas-es with increasing MAF for all models except the cis-re-cessive , where it remains at the significance level of 5%. The tests for interaction between the two SNPs did not provide statistically significant evidence for interaction in the additive and complex models, while very weak power was seen for the two cis models. The best evidence for interaction between the SNPs was observed in the trans model, particularly for higher SNP frequencies. All results are summarised in figure 2 and online supple-mentary figure S1 (for all online suppl. material, see www.karger.com/doi/10.1159/000350964).

In most cases, we cannot directly identify the specific SNPs that comprise the functional haplotype. To explore the effects of this, we included up to seven non-function-al additional SNPs in our previous example of the two SNP haplotype under the complex model. The extension of the haplotype with an increasing number of non-func-tional SNPs showed a marked decrease of statistical pow-er with each additional SNP. Even the inclusion of a single non-functional polymorphism in the haplotype caused a decrease of statistical power, such that the association of the dependent variable with the haplotype had lower power compared to the test of the individual SNP. When five additional SNPs where considered, the univariate analysis performed better for both of the functional SNPs compared to the haplotype. Interestingly, including SNPs of low frequency (approx. 5%) caused a smaller drop of statistical power compared to the addition of more com-mon SNPs (>10%; see fig. 3 ).

The models presented so far have not included LD be-tween the SNPs tested, but this is rather unrealistic when SNPs close together are considered. Using the example of the two-SNP haplotypes and keeping the first SNP at a MAF of 30%, we permitted the second SNP to vary its LD with the first SNP ranging from r 2 = 0.1 to r 2 = 0.9. In this case, increasing LD between the two SNPs also increased the statistical power for the association between the hap-lotype and the trait in the additive, cis-recessive and com-plex models, while it decreased power in the cis-dominant and trans models. The observed increase can be attributed to the decrease of haplotypic classes in the model and thus a decrease in the degrees of freedom considered in each case. The steep decrease of statistical power seen in the trans model can be explained by the decreasing frequency of the trans haplotype as the LD between the two SNPs is increasing. In terms of individual SNPs, the results follow the patterns seen for the independent SNP example, ex-

Fig. 1. The statistical power of single SNPs among different MAF and genetic models.

Dow

nloa

ded

by:

Nor

ris M

edic

al L

ibra

ry

68.1

81.1

76.1

5 -

4/1/

2014

10:

57:3

8 A

M

Page 4: The Use of Haplotypes in the Identification of Interaction between SNPs

The Use of Haplotypes in the Identification of Interaction between SNPs

Hum Hered 2013;75:44–51DOI: 10.1159/000350964

47

cept that now the power estimates for the two SNPs are converging as the LD between them increases. Interaction between the SNPs is again very weak except for the trans model for which the statistical power is decreasing with increasing LD due to the lower frequency of individuals having a trans configuration of the two SNPs as the LD increases ( fig. 2 , online suppl. fig. S2).

Considering again the example when additional non-functional SNPs are included in the haplotype but model-ling the LD between them, the results are similar to those seen in the simpler example earlier. As expected, each ad-ditional non-functional SNP decreases the statistical

power of the haplotype-trait association. When a small number of SNPs is added, the effect of increasing LD on the overall change of power is negligible. By contrast, when the number of non-functional SNPs is more than the number of functional SNPs, increasing LD between the SNPs leads to an increase of statistical power, proba-bly due to the decrease of haplotypic classes ( fig. 3 ).

Unless family data are available, it is unlikely that the true haplotypes can be determined unambiguously. The common procedure for unrelated individuals is to infer the probable haplotypic phases using a statistical ap-proach. Here we chose to use PHASE to reconstruct the

Fig. 2. The power of two-SNP functional haplotypes with no LD ( a ) and with LD ( b ).

Fig. 3. The power of two-SNP functional haplotypes with non-informative SNPs included with no LD ( a ) and with LD ( b ).

Dow

nloa

ded

by:

Nor

ris M

edic

al L

ibra

ry

68.1

81.1

76.1

5 -

4/1/

2014

10:

57:3

8 A

M

Page 5: The Use of Haplotypes in the Identification of Interaction between SNPs

Ken-Dror/Humphries/Drenos

Hum Hered 2013;75:44–51DOI: 10.1159/000350964

48

haplotypes and test the effect of the extra uncertainty in the statistical tests of association. Using our previous ex-ample of a two-SNP functional haplotype, with LD be-tween them varying between r 2 = 0.1 and r 2 = 0.9, it is obvious that as the LD between the two SNPs decreases, the ability of the algorithm to reconstruct the correct hap-lotypes also decreases. Although this is evident in all the models considered, the effect was more pronounced in the trans and the two cis models. The power of the statis-tical test for the individual SNPs and the interaction be-tween them was, as expected, not affected by the inference of the haplotypes ( fig. 4 , online suppl. fig. S3). Also as ex-pected, when additional non-functional SNPs are includ-ed in the inferred haplotype, the overall power of the hap-lotypic association decreases further, to the point that the univariate analysis is the best approach to identify the ef-fect on the trait of the polymorphisms if more than the two functional SNPs are tested ( fig. 4 ).

The Example of APOE The APOE gene, situated on human chromosome 19,

has three major isoforms, characterised by two amino acid substitutions at residues 112 and 158 [25] . Two SNPs, rs429358 (encoding amino acid 112) and rs7412 (encod-ing amino acid 158), define the APOE status. The LD be-tween the two SNPs is low when r 2 is considered, but higher when the D’ measure is taken into account [26] .

We used six SNPs (rs405509, rs429358, rs7412, rs439401, rs5167, and rs10413089) in the area of APOE and assessed their association with LDL-C in the White-hall II study [27–29] . As with the simulation above, we

tested each SNP in a univariate analysis as well as the in-teraction terms and haplotypes of all pairwise SNP com-binations, and a summary of the results is presented in table 1 . Three of the six SNPs tested were significantly as-sociated with LDL-C in the univariate analysis. There was no evidence for the presence of interaction between the SNPs when the interaction term was fitted in the model. Most of the haplotypes tested were significant, as expect-ed in an area of strong association, but the pair with the lowest p values and highest variance explained was the pair of the two known functional SNPs.

Discussion

Using simulated data, we tested the statistical perfor-mance of haplotype analysis compared to that of univari-ate SNP analysis plus a test of statistical interaction. Dif-ferent models of haplotypic effect were tested including simple models, similar to an additive SNP effect, as well as cis and trans interactions between the SNPs. Progres-sively more realistic conditions were simulated to better represent common practice in the analysis of haplotypes in unrelated individuals. Our simulations suggest that the statistical power to identify a complex association be-tween a pair of SNPs and a trait is higher for haplotypes than the univariate analysis or the interaction term. The inclusion of non-informative SNPs and the uncertainty over the true haplotypic phase are the main limitations for the efficient use of haplotypes in current, large-scale, genetic epidemiology data.

Fig. 4. The power of two-SNP functional haplotypes with LD and inferred haplotypes ( a ) and with non-informative SNPs ( b ).

Dow

nloa

ded

by:

Nor

ris M

edic

al L

ibra

ry

68.1

81.1

76.1

5 -

4/1/

2014

10:

57:3

8 A

M

Page 6: The Use of Haplotypes in the Identification of Interaction between SNPs

The Use of Haplotypes in the Identification of Interaction between SNPs

Hum Hered 2013;75:44–51DOI: 10.1159/000350964

49

Although the direct determination of haplotypes cur-rently is expensive and labour-intensive in unrelated in-dividuals, recent advances in sequencing and bioinfor-matics suggest that reconstruction of a single individual haplotyping is feasible [30, 31] . In this case, overlapping

DNA sequence fragments, originating from one of the two chromosomes, are phased with the help of hetero-zygous SNPs. A comparison between the haplotypesobtained from sequencing and trio-based haplotypes showed that current algorithms work well, phasing more than 90% of the SNPs with less than 2% switch error rates [32] . The resulting haplotypic fragments are of limited length and the observation of entire chromosome haplo-types is still unattainable through these tools, although experimental methods based on the microfluidic separa-tion of each chromosome have succeeded in producing chromosome-long haplotypes [33] . Despite these ad-vances, the observation of haplotypes is still more expen-sive than of the unphased whole-genome data, something that is expected to change with new developments in se-quencing technologies [34] .

Even if the true haplotypes are known, since only a subset of SNPs contains information for the haplotypic effect, inclusion of non-informative SNPs will effectively divide the sample into multiple haplotypic groups, con-sequently decreasing the power of the study, while in-creasing the degrees of freedom of the test as seen in our simulation results. Identifying the most parsimonious haplotype responsible for the association with the pheno-type can be done in two different levels. Variable selection techniques can be used to decrease the number of SNPs we consider to a set of markers that can independently contribute to the association [24, 26] . Although these methods are unlikely to capture complex haplotypic ef-fects where many SNPs are involved, each non-informa-tive SNP excluded will half the number of possible haplo-types to be considered. Alternatively, haplotypic trees, as commonly used in phylogeny, can identify the changes in the haplotype that are most likely to be associated with a change in the phenotype/genotype association [24, 35] . The main issue here is that the complexity of the tree in-creases very fast with an increasing number of SNPs. A combination of the two levels of selection is also possible and easy to implement [24] .

Our study has a number of limitations. To avoid un-necessary complexity, we limited our simulation to two-SNP functional haplotypes and used the APOE gene as an example of such a case. More functional SNPs in larger haplotypes with more complex interactions between them are also possible. Multi-marker tests have not been considered, as this would overlap with the work by Roe-der et al. [36] and Rakovski et al. [37] . Generally, as was also apparent in the use of phased haplotypes, these meth-ods become less useful as the number of SNPs considered increases [38] .

Table 1. Association between six SNPs (rs405509, rs429358, rs7412, rs439401, rs5167, and rs10413089) in the area of APOE and LDL-C in the Whitehall II study

p value r2 AIC BIC

SNPs rs405509 7.0E-04 0.23 46514 46527 rs429358 1.7E-16 1.46 46458 46471 ApoE rs7412 1.2E-51 4.86 46297 46310 ApoE rs439401 0.050 0.06 46522 46535 rs5167 0.092 0.04 46523 46536 rs10413089 0.099 0.04 46523 46536

Interaction-SNPs rs405509*rs429358 0.914 1.38 46468 46526 rs405509*rs7412 0.434 4.85 46303 46348 rs405509*rs439401 0.049 0.79 46496 46553 rs405509*rs5167 0.762 0.20 46522 46580 rs405509*rs10413089 0.117 0.38 46514 46572 rs429358*rs7412 0.436 5.75 46258 46297 ApoE rs429358*rs439401 0.943 1.36 46468 46520 rs429358*rs5167 0.611 1.50 46463 46521 rs429358*rs10413089 0.682 1.46 46465 46522 rs7412*rs439401 0.603 5.41 46276 46321 rs7412*rs5167 0.109 4.96 46299 46357 rs7412*rs10413089 0.376 4.86 46304 46362 rs439401*rs5167 0.424 0.11 46527 46585 rs439401*rs10413089 0.955 0.00 46532 46590 rs5167*rs10413089 0.257 0.14 46525 46583

Haplotype H_1_2 1.4E-16 0.71 105040 105069 H_1_3 5.1E-54 2.37 104866 104895 H_1_4 2.0E-09 0.39 105073 105102 H_1_5 0.010 0.08 105105 105134 H_1_6 2.2E-04 0.16 105097 105126 H_2_3 2.4E-65 2.82 104817 104839 ApoE H_2_4 1.2E-16 0.72 105039 105068 H_2_5 5.3E-19 0.82 105028 105057 H_2_6 2.4E-18 0.79 105032 105061 H_3_4 3.8E-58 2.55 104847 104876 H_3_5 1.1E-53 2.35 104868 104897 H_3_6 1.3E-53 2.35 104868 104897 H_4_5 0.046 0.05 105109 105138 H_4_6 0.195 0.02 105112 105141 H_5_6 0.048 0.05 105109 105138

AIC = Akaike’s information criterion; BIC = Bayesian informa-tion criterion.

Dow

nloa

ded

by:

Nor

ris M

edic

al L

ibra

ry

68.1

81.1

76.1

5 -

4/1/

2014

10:

57:3

8 A

M

Page 7: The Use of Haplotypes in the Identification of Interaction between SNPs

Ken-Dror/Humphries/Drenos

Hum Hered 2013;75:44–51DOI: 10.1159/000350964

50

In summary, haplotypes can be more informative than individual SNPs or interaction terms when interac-tion is present. The inclusion of non-informative SNPs and the uncertainty of phase in unrelated individuals are the two main problems hindering their use. Technology development will soon permit the direct observation of haplotypes, possibly through entire chromosomes and genome. Statistical methods of SNP selection can help to focus on only the most relevant SNPs, but the problem will persist as the number of SNPs and the length of the

haplotypes considered increases. Novel statistical meth-ods will probably be required to reach a point where anal-yses of a phased diploid genome can be achieved.

Acknowledgements

The British Heart Foundation supports F.D. (PG2005/008). S.E.H. is a British Heart Foundation Chair holder.

References

1 Dudbridge F, Gusnanto A, Koeleman BP: De-tecting multiple associations in genome-wide studies. Hum Genomics 2006; 2: 310–317.

2 Ku CS, Loy EY, Pawitan Y, Chia KS: The pur-suit of genome-wide association studies: where are we now? J Hum Genet 2010; 55: 195–206.

3 Avi-Itzhak HI, Su X, De La Vega FM: Selec-tion of minimum subsets of single nucleotide polymorphisms to capture haplotype block diversity. Pac Symp Biocomput 2003; 466–477.

4 Abecasis GR, Noguchi E, Heinzmann A, Tra-herne JA, Bhattacharyya S, Leaves NI, Ander-son GG, Zhang Y, Lench NJ, Carey A, Cardon LR, Moffatt MF, Cookson WO: Extent and distribution of linkage disequilibrium in three genomic regions. Am J Hum Genet 2001; 68: 191–197.

5 Daly MJ, Rioux JD, Schaffner SF, Hudson TJ, Lander ES: High-resolution haplotype struc-ture in the human genome. Nat Genet 2001; 29: 229–232.

6 Reich DE, Cargill M, Bolk S, Ireland J, Sabeti PC, Richter DJ, Lavery T, Kouyoumjian R, Farhadian SF, Ward R, Lander ES: Linkage disequilibrium in the human genome. Nature 2001; 411: 199–204.

7 Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B, Higgins J, DeFelice M, Lochner A, Faggart M, Liu-Cordero SN, Rotimi C, Adeyemo A, Cooper R, Ward R, Lander ES, Daly MJ, Altshuler D: The struc-ture of haplotype blocks in the human ge-nome. Science 2002; 296: 2225–2229.

8 Schaid DJ: Evaluating associations of haplo-types with traits. Genet Epidemiol 2004; 27: 348–364.

9 Zhao H, Pfeiffer R, Gail MH: Haplotype anal-ysis in population genetics and association studies. Pharmacogenomics 2003; 4: 171–178.

10 Cordell HJ, Clayton DG: A unified stepwise regression procedure for evaluating the rela-tive effects of polymorphisms within a gene using case/control or family data: application to HLA in type 1 diabetes. Am J Hum Genet 2002; 70: 124–141.

11 Waldron ER, Whittaker JC, Balding DJ: Fine mapping of disease genes via haplotype clus-tering. Genet Epidemiol 2006; 30: 170–179.

12 Balding DJ: A tutorial on statistical methods for population association studies. Nat Rev Genet 2006; 7: 781–791.

13 Liu N, Zhang K, Zhao H: Haplotype-associa-tion analysis. Adv Genet 2008; 60: 335–405.

14 Salem RM, Wessel J, Schork NJ: A compre-hensive literature review of haplotyping soft-ware and methods for use with unrelated in-dividuals. Hum Genomics 2005; 2: 39–66.

15 Yang Y, Li SS, Chien JW, Andriesen J, Zhao LP: A systematic search for SNPs/haplotypes associated with disease phenotypes using a haplotype-based stepwise procedure. BMC Genet 2008; 9: 90.

16 Stephens JC, Schneider JA, Tanguay DA, Choi J, Acharya T, Stanley SE, Jiang R, Messer CJ, Chew A, Han JH, Duan J, Carr JL, Lee MS, Koshy B, Kumar AM, Zhang G, Newell WR, Windemuth A, Xu C, Kalbfleisch TS, Shaner SL, Arnold K, Schulz V, Drysdale CM, Nan-dabalan K, Judson RS, Ruano G, Vovis GF: Haplotype variation and linkage disequilib-rium in 313 human genes. Science 2001; 293: 489–493.

17 Akey J, Jin L, Xiong M: Haplotypes versus sin-gle marker linkage disequilibrium tests: what do we gain? Eur J Hum Genet 2001; 9: 291–300.

18 Morris RW, Kaplan NL: On the advantage of haplotype analysis in the presence of multiple disease susceptibility alleles. Genet Epidemiol 2002; 23: 221–233.

19 Clark AG: The role of haplotypes in candidate gene studies. Genet Epidemiol 2004; 27: 321–333.

20 Fullerton SM, Clark AG, Weiss KM, Nicker-son DA, Taylor SL, Stengard JH, Salomaa V, Vartiainen E, Perola M, Boerwinkle E, Sing CF: Apolipoprotein E variation at the se-quence haplotype level: implications for the origin and maintenance of a major human polymorphism. Am J Hum Genet 2000; 67: 881–900.

21 Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA: Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am J Hum Genet 2002; 70: 425–434.

22 Excoffier L, Slatkin M: Maximum-likelihood estimation of molecular haplotype frequen-cies in a diploid population. Mol Biol Evol 1995; 12: 921–927.

23 Stephens M, Donnelly P: A comparison of bayesian methods for haplotype reconstruc-tion from population genotype data. Am J Hum Genet 2003; 73: 1162–1169.

24 Smith AJ, Palmen J, Putt W, Talmud PJ, Humphries SE, Drenos F: Application of sta-tistical and functional methodologies for the investigation of genetic determinants of coro-nary heart disease biomarkers: lipoprotein li-pase genotype and plasma triglycerides as an exemplar. Hum Mol Genet 2010; 19: 3936–3947.

25 Rall SC Jr, Weisgraber KH, Mahley RW: Hu-man apolipoprotein E. The complete amino acid sequence. J Biol Chem 1982; 257: 4171–4178.

26 Ken-Dror G, Talmud PJ, Humphries SE, Dre-nos F: APOE/C1/C4/C2 gene cluster geno-types, haplotypes and lipid levels in prospec-tive coronary heart disease risk among UK healthy men. Mol Med 2010; 16: 389–399.

27 Talmud PJ, Drenos F, Shah S, Shah T, Palmen J, Verzilli C, Gaunt TR, Pallas J, Lovering R, Li K, Casas JP, Sofat R, Kumari M, Rodriguez S, Johnson T, Newhouse SJ, Dominiczak A, Samani NJ, Caulfield M, Sever P, Stanton A, Shields DC, Padmanabhan S, Melander O, Hastie C, Delles C, Ebrahim S, Marmot MG, Smith GD, Lawlor DA, Munroe PB, Day IN, Kivimaki M, Whittaker J, Humphries SE, Hingorani AD: Gene-centric association sig-nals for lipids and apolipoproteins identified via the HumanCVD BeadChip. Am J Hum Genet 2009; 85: 628–642.

28 Marmot M, Brunner E: Cohort Profile: the Whitehall II study. Int J Epidemiol 2005; 34: 251–256.

Dow

nloa

ded

by:

Nor

ris M

edic

al L

ibra

ry

68.1

81.1

76.1

5 -

4/1/

2014

10:

57:3

8 A

M

Page 8: The Use of Haplotypes in the Identification of Interaction between SNPs

The Use of Haplotypes in the Identification of Interaction between SNPs

Hum Hered 2013;75:44–51DOI: 10.1159/000350964

51

29 Zhao JH, Brunner EJ, Kumari M, Singh-Ma-noux A, Hawe E, Talmud PJ, Marmot MG, Humphries SE: APOE polymorphism, socio-economic status and cognitive function in mid-life – the Whitehall II longitudinal study. Soc Psychiatry Psychiatr Epidemiol 2005; 40: 557–563.

30 Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, Walenz BP, Axelrod N, Huang J, Kirkness EF, Denisov G, Lin Y, MacDonald JR, Pang AW, Shago M, Stockwell TB, Tsiamouri A, Bafna V, Bansal V, Kravitz SA, Busam DA, Beeson KY, McIntosh TC, Remington KA, Abril JF, Gill J, Borman J, Rogers YH, Frazier ME, Scherer SW, Strausberg RL, Venter JC: The diploid genome sequence of an individual hu-man. PLoS Biol 2007; 5:e254.

31 Bansal V, Bafna V: HapCUT: an efficient and accurate algorithm for the haplotype assem-bly problem. Bioinformatics 2008; 24:i153–i159.

32 Duitama J, McEwen GK, Huebsch T, Pal-czewski S, Schulz S, Verstrepen K, Suk EK, Hoehe MR: Fosmid-based whole genome haplotyping of a HapMap trio child: evalua-tion of Single Individual Haplotyping tech-niques. Nucleic Acids Res 2012; 40: 2041–2053.

33 Fan HC, Wang J, Potanina A, Quake SR: Whole-genome molecular haplotyping of sin-gle cells. Nat Biotechnol 2011; 29: 51–57.

34 Browning SR, Browning BL: Haplotype phas-ing: existing methods and new developments. Nat Rev Genet 2011; 12: 703–714.

35 Templeton AR, Maxwell T, Posada D, Sten-gard JH, Boerwinkle E, Sing CF: Tree scan-ning: a method for using haplotype trees in phenotype/genotype association studies. Ge-netics 2005; 169: 441–453.

36 Roeder K, Bacanu SA, Sonpar V, Zhang X, Devlin B: Analysis of single-locus tests to de-tect gene/disease associations. Genet Epide-miol 2005; 28: 207–219.

37 Rakovski CS, Xu X, Lazarus R, Blacker D, Laird NM: A new multimarker test for family-based association studies. Genet Epidemiol 2007; 31: 9–17.

38 Laird NM, Lange C: The Fundamentals of Modern Statistical Genetics. New York, Springer Science, 2011.

Dow

nloa

ded

by:

Nor

ris M

edic

al L

ibra

ry

68.1

81.1

76.1

5 -

4/1/

2014

10:

57:3

8 A

M