6
BioMed Central Page 1 of 6 (page number not for citation purposes) BMC Genetics Open Access Methodology article A pragmatic suggestion for dealing with results for candidate genes obtained from genome wide association studies David Curtis* 1 , Anna E Vine 1 and Jo Knight 2 Address: 1 Centre for Psychiatry, Queen Mary's School of Medicine and Dentistry, London E1 1BB, UK and 2 Social Genetic & Developmental Psychiatry Centre, Institute of Psychiatry, De Crespigny Park, London SE5 8AF, UK Email: David Curtis* - [email protected]; Anna E Vine - [email protected]; Jo Knight - [email protected] * Corresponding author Abstract Background: Researchers may embark on a genome-wide association study before fully investigating candidate regions which have been reported to produce evidence to suggest that they harbour susceptibility loci. If the genome wide study had not been carried out then results which demonstrated only modest statistical significance from candidate regions would be judged to be of interest and would stimulate further investigation. However if hundreds of thousands of markers are typed then inevitably very large numbers of such results will occur by chance and those from candidate regions may attract no special attention. Results: An approach is proposed in which differential treatment is afforded to markers from candidate regions and from those that are routinely typed in the context of a genome wide scan. Different prior probabilities are assigned to the two types of marker. A likelihood ratio is derived from the reported p value for each marker, calculated as LR = e chiinv(1,p)/2 , and the posterior odds in favour of a true positive association are obtained. These odds can be used to rank the markers with a view to suggesting the regions in which further genotyping is indicated. We suggest that prior probabilities be specified such that a candidate marker significant at p = 0.01 and a routine marker significant at p = 0.00001 will yield similar values for the posterior odds. We show that this can be achieved by setting a value for prior probability of association to 0.1 for candidate markers and to 0.00018 for routine markers. Conclusion: It is essential that formal procedures be adopted in order to avoid modestly positively results from candidate regions being swamped by the huge number of nominally significant results which will be obtained when very many markers are genotyped. Software to carry out the conversion from p values to posterior odds is available from http://www.mds.qmul.ac.uk/ statgen/grpsoft.html . Background The ability to carry out so-called genome wide association studies using a standard panel of single nucleotide poly- morphisms (SNPs) presents obvious difficulties. For many diseases it will be the case that particular genes have already been implicated as possibly or probably being involved. Typically, there will be some positive and some negative association studies, some groups will report pos- itive results with particular markers or haplotypes while others will report negative results with those markers but Published: 10 May 2007 BMC Genetics 2007, 8:20 doi:10.1186/1471-2156-8-20 Received: 4 December 2006 Accepted: 10 May 2007 This article is available from: http://www.biomedcentral.com/1471-2156/8/20 © 2007 Curtis et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

A pragmatic suggestion for dealing with results for candidate genes obtained from genome wide association studies

Embed Size (px)

Citation preview

Page 1: A pragmatic suggestion for dealing with results for candidate genes obtained from genome wide association studies

BioMed CentralBMC Genetics

ss

Open AcceMethodology articleA pragmatic suggestion for dealing with results for candidate genes obtained from genome wide association studiesDavid Curtis*1, Anna E Vine1 and Jo Knight2

Address: 1Centre for Psychiatry, Queen Mary's School of Medicine and Dentistry, London E1 1BB, UK and 2Social Genetic & Developmental Psychiatry Centre, Institute of Psychiatry, De Crespigny Park, London SE5 8AF, UK

Email: David Curtis* - [email protected]; Anna E Vine - [email protected]; Jo Knight - [email protected]

* Corresponding author

AbstractBackground: Researchers may embark on a genome-wide association study before fullyinvestigating candidate regions which have been reported to produce evidence to suggest that theyharbour susceptibility loci. If the genome wide study had not been carried out then results whichdemonstrated only modest statistical significance from candidate regions would be judged to be ofinterest and would stimulate further investigation. However if hundreds of thousands of markersare typed then inevitably very large numbers of such results will occur by chance and those fromcandidate regions may attract no special attention.

Results: An approach is proposed in which differential treatment is afforded to markers fromcandidate regions and from those that are routinely typed in the context of a genome wide scan.Different prior probabilities are assigned to the two types of marker. A likelihood ratio is derivedfrom the reported p value for each marker, calculated as LR = echiinv(1,p)/2, and the posterior oddsin favour of a true positive association are obtained. These odds can be used to rank the markerswith a view to suggesting the regions in which further genotyping is indicated. We suggest that priorprobabilities be specified such that a candidate marker significant at p = 0.01 and a routine markersignificant at p = 0.00001 will yield similar values for the posterior odds. We show that this can beachieved by setting a value for prior probability of association to 0.1 for candidate markers and to0.00018 for routine markers.

Conclusion: It is essential that formal procedures be adopted in order to avoid modestlypositively results from candidate regions being swamped by the huge number of nominallysignificant results which will be obtained when very many markers are genotyped. Software to carryout the conversion from p values to posterior odds is available from http://www.mds.qmul.ac.uk/statgen/grpsoft.html.

BackgroundThe ability to carry out so-called genome wide associationstudies using a standard panel of single nucleotide poly-morphisms (SNPs) presents obvious difficulties. Formany diseases it will be the case that particular genes have

already been implicated as possibly or probably beinginvolved. Typically, there will be some positive and somenegative association studies, some groups will report pos-itive results with particular markers or haplotypes whileothers will report negative results with those markers but

Published: 10 May 2007

BMC Genetics 2007, 8:20 doi:10.1186/1471-2156-8-20

Received: 4 December 2006Accepted: 10 May 2007

This article is available from: http://www.biomedcentral.com/1471-2156/8/20

© 2007 Curtis et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Page 1 of 6(page number not for citation purposes)

Page 2: A pragmatic suggestion for dealing with results for candidate genes obtained from genome wide association studies

BMC Genetics 2007, 8:20 http://www.biomedcentral.com/1471-2156/8/20

positive results from some other markers nearby and athird set of groups will report positive results with a differ-ent, though related, phenotype. There may be general con-sensus that the gene is worthy of further investigation. Inan ideal world groups possessing appropriate datasetswould test additional polymorphisms within the gene inan attempt to provide some definitive answer as towhether or not it influences susceptibility. However, theworld is not ideal and instead what may happen is thatgroups who have yet to fully investigate the gene may sub-mit their datasets for genotyping with a standard, verylarge, set of SNPs which will include a few markers locatedin the gene in question along with a few hundred thou-sand which are not. The problem is that markers in thecandidate gene may produce relatively modest evidence infavour of association, perhaps at the level of p < 0.01 or so,even with hundreds of subjects. If one were to study a can-didate gene on its own and obtain results of this natureone might declare them of interest and as supporting fur-ther investigation. However if a genome wide scan of 500K markers is performed then one can expect 5,000 to besignificant at p < 0.01 and the evidence supporting thecandidate gene will look very unimpressive indeed.

We should emphasise at this point that a marker having areal, but modest, effect is not expected to produce asmaller p value than markers producing apparently signif-icant results by chance. If the expected significance, basedon effect size and sample size, for a given truly associatedmarker is 0.01 then the p value actually obtained willprobably be fairly close to this value. However among 500K markers one may expect 50 or so to be significant at p <0.0001 by chance and there is no reason to suppose thatany of these will represent a true positive effect. If there areparticular markers which are incorporated in the genomescan which do happen to be strongly associated with thedisease then it is true that they may produce very highlysignificant p values. However our concern is that markerswhich are truly associated to some degree may yield onlymodest p values [1,2] and that their effect may beswamped if they are considered alongside the largenumber of other markers which are genotyped.

We propose that formal methods should be used for treat-ing markers in candidate regions which have been previ-ously implicated in a different fashion from the manythousands of SNPs spread across the genome which willbe routinely typed in the course of a genome scan. Oneapproach which would be theoretically sound would beto declare in advance a set of genes, and hence markers,which one regarded as being worthy of special considera-tion and to go on to analyse them first, before giving anyconsideration to those routinely genotyped. One could goon to publish the results from these analyses before ana-lysing the other markers. However we believe that in prac-

tice it would be difficult to implement this approach.Typically, when a genome scan is performed all markersare analysed at once. It would be difficult to persuadereaders and reviewers that special attention should bepaid to certain markers giving results at p < 0.01 when onehas the results for 5,000 more which are just as significant.

One approach which is relevant in this context is to esti-mate the false positive report probability (FPRP) [3]. Thisaims to use the prior probability for a marker to be asso-ciated in order to come up with a threshold p value suchthat markers achieving that threshold will have less thanthe declared probability of being false positive. Guidelineshave been proposed in which candidate genes andgenome screens can be treated differently when the FPRPapproach is applied in order to satisfy different criteria fora finding to be "considered noteworthy" [4]. Howeverthese guidelines did not explicitly tackle the issue of deal-ing with candidate gene polymorphisms genotypedwithin the context of a genome wide association study. Analternative suggestion was to carry out weighting using theresults of previous linkage studies [5]. This used a quanti-tative score in favour of linkage as a weighting factor,rather than declaring particular genes or markers as beinga priori of interest.

We believe that the FPRP approach is not quite correctfrom a conceptual point of view when assessing the resultsof genome wide association studies. One reason is thatsuch a study must always be regarded as an intermediatestep. This is because the polymorphisms which are geno-typed have been selected on the grounds of being appro-priately spaced or because they tag other markers but theyare not themselves expected to directly influence suscepti-bility to a disease. Rather, it is hoped that a genotypedmarker showing association with the phenotype may bein linkage disequilibrium with a polymorphism whichdoes influence susceptibility. Thus a positive findingshould lead on to further genotyping of other polymor-phisms in the region. In this situation, there is no point infretting over what the probability is that one has a "note-worthy" finding. Once the scan has been performed onewill inevitably want to go on to perform further genotyp-ing and the question will be which regions appear themost promising to pursue. Thus the aim of the scan is toassist in the ranking of regions for further genotypingrather than to come up with any definitive answers as towhether particular regions are implicated or not. Oncereasonable attempts have been made to type all availablepolymorphisms within a region then one can address thequestion of the overall strength of evidence in favour ofassociation. One can then make a judgement as towhether it is worth proceeding to more demanding inves-tigations such as functional studies. At this point one mayfeel that a method which explicitly aims to quantify the

Page 2 of 6(page number not for citation purposes)

Page 3: A pragmatic suggestion for dealing with results for candidate genes obtained from genome wide association studies

BMC Genetics 2007, 8:20 http://www.biomedcentral.com/1471-2156/8/20

probability that a finding represents a true positive is ofvalue. However we do not believe that such judgementsare necessary when the question is not whether, butwhere, to perform further genotyping. Additional disad-vantages of the FPRP approach are that one needs todeclare the power of each test, that is the probability thata marker near a susceptibility locus will support associa-tion at a certain significance level, and that one mustdeclare a plausible prior probability for each marker beingtruly associated. It is doubtful whether either of these val-ues can be realistically quantified but the application ofFPRP seems to yield an artificially concrete estimate of aprobability which is in reality quite uncertain.

An alternative approach using a Bayesian strategy hasbeen proposed which uses the prior probability for a testto detect association in order to weight the p valueobtained [6]. The weighting scheme given as an exampleuses the number of potentially pathogenic polymor-phisms detected by each tagging marker or group of mark-ers to accord more weight to more informative markersalthough other weighting schemes are suggested, forexample considering whether or not an SNP is coding orlies under a linkage peak. Deriving the necessary weightsis mathematically relatively complex compared to thesimple scheme we describe below.

Here we propose a related method which pragmaticallyseeks to rank results in a comparative way rather than tocome up with absolute judgements as to whether theyreflect true or false positive results. It also explicitly aimsto afford special treatment for already implicated candi-date genes within the context of a genome wide associa-tion study. Although it utilises "prior probabilities" forassociation of markers within candidate genes and of rou-tinely typed markers it is recognised that these probabili-ties are essentially arbitrary and they are used mainly as away of distinguishing the two sets of markers. It does notrequire any assumptions about the genetic effect size orpower to detect association.

ResultsThe procedure we arrive at consists of a number of stages.The first, and perhaps most problematic, step is to dividethe markers into those which are candidate markers andthose which are routinely genotyped. Candidate markersmust either all be declared in advance or must be defina-ble by some explicit rule. They must not have been alreadytyped in the current sample. (Markers which have notbeen previously typed but which are in linkage disequilib-rium with markers which have produced positive resultsin the current sample must be treated as a special case.)The authors reporting the study must report the basis onwhich they have declared markers as candidates. An exam-ple of a rule for defining a candidate marker might be to

say that one will include all markers within 200 kb of agene which has shown evidence from either three studiesat p < 0.01, two at 0.001 or one at 0.0001 and for whichthe ratio of negative to positive association studies doesnot exceed 2:1. We believe that such a rule might reason-ably reflect how association studies are commonly inter-preted but we emphasise that any rule can be chosen, aslong as it is explicit. Of course, one could set up hierar-chies of candidate regions, whereby some were regardedas more strongly supported than others, but given the dif-ficulty in making firm judgements about the weight of evi-dence any particular study provides we doubt that suchcomplex schemes would be justified.

Next, appropriate values for the prior probabilities, andhence odds, of association for candidate and routinelytyped markers should be chosen. From the argument setout in the Methods section we would suggest prior proba-bilities of PPrCAND = 0.1 and PPrROUT = 0.00018 but obvi-ously these values are to a large extent arbitrary.

Finally, for each marker and the p value we obtain for it wewrite:

OPo = OPr* echiinv(p,1)/2

We can use either OPrCAND = PPrCAND/(1-PPrCAND) orOPrROUT = PPrROUT/(1-PPrROUT) as the prior odds depend-ing on how we have categorised the marker and we canthen go ahead to rank all results according to the posteriorodds of association. If we wish to obtain the posteriorprobability rather than odds we can write ProbPo = OPo/(OPo+1), which for the example above is equal to 0.755for either the candidate marker significant at 0.01 or theroutine marker significant at 0.00001. However we againemphasise that we do not regard the posterior odds orprobability as representing an absolute value but rather asacting as a means to rank the results from different mark-ers in order to guide further investigation.

ConclusionThe main emphasis of our proposal is that approaches areadopted which provide differential treatment to markersin candidate genes and those which are routinely geno-typed within a genome wide association study. Althoughwe do not think the differences are critically important,our procedure does involve a slightly different emphasisfrom the FPRP approach. In order to obtain a value for theFPRP one must declare the power of the test to detect asso-ciation. We believe it is fair to say that in most circum-stances, at least within the context of markers typedroutinely in the context of a genome scan, one can saynothing about the minimum value for the true geneticeffect. Yet if the power of the test is declared to be higherthan it actually is then the FPRP value will be too low.

Page 3 of 6(page number not for citation purposes)

Page 4: A pragmatic suggestion for dealing with results for candidate genes obtained from genome wide association studies

BMC Genetics 2007, 8:20 http://www.biomedcentral.com/1471-2156/8/20

Ultimately, if the true genetic effect is minimal and thepower to detect it is close to zero then the actual probabil-ity that one has a false positive finding will be close to one– all positives will be false positives. Yet a value for theFPRP based on some over-optimistic assessment of powermay be quite small. We believe that in practice the valuecalculated for the FPRP may well not reflect the true prob-ability that a finding is a false positive.

The alternative approach which we suggest does notrequire that any value for the power be specified. Rather,our method explicitly use a "best case" scenario. Ratherthan take the likelihood for the observations conditionalon a pre-specified alternative hypothesis, our approachconsiders the alternative hypothesis which provides thebest fit to the data, the maximum likelihood hypothesis.The criticism of this approach would be that it does notrepresent a true Bayesian scenario in which explicit state-ments are made about the prior probabilities of alterna-tive hypotheses and the associated likelihoods of theobserved data. The advantage of the approach is that it isnot sensitive to any user-defined values regarding thegenetic model, linkage disequilibrium parameters, samplesize and associated power. It always explicitly deals withthe alternative hypothesis which, post hoc, is determinedto best fit the data. We have emphasised throughout thatwe would not regard the posterior odds or probabilityarising from this approach as truly reflecting the actualchances that a locus is involved in susceptibility. Webelieve that the FPRP value is in more danger of such a lit-eral interpretation. In order to avoid such a literal inter-pretation we suggest that in applying our approach it maybe preferable to quote the posterior "odds in favour ofassociation" rather than "probability of association". Thatsaid, we do offer another approach to interpreting the pos-terior odds or probability obtained from our procedure.We suggest that it might be explicitly recognised as repre-senting a best case scenario. Thus, if we were obtain a pos-terior probability of association of 0.75 we might make astatement along the lines of "the probability that thisresult reflects a true positive association is no more than0.75". Arguing along these lines, we could say that theFPRP approach seeks to declare an estimate for the proba-bility that a result is a false positive whereas ours seeks todeclare a minimum probability for a result being a falsepositive, with fewer prior assumptions. We emphasisethat we do not regard this distinction as critical.

What we do regard as being of critical importance is tohave some formal, public system for treating results fromcandidate regions differently from routinely typed mark-ers within the context of whole genome scans. Whetherour method or FPRP or some related procedure is used, itis essential that markers from candidate regions can betreated as special cases. If this is not done then important

findings from such markers will be swamped. Large sam-ples will typically have been collected over a number ofyears at considerable effort. Once they have been geno-typed for hundreds of thousands of markers they will beat risk as being regarded as "burned out". Any subsequentstudies carried out on these samples yielding modest pvalues will be overshadowed by the huge numbers of gen-otypes already obtained. There is a sense that researcherssubmitting their painstakingly acquired datasets forgenome-wide studies are sleep-walking into a desolatespace from which nothing will emerge but a set of p valuesconforming pretty much to chance expectation. Theapproach we propose aims to mitigate some of the worsteffects of this.

We note that our proposed procedure will do nothing tohighlight the importance of true positive markers whichyield modest p values but which are not yet regarded ascandidates. Once genome scans have been completed andtheir results reported it will become necessary to reviewfindings retrospectively from markers which "becomeinteresting" after the scan is complete.

Doubtless these issues will be hotly debated, but we hopethat the scheme we propose may represent a useful contri-bution to this debate and we look forward to the issuesbeing progressed further.

MethodsThe situation we envisage is as follows. We have a datasetsuitable for association studies containing perhaps severalhundred cases and controls. We are about to submit thesample for a genome wide association scan using, say, 500K SNPs. For the phenotype under consideration there exista few genes for which there is quite strong, but not com-pletely compelling, evidence for association. Alterna-tively, there may be genes which are very strongcandidates for other reasons such as the results of expres-sion studies or pathway analysis. If we were not perform-ing the genome wide scan we would be typing additionalpolymorphisms in these genes. As it is, there are some 40–50 SNPs in these genes which will be typed in the genomewide study which have not already been typed. If we weretyping these SNPs in isolation then we would take a resultsignificant at p < 0.01 as being quite interesting and assupporting further investigation of this gene, for exampleby finding and typing additional polymorphisms withinit. We realise that there are many theoretical problemswith taking such an approach but we believe that it rea-sonably reflects the interpretation which in practice manyinvestigators apply when assessing the results of associa-tion studies. We know that we expect 5 of the 500 K mark-ers to be significant at p < 0.00001 purely by chance. It isalso possible that some highly significant results fromroutinely typed markers may represent true genetic effects

Page 4 of 6(page number not for citation purposes)

Page 5: A pragmatic suggestion for dealing with results for candidate genes obtained from genome wide association studies

BMC Genetics 2007, 8:20 http://www.biomedcentral.com/1471-2156/8/20

from hitherto unsuspected regions. We propose that wewould like to achieve a scheme of ranking results such thata p value of 0.01 from a candidate marker will be consid-ered similarly to a p value of 0.00001 from a routinelytyped marker in terms of providing an indication of wherefurther efforts may directed. We recognise that some gen-uinely associated markers may produce even smaller p val-ues than this and they will receive favourableconsideration whether or not they are recognised inadvance as occurring in candidate genes.

All of this fits very comfortably into a Bayesian frame-work. Taking a relatively sceptical view we can say that if agene has been the subject of a few positive and some neg-ative association studies then the probability that thatgene really does influence susceptibility and that a previ-ously untyped polymorphism within it demonstratesassociation might be in the region of 0.1. Then we wouldsay that the prior odds for a true positive association are0.1/0.9. Then we could we carry out our genome wideassociation study and obtain the likelihood ratio for theresults for a candidate polymorphism to be observedassuming association compared with the null hypothesisof no association. To apply a true Bayesian approach wewould need to be able to calculate a Bayes factor consist-ing of the ratio of the cumulative probability distributionsfor the observed results under the alternative and nullhypotheses. Instead, we propose a modified approachwhich uses instead the likelihood ratio obtained in thecourse of carrying out a likelihood ratio test for heteroge-neity of allele frequencies between cases and controls.This test uses the two likelihoods maximised over allelefrequencies under the alternative and null hypotheses. Weuse these maximised likelihoods as proxies for the distri-bution of likelihoods which might be obtained over theuniverse of alternative and null hypotheses and whichwould be needed in a true Bayesian approach. Multiplyingthis likelihood ratio by the prior odds provides an indica-tion of the posterior odds for the polymorphism to beassociated with the disease. We can do the same thing fora routinely typed polymorphism but assign it a lowerprior probability of being truly associated. Writing OPrand OPo for prior and posterior odds and CAND andROUT for a candidate or routine polymorphism, we have:

OPoCAND = OPrCAND*LRCAND

OPoROUT = OPrROUT*LRROUT

Our task would then be to select suitable prior odds forassociation for a routinely typed marker such that a candi-date marker with a likelihood ratio yielding a p value of0.01 would yield similar posterior odds to a routinemarker yielding a p value of 0.00001.

It is helpful to realise that, if one considers allele-wise testsfor association with biallelic markers, there is a simplerelationship between the p value and the likelihood ratio.Although in practice the test for association used may con-sist of a Pearson chi-squared statistic derived from a 2 × 2contingency table it is safe to assume that this will be verysimilar to a likelihood ratio statistic calculated as2*ln(LR) formulated as test for heterogeneity of allele fre-quencies between cases and controls. In the context of agenome wide association study this obviates the need togo through all the genotypings recalculating likelihoodratios because we can instead use the p values output fromthe analysis and convert each into the associated likeli-hood ratio. We can write p = chidist(x,d) to indicate the pvalue associated with a chi-squared statistic of x having ddegrees of freedom and likewise we can write the inversefunction x = chiinv(p,d) to indicate the chi-squared statis-tic, x, which would yield the stated p value. We can use thisto select appropriate values for the prior odds for candi-date markers and routine markers by considering the pvalues which we wish to make equivalent to each other.The allele-wise test for association yields a chi-squared testwith one degree of freedom, so that we can state that a sig-nificance value of p = 0.01 from a candidate marker isequivalent to a chi-squared of chiinv(0.01,1) = 6.65 and alikelihood ratio of e6.65/2 = 27.8, while a significance valueof p = 0.00001 from a routine marker is equivalent to achi-squared of chiinv(0.00001,1) = 19.5 and a likelihoodratio of e19.5/2 = 17150. If our aim is to make both sets ofresults yield similar posterior odds for association, and ifwe assume prior odds for the candidate marker of 0.1/0.9then we can write:

OPrROUT = OPrCAND * LRCAND/LRROUT

= 0.1/0.9 * 27.8/17150

= 0.00018

As in this case the odds are small they are practically equalto the probability, so using this scheme we would bedeclaring that a marker routinely genotyped as part of agenome wide association study has a prior probability of0.00018, or approximately 1 in 5,000, of being truly asso-ciated. If one considers that there are 25,000 or so humangenes and that several may be involved in the susceptibil-ity to a particular disease then this estimate may not beterribly wide of the mark.

Availability and requirementsSoftware to carry out the conversion from p values to pos-terior odds is available from http://www.mds.qmul.ac.uk/statgen/grpsoft.html. It is provided in the form of an Excelspreadsheet and also as the C source and MSDOS execut-able for a program which will read in output from the

Page 5 of 6(page number not for citation purposes)

Page 6: A pragmatic suggestion for dealing with results for candidate genes obtained from genome wide association studies

BMC Genetics 2007, 8:20 http://www.biomedcentral.com/1471-2156/8/20

Publish with BioMed Central and every scientist can read your work free of charge

"BioMed Central will be the most significant development for disseminating the results of biomedical research in our lifetime."

Sir Paul Nurse, Cancer Research UK

Your research papers will be:

available free of charge to the entire biomedical community

peer reviewed and published immediately upon acceptance

cited in PubMed and archived on PubMed Central

yours — you keep the copyright

Submit your manuscript here:http://www.biomedcentral.com/info/publishing_adv.asp

BioMedcentral

PLINK program, such as might be available from agenome wide association study.

AcknowledgementsAV was supported by Wellcome Trust Project Grant, Grant No. 076392. JK was supported by an MRC Bioinformatics Training Fellowship, Grant No. G0501329.

References1. Farrall M, Morris AP: Gearing up for genome-wide gene-associ-

ation studies. Hum Mol Genet 2005, 14 Spec No. 2:R157-62.2. Ioannidis JP, Trikalinos TA, Khoury MJ: Implications of small

effect sizes of individual genetic variants on the design andinterpretation of genetic association studies of complex dis-eases. Am J Epidemiol 2006, 164:609-614.

3. Wacholder S, Chanock S, Garcia-Closas M, El Ghormli L, Rothman N:Assessing the probability that a positive report is false: anapproach for molecular epidemiology studies. J Natl CancerInst 2004, 96:434-442.

4. Freimer NB, Sabatti C: Guidelines for association studies inHuman Molecular Genetics. Hum Mol Genet 2005,14:2481-2483.

5. Roeder K, Bacanu SA, Wasserman L, Devlin B: Using linkagegenome scans to improve power of association in genomescans. Am J Hum Genet 2006, 78:243-252.

6. Pe'er I, de Bakker PI, Maller J, Yelensky R, Altshuler D, Daly MJ: Eval-uating and improving power in whole-genome associationstudies using fixed marker sets. Nat Genet 2006, 38:663-667.

Page 6 of 6(page number not for citation purposes)