3
The Future of Genetic Studies of Complex Human Diseases 1IIiiiiil .. 1IiiiII@ Neil Risch; Kathleen Merikangas Science, New Series, Vol. 273, No. 5281 (Sep. 13, 1996), 1516-1517. Stable URL: http://links.jstor.org/sici?sici=0036-8075%2819960913%293%3A273%3A5281%3C1516%3ATFOGS O%3E2.0.CO%3B2-K Science is currently published by American Association for the Advancement of Science. Your use of the JSTOR archive indicates your acceptance of JSTOR' s Terms and Conditions of Use, available at http://www.jstor.org/about/terms.html. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, non-commercial use. Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at http://www.j stor.org/journals/aaas.html. Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission. JSTOR is an independent not-for-profit organization dedicated to creating and preserving a digital archive of scholarly journals. For more information regarding JSTOR, please contact [email protected]. http://www .j stor.org/ Wed Oct 26 03:50:552005

The Future ofGenetic Studies ofComplex Human …...Neil Risch and Kathleen Merikangas The Future of Genetic Studies of Complex Human Diseases Association h = pq (y+ 1)2 2(py + q)2

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: The Future ofGenetic Studies ofComplex Human …...Neil Risch and Kathleen Merikangas The Future of Genetic Studies of Complex Human Diseases Association h = pq (y+ 1)2 2(py + q)2

The Future of Genetic Studies of Complex Human Diseases1IIiiiiil..1IiiiII@

Neil Risch; Kathleen Merikangas

Science, New Series, Vol. 273, No. 5281 (Sep. 13, 1996), 1516-1517.

Stable URL:http://links.jstor.org/sici?sici=0036-8075%2819960913%293%3A273%3A5281%3C1516%3ATFOGS O%3E2.0.CO%3B2-K

Science is currently published by American Association for the Advancement of Science.

Your use of the JSTOR archive indicates your acceptance of JSTOR' s Terms and Conditions of Use, available athttp://www.jstor.org/about/terms.html. JSTOR's Terms and Conditions of Use provides, in part, that unless you

have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, andyou may use content in the JSTOR archive only for your personal, non-commercial use.

Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained athttp://www.j stor.org/journals/aaas.html.

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen orprinted page of such transmission.

JSTOR is an independent not-for-profit organization dedicated to creating and preserving a digital archive ofscholarly journals. For more information regarding JSTOR, please contact [email protected].

http://www.j stor.org/Wed Oct 26 03:50:552005

Page 2: The Future ofGenetic Studies ofComplex Human …...Neil Risch and Kathleen Merikangas The Future of Genetic Studies of Complex Human Diseases Association h = pq (y+ 1)2 2(py + q)2

Neil Risch and Kathleen Merikangas

The Future of Genetic Studies ofComplex Human Diseases

Association

h = pq (y + 1)22(py + q)2 + pq(y_1)2

linkage analysis for loci conferring GRR ofabout 2 or less will never allow identificationbecause the number of families required(more than - 2500) is not practically achiev­able.

Although tests oflinkage for genes ofmod­est effect are of low power, as shown by theabove example, direct tests ofassociation witha disease locus itself can still be quite strong.To illustrate this point, we use the transmis­sion/disequilibrium test ofSpielman et al. (3).In this test, transmission of it particular alleleat a locus from heterozygous parents to theiraffected offspring is examined. Under Mende­lian inheritance, all alleles should have a 50%chance of being transmitted to the next gen­eration. In contrast, if one of the alleles isassociated with disease risk, it will be trans­mitted more often than 50% of the time.

For this approach, we do not need familieswith multiple affected siblings, but can focusjust on single affected individuals and theirparents. For the same model given above, wecan calculate the proportion of heterozygousparents as pq(y + 1)/(py + q)(4). Similarly,the probability for a heterozygote parent totransmit the high risk A allele is justy/(l + y).Association tests can also be perfonned forpairs of affected siblings. When the locus isassociated with disease, the transmission excessover 50% is the same as for single offspring, butthe probability ofparental heterozygosity is in­creased at low values ofPi for higher values ofp,the probability ofparental heterozygosity is de­creased. The fonnula for parental heterozygos­ity for an affected pair of siblings for the samegenetic model as used in the first example is

Linkage

age analysis we have chosen for this argu­ment is a popular current paradigm in whichpairs of siblings, both with the disease, areexamined for sharing of alleles at multiplesites in the genome defined by genetic mark­ers. The more often the affected siblingsshare the same allele at a particular site, themore likely the site is close to the diseasegene. Using the fonnulas in (1), we calculatethe expected proportion Y of alleles shared bya pair of affected siblings for the best possiblecase-that is, a closely linked marker locus(recombination fraction e = 0) that is fullyinfonnative (heterozygosity = 1) (2)-as

Y = 1 + W where w = pq (y_1)22 + w (py + q)2

If there is no linkage of a marker at aparticular site to the disease, the siblingswould be expected to share alleles 50% ofthetime; that is, Y would equal 0.5. Values of Yfor various values of pand yare given in thethird column of the table. For an allele ofmoderate frequency (p is 0.1 to 0.5) that con­fers a GRR (y) of fourfold or greater, there is adetectable deviation ofY from the null value of0.5. On the other hand, for an allele conferringaGRR of2 or less, the expected marker-sharingonly marginally exceeds 50%, for any allelefrequency (p). Thus, it is clear that the use of

Singletons Sib pairs

Genotypic Frequency Probability No. of Probability of Proportion ofrisk ratio of disease of allele families transmitting heterozygous

allele A sharing required disease allele A parents( ) (Y) (N) p(tr-A) Het) (N) (Het (N)

0.01 0.520 4260 0.800 0.048 1098 0.112 2350.10 0.597 185 0.800 0.346 150 0.537 480.50 0.576 297 0.800 0.500 103 0.424 61

0.80 0.529 2013 0.800 0.235 222 0.163 161

2.0 0.01 0.502 296,710 0.667 0.029 5823 0.043 1970

0.10 0.518 5382 0.667 0.245 695 0.323 2640.50 0.526 2498 0.667 0.500 340 0.474 180

0.80 0.512 11,917 0.667 0.267 640 0.217 394

1.5 0.01 0.501 4.620.80 0.600 0.025 19,320 0.031 7776

0.10 0.505 67,816 0.600 0.197 2218 0.253 9410.50 0.510 17.997 0.600 0.500 949 0.490 4840.80 0.505 67,816 0.600 0.286 1663 0.253 941

Comparison of linkage and association studies. Number of families needed for identification of adisease gene.

SCIENCE • VOL. 273 • 13 SEPTEMBER 1996

N. Risch is in the Department of Genetics, StanfordUniversity School of Medicine, Stanford, CA 94305-5120,USA. E-mail: [email protected]. K. Merikangasis in the Departments of Epidemiology and Psychiatry,Unit, Yale University School of Medicine, New Haven,CT 06510, USA. E-mail: [email protected]

Geneticists have made substantial progress inidentifying the genetic basis of many humandiseases, at least those with conspicuous deter­minants. These successes include Huntington'sdisease, Alzheimer's disease, and some fonns ofbreast cancer. However, the detection of ge­netic factors for complex diseases--such asschizophrenia, bipolar disorder, and diabetes-­has been far more complicated. There havebeen numerous reports of genes or loci thatmight underlie these disorders, but few ofthesefindings have been replicated. The modest na­ture ofthe gene effects for these disorders likelyexplains the contradictory and inconclusiveclaims about their identification. Despite thesmall effects of such genes, the magnitude oftheir attributable risk (the proportion ofpeopleaffected due to them) may be large because theyare quite frequent in the population, makingthem of public health significance.

Has the genetic study ofcomplex disordersreached its limits? The persistent lack ofreplicability of these reports of linkage be­tween various loci and complex diseasesmight imply that it has. We argue below thatthe method that has been used successfully(linkage analysis) to find major genes has lim­ited power to detect genes of modest effect,but that a different approach (associationstudies) that utilizes candidate genes has fargreater power, even if one needs to test everygene in the genome. Thus, the future of thegenetics ofcomplex diseases is likely to requirelarge-scale testing by association analysis.

How large does a gene effect need to be inorder to be detectable by linkage analysis?We consider the following model: Suppose adisease susceptibility locus has two alleles Aand a, with population frequencies Pand q =1 - p, respectively. There are three geno­types: AA, Aa, and aa. We define genotypicrelative risks (GRR, the increased chancethat an individual with a particular genotypehas the disease) as follows: Let the risk forindividuals ofgenotype Aa be y times greaterthan the risk for individuals with genotypeaa, a GRR of y. We assume a multiplicativerelation for two A alleles, so that the GRRfor genotype AA is y2. The method oflink-

1516

Page 3: The Future ofGenetic Studies ofComplex Human …...Neil Risch and Kathleen Merikangas The Future of Genetic Studies of Complex Human Diseases Association h = pq (y+ 1)2 2(py + q)2

On the right side of the table, we presentthe proportion ofheterozygous parents (Het)and the probability of transmission of the A

'f allele from a heterozygous parent to an af­fected child [P( tr-A)] for the same values ofGRR as considered above for the example oflinkage analysis. The deviation from the nullhypothesis of 50% transmission from het­erozygous parents is substantially greaterthan the excess allele sharing that is found bylinkage analysis in sibling pairs. This dispar­ity between the methods is particularly truefor lower values ofy (that is, with lower rela­tive risk). FQr example, for y = 1.5, allelesharing is at most 51 %, while the A allele istransmitted 60% of the time from heterozy­gous parents.

In this respect then, association studiesseem to be of greater power than linkagestudies. But of course, the limitation of as­sociation studies is that the actual gene orgenes involved in the disease must be tenta­tively identified before the' test can be per­formed. In fact, the actual polymorphismwithin the gene (or at least a polymorphism instrong disequilibrium) must be available.However, we show that this requirement isonly daunting becau~e of limitations tmposedby current technological capabilities, not be­cause sufficient families with the disease arenot available or the statistical power is inad­equate (5). For example, imagine the timewhen all human genes (say 100,000 in total)have been found and that simple, diallelicpolY1TIorphisms in these genes have beenidentified. Assume that five such diallelicpolymorphisms have been identified withineach gene, so that a total of lOx 105 = 106

alleles need to be tested. The statistical prob­lem is that the large number of tests that needto be made leads to an inflation of the type 1error probability. For a linkage test with pairsof affected siblings, we use a lod score (loga­rithm of the odds ratio for linkage) criterionof3.0, which asymptotically corresponds to atype 1 error probability a of about 10-4. In alinkage genome screen with 500 markers,this significance level gives a probabilitygreater than 95% of no false positives. Theequivalent false positive rate for 1,000,000independent association tests can be ob­tained with a significance level a = 5 x 10-8.

We illustrate the power of linkage versusassociation tests at different significance lev­els by determining the sample size N (num­ber of families) necessary to obtain 80%power (the probability of rejecting the nullhypothesis when it is false) (6) (see table).With a linkage approach and a disease genewith a ORR of 4 or greater, the nUluber ofaffected sibling pairs necessary to detect link­age is realistic (185 or 297), provided theallele frequency p is between 5 and 75 0/0. Fora gene with a GRR of 2 or less, however, thesample sizes are generally beyond reach (well

over 2000), precluding their identificationby this approach. In contrast, the requiredsample size for the association test, even al­lowing for the smaller significance level, isvastly less than for linkage, especially for af­fected sibling pairfamilies when the value ofp is small. Even for a GRR of 1.5, the samplesizes are generally less than 1000, well withinreason.

Thus, the primary limitation of genome­wide association tests is not· a statistical onebut a technological one. A large number ofgenes (up to 100,000) and polymorphisms(preferentially ones that create alterations inderived proteins or their expression) must firstbe identified, and an extremely lc~.rge numberof such polymorphisms will need to be tested.Although testing such a large number ofpoly­morphisms on several hundred, or even athousand families, might currently seem im­plausible in scope, more efficient methods ofscreening a large number of polymorphisms(for example, sample pooling) may be pos­sible. Furthermo~e, the number of tests wehave used as the basis for our calculations(1,000,000) is likely to be far larger than nec­essary if one allows for linkage disequilibrium,which could substantially reduce the requirednumber of markers and families needed forinitial screening.

Some of the important loci for complexdiseases will undoubtedly be found by link­age analysis. However, the limitations to de­tecting many of the remaining genes by link­age studies can be overcome; numerous ge­netic effects too weak to identify by linkagecan be detected by genomic association stud­ies. Fortunately, the samples currently col­lected for linkage studies (for example, af­fected pairs of siblings and their parents) canalso be used for such association studies.Thus, investigators should preserve theirsamples for future large-scale testing.

The human genome project can havemore than one reward. In addition to se­quencing the entire human genome, it canlead to identification of polymorphisms forall the genes in the hllman genome and thediseases to which they contribute. It is acharge to the molecular technologists to de­velop the' tools to meet this challenge andprovide the information necessary to identifythe genetic basis of complex human diseases.

References and Notes

1. N. Risch, Am. J. Hum. Genet. 40, 1 (1987); ibid.46,229 (1990).

2. From the formulas in (1), we have AO = 1 + 0.5 VAlK2 and AS = 1 + (0.5 VA + 0.25 Vo)/K2 , where K =p2y2 + 2pqy + q2= (QY + q)2,VA = 2pq(y- 1)2 (py+ q)2, and Vo = p2q2(y_ 1)2. Hence, AO = 1 + W

and AS = (1 + 0.5w)2, where w = pq(y - 1)2. Theproportion of alleles shared is given by Y = 1 ­0.5z1 - zO' where z1 and Zo are the probabilities ofthe sib pair sharing 1 and 0 disease alleles ibd, re­spectively. From (1), Zo = 0.25/AS and z1 = 0.5AO/AS' Thus, after some algebra, Y= 1 - 0.25(AO + 1)/

SCIENCE • VOL. 273 • 13 SEPTEMBER 1996

PERSPECTIVES

AS = (1 = w)/(2 + w).3. R. Spielman, R. E. McGinnis, W. J. Ewens, Am. J.

Hum. Genet. 52, 506 (1993).4. By Bayes theorem, the probability of a parent of an

affected child being heterozygous is given byP(HetlAff child) = P(Het)P(Aff ChildIHet)/P{Aff Child)= 2pq[0.5p(y2 + y)] + 0.5q(y + 1)/(py + q)2 =pq(y +1)/(py+ q).

5. E. S. Lander and N. J. Schurk, Science 265,2037(1994).

6. Consider a set of M independent, identically dis­tributed random variables Bj of discrete value. Un­der the null hypothesis Ho' assume E(Bj ) = 0 andVar(Bj) = 1. Under the alternative hypothesis Hi'let E(Bj ) = Il and Var(Bj) = (52. For a sample of sizeM, let T= I,Bj/m: Then under Ho' Talso has meanoand variance 1, while under Hi' it has mean {MIl and variance (52. We assume that T is approxi­mately normally distributed both under Hoand Hi'Then the sample size M required to obtain a powerof 1 - ~ for a significance level a is given by

M = (Za - (5Z1 _ (3)2/p2 (1)

For each affected sib pair, we score the number ofalleles shared ibd from each of 2N parents. DefineBj = 1 if an allele is shared from the ith parent andB j = -1 if unshared. Under the null hypothesis ofno linkage, P(Bi = 1) = P(Bj = -1) = 0.5, so E(B

j) =

oand Var(Bj) = 1. For the genetic model describedabove with genotypic relative risks of y2, y, and 1,allele sharing by affected sibs is independent forthe two parents; thus, we can consider sharing ofalleles one parent at a time. Thus, for affected sibpairs assuming e= 0 and no linkage disequilibrium,the formula is

where

Jl = 2Y-1

(j2 = 4Y( 1- Y)

Y= 1 + W2+ W

pq(y_1)2w=-----

'(py + q)2

Za = .3.72 (corresponding to a = 10-4), and Z1 _ ~= -0.84 (corresponding to 1 - ~ = 0.80)._ For anassociation test using the transmission/disequili­brium test, with the disease locus or a nearby lo­cus in complete disequilibrium, the number (N) offamilies with affected singletons required for 80%power is also calculated from formula 1. For thiscase, we score the number of transmissions of alleleA from heterozygous parents. Let h be the probabil­ity a parent is heterozygous under the alternativehypothesis, namely, h = pq(Y+ 1)/(py + q). Then de­fine Bj = h-O.5 if the parent is heterozygous and al­lele A is transmitted; Bj = 0 if the parent is homozy­gous; and Bj = _h-D·5 if the parent is heterozygousand transmits allele a. Under the null hypothesis,E(B

j) = 0 and Var(Bj) = 1. Under the alternative hy­

pothesis, ~ = E(Bj) = {h(y - 1)/(y + 1) and (J2 =Var(Bi) = 1 - h(y -1)2/(y + 1)2. In this case, there aretwo parents per family and they act independently,so the required number (N) of families is given byhalf of formula 1 where ~ and (52 are given above.Here, Za = 5.33 (corresponding to a = 5 x 10-8). Forthe same test but with affected sib pairs instead ofsingletons, the number of families required is givenby half of formula 1 (transmissions from two parentsto two children) with the same formulas for Il and (52as for singleton families but now using the heterozy­gote frequency for parents of affected sib pairs. Us­ing the above formulas, we can calculate samplesizes for the three study designs.

27 October 1995; accepted 6 June 1996.

1517