Genetic Susceptibility to Prostate, Breast, and Colorectal Cancer among Nordic Twins Author(s): Stuart G. Baker, Paul Lichtenstein, Jaakko Kaprio and Niels Holm Source: Biometrics, Vol. 61, No. 1 (Mar., 2005), pp. 55-63 Published by: International Biometric Society Stable URL: http://www.jstor.org/stable/3695647 . Accessed: 25/06/2014 04:26 Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at . http://www.jstor.org/page/info/about/policies/terms.jsp . JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected]. . International Biometric Society is collaborating with JSTOR to digitize, preserve and extend access to Biometrics. http://www.jstor.org This content downloaded from 185.44.78.31 on Wed, 25 Jun 2014 04:26:52 AM All use subject to JSTOR Terms and Conditions


BIOMETRICS 61, 55-63 March 2005

Genetic Susceptibility to Prostate, Breast, and Colorectal Cancer among Nordic Twins

Stuart G. Baker,1,* Paul Lichtenstein,2 Jaakko Kaprio,3 and Niels Holm4,5

1Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, EPN 3131, 6130 Executive Boulevard, MSC 7354, Bethesda, Maryland 20892-7354, U.S.A.

2Department of Medical Epidemiology, Karolinska Institute, 17177 Stockholm, Sweden

3Department of Public Health, University of Helsinki, FIN-00014 Helsinki, Finland

4The Danish Twin Registry, Institute of Public Health, University of Southern Denmark, DK-5000 Odense C, Denmark

5Department of Oncology, Odense University Hospital, DK-5000 Odense C, Denmark

* email: [email protected]

SUMMARY. To investigate the role of genetics in the development of cancer, we developed a new approach to analyze data on prostate, breast, and colorectal cancer from the Swedish, Danish, and Finnish twin registries on monozygotic (MZ) and same-sex dizygotic (DZ) twins. In the spirit of a sensitivity analysis, we modeled genetic inheritance as either an autosomal recessive or dominant cancer susceptibility (CS) genotype that involves either a single gene, many genes with equal allele frequencies, or three genes with a ninefold range of allele frequencies. We also modeled the joint probability of cancer incidence among five age categories, conditional on the presence or absence of the CS genotype. The main assumptions are: (1) the joint distribution of unobserved environmental effects in a twin pair conditional on the presence or absence of the CS genotype is the same for MZ and DZ twins, (2) the probability of cancer conditional on the presence or absence of the CS genotype and the unobserved environmental effects (i.e., the gene-environment interaction) is the same for MZ and DZ twins, and (3) the probability of cancer is independent between twins with the CS genotype. Estimation was by maximum likelihood via a search over allele frequency and two levels of EM algorithms. Models had acceptable or good fits. Variability was estimated using a bootstrap approach, but only 50 replications were feasible. The 94th percentile of bootstrap replications for the estimated fraction of cancers with the CS genotype ranged, over the various genetic models, from 0.16 to 0.45 for prostate cancer, 0.12 to 0.30 for breast cancer, and 0.08 to 0.27 for colorectal cancer. We conclude that genetic susceptibility makes only a small to moderate contribution to the incidence of prostate, breast, and colorectal cancer.

KEY WORDS: Breast cancer; Colorectal cancer; EM algorithm; Genetics; Hardy-Weinberg; Latent class; Prostate cancer; Twins.

1. Introduction

Twin studies are well suited to investigating environmental and genetic components of cancer. Most twin studies involve the analysis of data from monozygotic (MZ) and same-sex dizygotic (DZ) twins. The key assumption is that the environmental component of cancer is the same for MZ and DZ twins. Under this assumption, a difference between the association of cancer incidence among MZ twins and the association of cancer incidence among DZ twins is likely due to genetic factors, which can be estimated with sufficient data. The traditional analysis assumes a polygenic model in which susceptibility to cancer is regarded as an unobserved continuous variable that is the sum of underlying continuous genetic and environmental variables, presumably reflecting the contribution of multiple genes and environmental influences. Under the key assumption, the variance contribution from the underlying environmental variables is assumed to be the same in MZ and DZ twins. Also, the variance of the underlying genetic susceptibility variable differs in a known way between MZ and DZ twins. With an additional assumption that the genetic mechanism is additive, the variance can be uniquely partitioned into variance components for underlying genetic and environmental variables.
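As a textbook illustration of this partition (Falconer's decomposition; it is not part of the method proposed in this paper), write the underlying liability as the sum of additive genetic, shared environmental, and unique environmental components with variances σ²_A, σ²_C, and σ²_E. MZ twins share all of the additive genetic component and DZ twins share half of it on average, while the key assumption makes σ²_C the same for both twin types, so the intrapair covariances identify the components:

$$\mathrm{Cov}_{\mathrm{MZ}} = \sigma_A^2 + \sigma_C^2, \qquad \mathrm{Cov}_{\mathrm{DZ}} = \tfrac{1}{2}\sigma_A^2 + \sigma_C^2,$$

$$\Longrightarrow \quad \sigma_A^2 = 2\,(\mathrm{Cov}_{\mathrm{MZ}} - \mathrm{Cov}_{\mathrm{DZ}}), \qquad \sigma_C^2 = 2\,\mathrm{Cov}_{\mathrm{DZ}} - \mathrm{Cov}_{\mathrm{MZ}}.$$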

Recently Lichtenstein et al. (2000) applied the traditional polygenic model to analyze data from the largest twin study of cancer to date, involving approximately 45,000 twin pairs from the Swedish, Danish, and Finnish twin registries. Lichtenstein et al. (2000) found that the proportion of the variance attributed to genetic susceptibility was small or moderate. In reanalyzing summary data from Lichtenstein et al. (2000), Risch (2001) found (i) estimates favoring a single gene model over a polygenic model, (ii) estimates of risk ratios that, under a single gene model, would be consistent with a large proportion of cancer attributed to genetic susceptibility if the allele frequency were low, and (iii) nonsensical estimates when the allele frequency for breast cancer is low. These results cast doubt on the conclusions of Lichtenstein et al. (2000).

To better investigate the contribution of genetic susceptibility to cancer, we propose a novel approach to the analysis of twin data based on the presence or absence of a latent binary cancer susceptibility (CS) genotype, in models involving a single gene, many genes with equal allele frequencies, or three genes with a ninefold range of allele frequencies, all either autosomal dominant or recessive. We fit the models to readily available twin data on prostate, breast, and colorectal cancer from the Swedish, Danish, and Finnish twin registries. As will be discussed, we obtained conclusions qualitatively similar to those of Lichtenstein et al. (2000), and we attribute the difference from Risch (2001) to fitting a more general model.

2. Basic Formulation with Key Assumption

All Danish, Swedish, and Finnish twins identified within a specified age range and time period were followed for a specified number of years (Lichtenstein et al., 2000). To avoid specifying a parametric model for the relationship between age and probability of cancer, and to simplify computations, we grouped the data into five age intervals: 15-49, 50-59, 60-69, 70-79, and 80-115. Each individual was recorded as having incident cancer or being censored (by end of follow-up or death from a competing risk) in an age interval. For MZ and DZ twins, the original data can be viewed as three 5 × 5 tables, for incident-incident, incident-censored, and censored-censored pairs, with zeros below the diagonal in the incident-incident and censored-censored tables because the order of the twins is arbitrary. To simplify the mathematics we (1) created symmetric incident-incident and censored-censored tables by adding the counts to their transpose and dividing by two, and (2) defined censoring in age interval 70-79 to include any incident cancers over age 80. (A technical report is available at http://stat.tamu.edu/Biometrics/.)
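Step (1) above, symmetrizing a table that stores each pair only once, can be sketched as follows. This is a minimal illustration assuming NumPy and a hypothetical layout (rows and columns indexed by age interval, counts on or above the diagonal); it is not the authors' Mathematica code. The incident-censored table is left alone, because there the two twins are already distinguished by their outcomes.

```python
import numpy as np


def symmetrize(table):
    """Return (T + T^T) / 2 for a square twin-pair count table.

    Assumes each pair is stored once, with zeros below the diagonal,
    because the order of the twins is arbitrary. Diagonal counts are
    unchanged; each off-diagonal count is split evenly between (a, b)
    and (b, a), so the total number of pairs is preserved.
    """
    t = np.asarray(table, dtype=float)
    if t.ndim != 2 or t.shape[0] != t.shape[1]:
        raise ValueError("expected a square table")
    return (t + t.T) / 2.0


# Hypothetical 5 x 5 incident-incident counts (upper triangle only).
counts = np.zeros((5, 5))
counts[0, 0] = 4  # both twins with cancer at ages 15-49
counts[1, 3] = 6  # one twin at 50-59, the co-twin at 70-79
sym = symmetrize(counts)
```

Halved off-diagonal counts are not integers, which is harmless here because the counts only enter the multinomial likelihood as weights.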

Let Cancer = (a, b) denote the joint age intervals of cancer incidence in a twin pair, for a = 1, 2, 3, 4 and b = 1, 2, 3, 4. Let zyg denote zygosity, which is either MZ or DZ. The basic multinomial probability is Pr(Cancer | zyg). To introduce a genetic model, we let CS (with realization cs) indicate the joint presence or absence of the cancer susceptibility (CS) genotype. For MZ and DZ twins, CS = (1, 1) if both twins have the CS genotype and CS = (0, 0) if neither twin has it. For DZ twins, CS = (1, 0) if only the first twin has the CS genotype, and CS = (0, 1) if only the second twin has it. Because only cancer ages and zygosity are observed, we incorporate CS as a latent variable in the following mathematical identity:

$$\Pr(\text{Cancer} \mid \text{zyg}) = \sum_{cs} \Pr(\text{Cancer} \mid cs, \text{zyg}) \Pr(\text{CS} = cs \mid \text{zyg}). \tag{1}$$

A model based directly on (1) is not identifiable without some assumptions. Our key assumption is that Pr(Cancer | cs, zyg) = Pr(Cancer | cs). This assumption is best understood as arising from two more fundamental assumptions about unobserved environmental effects. Let Envir = (Envir_1, Envir_2), where Envir_i is the unobserved environmental effect in the ith twin. Starting with the mathematical identity

$$\Pr(\text{Cancer} \mid cs, \text{zyg}) = \sum_{envir} \Pr(\text{Cancer} \mid cs, envir, \text{zyg}) \times \Pr(\text{Envir} = envir \mid cs, \text{zyg}), \tag{2}$$

we make the following two assumptions.

First, we assume that the joint distribution of environmental risk factors in a twin pair conditional on the presence or absence of the CS genotype does not depend on zygosity, i.e., Pr(Envir | cs, zyg) = Pr(Envir | cs). There is some support for this assumption from other studies. Stunkard et al. (1990) found that the intrapair correlation for body mass index among twins reared apart was similar to that for twins reared together, indicating that environment has little influence on the association of body mass index within a twin pair. Bouchard et al. (1990) found that for a wide variety of variables related to mental abilities and attitudes, as well as height, weight, and blood pressure, the correlation among MZ twins reared apart was similar to that for MZ twins reared together. Insofar as these variables are related to cancer risk, perhaps through lifestyle choices, they also support our assumption. In addition, if the CS genotype is rare (which is one of our results), a comparison of the joint distributions of observed environmental risk factors between MZ and DZ twins is essentially a comparison in the absence of the CS genotype. Although we do not know of any studies reporting the joint distribution of environmental risk factors in MZ and DZ twins, data from two studies (Andrew et al., 2001; Kujala, Kaprio, and Koskenvuo, 2002) indicate similar marginal distributions of environmental risk factors (such as obesity, smoking, and alcohol consumption) in MZ and DZ twins.

Second, we assume that the probability of cancer conditional on CS genotype and environmental exposure (i.e., a gene-environment interaction) does not depend on zygosity, i.e., Pr(Cancer | cs, envir, zyg) = Pr(Cancer | cs, envir). This assumption would be violated if screening for cancer in one twin, given that the other was screened, was more likely in MZ than in DZ twins. However, large differential screening rates between MZ and DZ twins were unlikely because there was generally either little or widespread screening at any given time prior to the end of follow-up in the mid-1990s. It was only in 1996 that population-based studies of Hemoccult screening for colorectal cancer were set up in Sweden and Denmark (Tazi, Faivre, and Benhmiche, 1996), and, as of 1997, mass screening for colorectal cancer was not implemented as public health policy in any Nordic country (Hristova and Hakama, 1997). Systematic breast cancer screening was widespread in Sweden by the late 1980s (Hristova and Hakama, 1997; Rostgaard et al., 2001), and by 1991 all birth cohorts in Finland had been invited for at least one screening (Hristova and Hakama, 1997). Systematic breast cancer screening started in Denmark in 1991 in one region that covered about 8% of the eligible population (Olsen et al., 2003). There is relatively little information about prostate cancer screening; in a 1996 review, the Swedish Council on Technology Assessment in Health Care did not recommend PSA screening for prostate cancer (Johnson, Banta, and Shersten, 2001). Another mitigating factor is that even if there were different screening rates among MZ and DZ twins, some cancers that were detected by screening would have surfaced later clinically and have been included in the data (although possibly in an older age category), so that cancer incidence rates would have been less affected.

Substituting the assumptions into (2) gives Pr(Cancer | cs, zyg) = Pr(Cancer | cs), so the unobserved environmental effects are not included in the model. Further substitution into (1) gives our basic model

$$\Pr(\text{Cancer} \mid \text{zyg}) = \sum_{cs} \Pr(\text{Cancer} \mid cs) \Pr(\text{CS} = cs \mid \text{zyg}). \tag{3}$$
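Equation (3) is a finite mixture over the latent CS states: each zygosity group mixes the same conditional cancer tables with its own weights. A minimal NumPy sketch, assuming the per-state tables and the weights are already available (the function name and dictionary layout are illustrative, not taken from the paper):

```python
import numpy as np


def mixture_cancer_prob(p_cancer_given_cs, p_cs_given_zyg):
    """Pr(Cancer | zyg) = sum over cs of Pr(Cancer | cs) * Pr(CS = cs | zyg), as in (3).

    p_cancer_given_cs : dict mapping a CS state, e.g. (1, 0), to a square array
                        of joint probabilities Pr{Cancer = (a, b) | cs}
    p_cs_given_zyg    : dict mapping the same states to Pr(CS = cs | zyg)
    """
    total = 0.0
    for cs, weight in p_cs_given_zyg.items():
        total = total + weight * np.asarray(p_cancer_given_cs[cs], dtype=float)
    return total


# Hypothetical two-interval example for MZ twins, where only (0, 0) and (1, 1) occur.
tables = {(0, 0): np.full((2, 2), 0.25), (1, 1): np.array([[0.5, 0.0], [0.0, 0.5]])}
p_mz = mixture_cancer_prob(tables, {(0, 0): 0.9, (1, 1): 0.1})
```

For MZ twins only the concordant states carry weight; for DZ twins all four states do, which is what lets the MZ/DZ contrast identify the genetic component.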

In Sections 3 and 4, we formulate specific models for Pr(CS | zyg) and Pr(Cancer | cs), respectively.

3. Joint Probabilities of the CS Genotype

In the spirit of a sensitivity analysis, we consider six models for the joint probabilities of the CS genotype. Two models assume that the CS genotype arises from a single autosomal recessive or dominant gene. Two novel models assume the CS genotype arises from many autosomal recessive or dominant genes, where, for identifiability, the allele frequencies are assumed to be equal. Two related models assume the CS genotype arises from three autosomal recessive or dominant genes, where the allele frequencies have ratios (1/3):1:3.

3.1 Models for a Single Gene

Let C and c denote alleles for a single autosomal gene. Under autosomal recessive inheritance, the CS genotype is present if an individual is CC. Under autosomal dominant inheritance, the CS genotype is present if an individual is CC or Cc. Let θ denote the population frequency of cancer allele C. As derived in Table 1, under Hardy-Weinberg equilibrium the joint probabilities of the CS genotype in the two twins are

$$
\begin{aligned}
\Pr\{\text{CS} = (1,1) \mid \text{MZ}; \theta\} &= \begin{cases} \theta^2 & \text{if recessive},\\ 1 - (1-\theta)^2 & \text{if dominant}, \end{cases}\\
\Pr\{\text{CS} = (0,0) \mid \text{MZ}; \theta\} &= 1 - \Pr\{\text{CS} = (1,1) \mid \text{MZ}; \theta\},\\
\Pr\{\text{CS} = (0,0) \mid \text{DZ}; \theta\} &= \begin{cases} 1 - \sum_{cs \ne (0,0)} \Pr(\text{CS} = cs \mid \text{DZ}; \theta) & \text{if recessive},\\ 0.25\,(2 - 3\theta + \theta^2)^2 & \text{if dominant}, \end{cases}\\
\Pr\{\text{CS} = (0,1) \mid \text{DZ}; \theta\} &= \begin{cases} (1/2)(0.50)(3 - 2\theta - \theta^2)\,\theta^2 & \text{if recessive},\\ (1/2)(0.50)(4 - \theta)(1 - \theta)^2\,\theta & \text{if dominant}, \end{cases}\\
\Pr\{\text{CS} = (1,0) \mid \text{DZ}; \theta\} &= \Pr\{\text{CS} = (0,1) \mid \text{DZ}; \theta\},\\
\Pr\{\text{CS} = (1,1) \mid \text{DZ}; \theta\} &= \begin{cases} 0.25\,\theta^2(1 + \theta)^2 & \text{if recessive},\\ 1 - \sum_{cs \ne (1,1)} \Pr(\text{CS} = cs \mid \text{DZ}; \theta) & \text{if dominant}. \end{cases}
\end{aligned}
\tag{4}
$$

Table 1
Genotype frequencies for MZ and DZ twins (θ = allele frequency)

Parent      Probability of            Twin        Probability of twin genotype
genotype    parent genotype           genotype    conditional on parent genotype
                                                  MZ      DZ
CC, CC      f1 = θ⁴                   CC, CC      1.00    1.00
CC, Cc      f2 = 4θ³(1 - θ)           CC, CC      0.50    0.25
                                      CC, Cc      0       0.50
                                      Cc, Cc      0.50    0.25
CC, cc      f3 = 2θ²(1 - θ)²          Cc, Cc      1.00    1.00
Cc, Cc      f4 = 4θ²(1 - θ)²          CC, CC      0.25    0.0625
                                      CC, Cc      0       0.250
                                      CC, cc      0       0.125
                                      Cc, Cc      0.50    0.250
                                      Cc, cc      0       0.250
                                      cc, cc      0.25    0.0625
Cc, cc      f5 = 4θ(1 - θ)³           Cc, Cc      0.50    0.25
                                      Cc, cc      0       0.50
                                      cc, cc      0.50    0.25
cc, cc      f6 = (1 - θ)⁴             cc, cc      1.00    1.00

Autosomal recessive (CS genotype is CC):
Pr{CS = (1, 1) | MZ} = f1 + 0.50 f2 + 0.25 f4
Pr{CS = (0, 0) | MZ} = 1 - Pr{CS = (1, 1) | MZ}
Pr{CS = (1, 1) | DZ} = f1 + 0.25 f2 + 0.0625 f4
Pr{CS = (1, 0) | DZ} = Pr{CS = (0, 1) | DZ} = 0.5(0.50 f2 + 0.375 f4)
Pr{CS = (0, 0) | DZ} = 1 - Σ_{cs ≠ (0, 0)} Pr(CS = cs | DZ)

Autosomal dominant (CS genotype is CC or Cc):
Pr{CS = (1, 1) | MZ} = 1 - Pr{CS = (0, 0) | MZ}
Pr{CS = (0, 0) | MZ} = f6 + 0.50 f5 + 0.25 f4
Pr{CS = (1, 1) | DZ} = 1 - Σ_{cs ≠ (1, 1)} Pr(CS = cs | DZ)
Pr{CS = (1, 0) | DZ} = Pr{CS = (0, 1) | DZ} = 0.5(0.50 f5 + 0.375 f4)
Pr{CS = (0, 0) | DZ} = f6 + 0.25 f5 + 0.0625 f4
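Table 1 and equation (4) can be checked numerically by enumerating ordered parental mating types under Hardy-Weinberg equilibrium and random mating, letting MZ twins share one offspring genotype and letting DZ twins be two independent offspring of the same parents. The sketch below is our own illustration, not the authors' code; genotypes are coded by the number of C alleles, and the function names are hypothetical.

```python
from itertools import product


def hwe(theta):
    """Hardy-Weinberg genotype frequencies, keyed by the number of C alleles."""
    return {2: theta ** 2, 1: 2 * theta * (1 - theta), 0: (1 - theta) ** 2}


def offspring(g1, g2):
    """Mendelian offspring genotype distribution for parents with g1 and g2 C alleles."""
    t1, t2 = g1 / 2, g2 / 2  # probability that each parent transmits C
    return {2: t1 * t2, 1: t1 * (1 - t2) + (1 - t1) * t2, 0: (1 - t1) * (1 - t2)}


def has_cs(g, mode):
    """1 if genotype g is a CS genotype: CC if recessive, CC or Cc if dominant."""
    return int(g == 2) if mode == "recessive" else int(g >= 1)


def joint_cs(theta, mode):
    """Return (Pr(CS = cs | MZ), Pr(CS = cs | DZ)) as dicts keyed by cs."""
    mz, dz = {}, {}
    # Ordered pairs of parents under random mating; the weights sum to 1.
    for (g1, w1), (g2, w2) in product(hwe(theta).items(), repeat=2):
        kids = offspring(g1, g2)
        for k, pk in kids.items():  # MZ twins share a single genotype
            key = (has_cs(k, mode),) * 2
            mz[key] = mz.get(key, 0.0) + w1 * w2 * pk
        for (k1, q1), (k2, q2) in product(kids.items(), repeat=2):  # DZ: independent draws
            key = (has_cs(k1, mode), has_cs(k2, mode))
            dz[key] = dz.get(key, 0.0) + w1 * w2 * q1 * q2
    return mz, dz


theta = 0.1
mz_r, dz_r = joint_cs(theta, "recessive")
mz_d, dz_d = joint_cs(theta, "dominant")
# Compare with (4): Pr{CS = (1,1) | DZ} = 0.25 θ²(1 + θ)² in the recessive model.
print(dz_r[(1, 1)], 0.25 * theta ** 2 * (1 + theta) ** 2)
```

Summing over ordered parent pairs counts each mixed mating type twice, which is exactly where the factors of 2 and 4 in f2 through f5 come from.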

3.2 Models for Many Genes with Equal Allele Frequencies

Let CS(n) indicate the joint presence or absence of the CS genotype in a twin pair based on inheritance from n genes. Let Gene indicate the joint presence or absence of the CS genotype in a particular gene for a twin pair, either (0, 0), (0, 1), (1, 0), or (1, 1). The probabilities of the CS genotype in a particular gene for a twin pair are given by the formulas in (4) with Gene substituted for CS.

For a multiple gene autosomal recessive model, we assume a multistage model in which the CS genotype in an individual requires the presence of the CS genotype in all n autosomal recessive genes. For reasons discussed below, we set the common allele frequency for each of the n genes equal to the geometric mean θ^{1/n}. Under this model, the probability that both twins have the CS genotype equals the probability that each gene in both twins has the CS genotype,

$$\Pr\{\text{CS}_{(n)} = (1,1) \mid z; \theta\} = \big[\Pr\{\text{Gene} = (1,1) \mid z; \theta^{1/n}\}\big]^n \quad \text{if recessive}. \tag{5}$$

From (4) and (5), Pr{CS(n) = (1, 1) | MZ; θ} = θ², which equals Pr{CS = (1, 1) | MZ; θ}. Thus the choice of θ^{1/n} as the common allele frequency implies that the prevalence of the CS genotype in MZ twins is the same under the single gene and multiple genes models. For only the first DZ twin to have the CS genotype, each pair of twin genes must have CS genotype (1, 0) or (1, 1) without all having CS genotype (1, 1). Therefore, we can write

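$$
\begin{aligned}
\Pr\{\text{CS}_{(n)} = (1,0) \mid \text{DZ}; \theta\}
&= \Pr\{\text{CS}_{(n)} = (1,0) \text{ or } (1,1) \mid \text{DZ}; \theta\} - \Pr\{\text{CS}_{(n)} = (1,1) \mid \text{DZ}; \theta\}\\
&= \big[\Pr\{\text{Gene} = (1,0) \mid \text{DZ}; \theta^{1/n}\} + \Pr\{\text{Gene} = (1,1) \mid \text{DZ}; \theta^{1/n}\}\big]^n\\
&\quad - \big[\Pr\{\text{Gene} = (1,1) \mid \text{DZ}; \theta^{1/n}\}\big]^n \quad \text{if recessive}.
\end{aligned}
\tag{6}
$$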

The probability that only the second DZ twin has the CS genotype equals (6). The probability that neither DZ twin has the CS genotype equals 1 minus the sum of the probabilities for the other possible CS genotypes.

For a multiple gene autosomal dominant model, we assume a multiple pathway model in which the CS genotype in an individual requires the presence of the CS genotype in at least one of n autosomal dominant genes. For reasons discussed below, we set the common allele frequency for each of the n genes equal to 1 - (1 - θ)^{1/n}. Under this model, the probability that neither twin has the CS genotype equals the probability that each gene in both twins does not have the CS genotype,

$$\Pr\{\text{CS}_{(n)} = (0,0) \mid z; \theta\} = \big[\Pr\{\text{Gene} = (0,0) \mid z; 1 - (1-\theta)^{1/n}\}\big]^n \quad \text{if dominant}. \tag{7}$$

From (4) and (7), Pr{CS(n) = (1, 1) | MZ; θ} = 1 - (1 - θ)², which equals Pr{CS = (1, 1) | MZ; θ}. Thus the choice of 1 - (1 - θ)^{1/n} as the common allele frequency implies that the prevalence of the CS genotype in MZ twins is the same under the single gene and multiple genes models. By analogy with (6), the probability that only the first DZ twin has the CS genotype is

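$$
\begin{aligned}
\Pr\{\text{CS}_{(n)} = (1,0) \mid \text{DZ}; \theta\}
&= \big[\Pr\{\text{Gene} = (1,0) \mid \text{DZ}; 1 - (1-\theta)^{1/n}\} + \Pr\{\text{Gene} = (0,0) \mid \text{DZ}; 1 - (1-\theta)^{1/n}\}\big]^n\\
&\quad - \big[\Pr\{\text{Gene} = (0,0) \mid \text{DZ}; 1 - (1-\theta)^{1/n}\}\big]^n \quad \text{if dominant}.
\end{aligned}
\tag{8}
$$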

Had we formulated the autosomal recessive model with multiple pathways or the autosomal dominant model with multiple stages, rather than vice versa, we would have obtained unrealistic probabilities of 0 and 1. To simplify the formulas for DZ twins, we made the following approximations (which are very good for n > 10) by taking the limit as the number of genes approaches ∞,

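$$
\begin{aligned}
\Pr\{\text{CS}_{(n)} = (0,0) \mid \text{DZ}; \theta\} &\approx \begin{cases} 1 - \sum_{cs \ne (0,0)} \Pr\{\text{CS}_{(n)} = cs \mid \text{DZ}; \theta\} & \text{if recessive},\\ (1-\theta)^3 & \text{if dominant}, \end{cases}\\
\Pr\{\text{CS}_{(n)} = (1,0) \mid \text{DZ}; \theta\} &\approx \begin{cases} (1-\theta)\,\theta^2 & \text{if recessive},\\ (1-\theta)^2\,\theta & \text{if dominant}, \end{cases}\\
\Pr\{\text{CS}_{(n)} = (0,1) \mid \text{DZ}; \theta\} &= \Pr\{\text{CS}_{(n)} = (1,0) \mid \text{DZ}; \theta\},\\
\Pr\{\text{CS}_{(n)} = (1,1) \mid \text{DZ}; \theta\} &\approx \begin{cases} \theta^3 & \text{if recessive},\\ 1 - \sum_{cs \ne (1,1)} \Pr\{\text{CS}_{(n)} = cs \mid \text{DZ}; \theta\} & \text{if dominant}. \end{cases}
\end{aligned}
\tag{9}
$$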
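A sketch that evaluates the finite-n expressions (5)-(8) and compares them with the limits in (9). It assumes the single-gene DZ probabilities of (4) for each gene and independent segregation of the n genes; the function names are illustrative only.

```python
def single_gene_dz(p, mode):
    """DZ joint CS probabilities from (4) for one gene with allele frequency p."""
    if mode == "recessive":
        g10 = 0.25 * p ** 2 * (3 - 2 * p - p ** 2)
        g11 = 0.25 * p ** 2 * (1 + p) ** 2
        g00 = 1 - g11 - 2 * g10
    else:
        g10 = 0.25 * p * (4 - p) * (1 - p) ** 2
        g00 = 0.25 * (2 - 3 * p + p ** 2) ** 2
        g11 = 1 - g00 - 2 * g10
    return {(0, 0): g00, (1, 0): g10, (0, 1): g10, (1, 1): g11}


def many_genes_dz(theta, n, mode):
    """DZ joint CS probabilities for n genes with a common allele frequency, (5)-(8)."""
    if mode == "recessive":  # multistage: CS requires the CS genotype at all n genes
        g = single_gene_dz(theta ** (1 / n), mode)
        p11 = g[(1, 1)] ** n                           # (5)
        p10 = (g[(1, 0)] + g[(1, 1)]) ** n - p11       # (6)
        p00 = 1 - p11 - 2 * p10
    else:  # multiple pathways: CS requires the CS genotype at any of n genes
        g = single_gene_dz(1 - (1 - theta) ** (1 / n), mode)
        p00 = g[(0, 0)] ** n                           # (7)
        p10 = (g[(1, 0)] + g[(0, 0)]) ** n - p00       # (8)
        p11 = 1 - p00 - 2 * p10
    return {(0, 0): p00, (1, 0): p10, (0, 1): p10, (1, 1): p11}


def limit_dz(theta, mode):
    """Limits as n -> infinity, as in (9)."""
    if mode == "recessive":
        p11, p10 = theta ** 3, (1 - theta) * theta ** 2
        p00 = 1 - p11 - 2 * p10
    else:
        p00, p10 = (1 - theta) ** 3, (1 - theta) ** 2 * theta
        p11 = 1 - p00 - 2 * p10
    return {(0, 0): p00, (1, 0): p10, (0, 1): p10, (1, 1): p11}


for mode in ("recessive", "dominant"):
    print(mode, many_genes_dz(0.1, 20, mode)[(1, 0)], limit_dz(0.1, mode)[(1, 0)])
```

With n = 1 the finite-n functions reduce to the single-gene model, and the marginal prevalence in each twin stays at θ² (recessive) or 1 - (1 - θ)² (dominant) for every n, which is the purpose of the chosen common allele frequencies.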

3.3 Models for Three Genes with Specified Unequal Allele Frequencies

To broaden the scope of the models, we also considered a three-gene model in which the allele frequencies have ratios (1/3):1:3. By analogy with (5), for our autosomal recessive three-gene model, the probability that both twins have the CS genotype is

$$\Pr\{\text{CS}_{(3)} = (1,1) \mid z; \theta\} = \Pr\{\text{Gene} = (1,1) \mid z; \tfrac{1}{3}\theta^{1/3}\} \times \Pr\{\text{Gene} = (1,1) \mid z; \theta^{1/3}\} \times \Pr\{\text{Gene} = (1,1) \mid z; 3\theta^{1/3}\}.$$

Analogous equations are derived for (6)-(8).
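For the recessive three-gene model, a sketch assuming allele frequencies (1/3)θ^{1/3}, θ^{1/3}, and 3θ^{1/3}, whose product is θ, so that the MZ prevalence is again θ². The second return value is our analogue of (6) for three genes. The largest frequency must not exceed 1, which restricts this model to θ ≤ 1/27; the function name is hypothetical.

```python
def three_gene_recessive(theta, zyg):
    """Return (Pr{CS = (1,1) | zyg}, Pr{CS = (1,0) | zyg}) for the recessive three-gene model."""
    base = theta ** (1 / 3)
    freqs = [base / 3, base, 3 * base]  # ratios (1/3):1:3
    if max(freqs) > 1:
        raise ValueError("theta too large: 3 * theta**(1/3) exceeds 1")
    p11 = p1x = 1.0
    for p in freqs:
        if zyg == "MZ":
            g11, g10 = p ** 2, 0.0  # MZ twins share the genotype at every gene
        else:
            g11 = 0.25 * p ** 2 * (1 + p) ** 2           # from (4)
            g10 = 0.25 * p ** 2 * (3 - 2 * p - p ** 2)   # from (4)
        p11 *= g11        # both twins have CS at every gene
        p1x *= g10 + g11  # first twin has CS at every gene
    return p11, p1x - p11


print(three_gene_recessive(0.01, "MZ"), three_gene_recessive(0.01, "DZ"))
```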

To simplify later notation, we drop the subscript on CS, with the understanding that Pr(CS | zyg; θ) could refer to any of the models: a single gene, many genes with equal allele frequencies, or three genes with a ninefold range of allele frequencies.

4. Joint Probability of Cancer Incidence Given the CS Genotype

The joint probability of cancer incidence depends on the presence or absence of the CS genotype. When neither twin has the CS genotype, we make no assumptions except symmetry, because the order of the twins is arbitrary,

$$\Pr\{\text{Cancer} = (a,b) \mid \text{CS} = (0,0)\} = \begin{cases} \beta_{ab} & \text{if } a \le b,\\ \beta_{ba} & \text{if } a > b, \end{cases} \tag{10}$$

where Σ_a Σ_b β_ab = 1. When both twins have the CS genotype, we assume the probabilities of cancer for each twin with the CS genotype are independent. It is mathematically convenient to parameterize these probabilities using hazard functions. Let α_t, for t = 1, 2, 3, 4, denote the discrete-time hazard for cancer incidence in age interval t. The model is

$$\Pr\{\text{Cancer} = (a,b) \mid \text{CS} = (1,1)\} = \prod_{t=1}^{a-1}(1-\alpha_t)\,\alpha_a \prod_{t=1}^{b-1}(1-\alpha_t)\,\alpha_b. \tag{11}$$

When only one twin has the CS genotype, we assume independence and obtain

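$$\Pr\{\text{Cancer} = (a,b) \mid \text{CS} = (1,0)\} = \prod_{t=1}^{a-1}(1-\alpha_t)\,\alpha_a \sum_{u} \beta_{ub}, \tag{12}$$

with the mirror-image expression for CS = (0, 1).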
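The conditional cancer tables (10)-(12) can be assembled directly from the hazards and the β table. This NumPy sketch assumes one extra final category, "no cancer through the last interval", so that every table sums to 1; that reading is our assumption, but it is consistent with the 14 free β parameters counted in Section 5 (a symmetric 5 × 5 table has 15 distinct cells, less one for the sum constraint). Names are illustrative.

```python
import numpy as np


def marginal_from_hazard(alpha):
    """Pr(first cancer in interval a), a = 1..K, from discrete-time hazards,
    plus a final entry for no cancer through interval K."""
    alpha = np.asarray(alpha, dtype=float)
    surv = np.concatenate(([1.0], np.cumprod(1.0 - alpha)))  # Pr(no cancer before each interval)
    return np.append(surv[:-1] * alpha, surv[-1])


def cancer_given_cs(alpha, beta):
    """Joint tables Pr{Cancer = (a, b) | cs} following (10)-(12).

    alpha : length-K hazards for a twin with the CS genotype
    beta  : symmetric (K+1) x (K+1) table for pairs without it, summing to 1
    """
    beta = np.asarray(beta, dtype=float)
    m1 = marginal_from_hazard(alpha)  # twin with the CS genotype
    m0 = beta.sum(axis=0)             # marginal for a twin without it
    return {
        (0, 0): beta,                 # (10)
        (1, 1): np.outer(m1, m1),     # (11): independent twins
        (1, 0): np.outer(m1, m0),     # (12): only the first twin has CS
        (0, 1): np.outer(m0, m1),     # mirror image of (12)
    }


# Hypothetical hazards for four age intervals and a uniform beta table.
alpha = [0.1, 0.2, 0.3, 0.4]
probs = cancer_given_cs(alpha, np.full((5, 5), 1 / 25))
```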

5. Maximum Likelihood Estimation

We estimated the parameters θ, α_a, and β_ab using the method of maximum likelihood. Let n1(a, b, zyg), n2(a, b, zyg), and n3(a, b, zyg) denote the number of twin pairs of zygosity zyg with joint outcomes in age intervals a and b of (cancer, cancer), (cancer, censoring), and (censoring, censoring), respectively. The likelihood kernel is

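$$
\begin{aligned}
L = \prod_{\text{zyg}} \prod_a \prod_b\; & \Pr\{\text{Cancer} = (a,b) \mid \text{zyg}; \theta, \alpha, \beta\}^{n_1(a,b,\text{zyg})}\\
&\times \Big[\sum_{u=b+1} \Pr\{\text{Cancer} = (a,u) \mid \text{zyg}; \theta, \alpha, \beta\}\Big]^{n_2(a,b,\text{zyg})}\\
&\times \Big[\sum_{u=a+1} \sum_{v=b+1} \Pr\{\text{Cancer} = (u,v) \mid \text{zyg}; \theta, \alpha, \beta\}\Big]^{n_3(a,b,\text{zyg})},
\end{aligned}
\tag{13}
$$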
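A sketch of the logarithm of (13) for one zygosity group, assuming 0-based interval indices, a probability table with an extra final category for "no cancer through the last interval" (our reading), and that censoring in interval b means any cancer occurs after b. The function name and array layout are illustrative; the full kernel is the sum of this quantity over MZ and DZ pairs.

```python
import numpy as np


def log_likelihood_kernel(prob, n1, n2, n3):
    """Log of the likelihood kernel (13) for one zygosity group.

    prob       : (K+1) x (K+1) array of Pr{Cancer = (a, b) | zyg}; the last index
                 is taken to mean "no cancer through interval K" (an assumption)
    n1, n2, n3 : K x K count arrays for (cancer, cancer), (cancer, censored)
                 and (censored, censored) pairs, indexed by age interval
    """
    prob = np.asarray(prob, dtype=float)
    n1, n2, n3 = (np.asarray(n, dtype=float) for n in (n1, n2, n3))
    K = prob.shape[0] - 1
    ll = 0.0
    for a in range(K):
        for b in range(K):
            if n1[a, b] > 0:  # both cancers observed, in a and b
                ll += n1[a, b] * np.log(prob[a, b])
            if n2[a, b] > 0:  # co-twin's cancer, if any, occurs after b
                ll += n2[a, b] * np.log(prob[a, b + 1:].sum())
            if n3[a, b] > 0:  # both cancers, if any, occur after a and after b
                ll += n3[a, b] * np.log(prob[a + 1:, b + 1:].sum())
    return ll
```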

where Pr{Cancer = (a, b) | zyg; θ, α, β} is based on (3), (4), and (9)-(12). The likelihood kernel for colorectal cancer is the product of separate likelihood kernels for men and women, with the same θ and α for both sexes but different β's. (A preliminary test for different α's could not reject the null hypothesis of the same α for men and women.)

Using software written in Mathematica (Wolfram Research, 2002), we maximized L by adaptively searching over allele frequency θ and invoking an EM algorithm (Dempster, Laird, and Rubin, 1977) for maximizing α and β with θ treated as known. (See the technical report available at http://stat.tamu.edu/Biometrics/.) The E-step imputes the counts for each possible realization of the CS variable. The M-step computes α using closed-form estimates and β using a secondary EM algorithm. To minimize numerical problems, we added 0.01 to cells with zero counts.
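The E-step described above amounts to Bayes' rule applied cell by cell: with θ fixed (and hence Pr(CS = cs | zyg) fixed) and current values of α and β, each observed count is split across the latent CS states in proportion to Pr(CS = cs | zyg) Pr(cell | cs). This sketch shows only that E-step and is our own illustration; for censored cells the cell probability would be the corresponding sum from (13), and the closed-form M-step for α and the secondary EM for β are not shown.

```python
import numpy as np


def e_step(cell_probs_by_cs, p_cs, counts):
    """Split observed counts across latent CS states (expected complete-data counts).

    cell_probs_by_cs : dict cs -> array of Pr(observed cell | cs), same shape as counts
    p_cs             : dict cs -> Pr(CS = cs | zyg) at the current allele frequency
    counts           : array of observed counts for one zygosity group
    """
    joint = {cs: p_cs[cs] * np.asarray(cell_probs_by_cs[cs], dtype=float) for cs in p_cs}
    total = sum(joint.values())  # Pr(cell | zyg), as in (3)
    counts = np.asarray(counts, dtype=float)
    return {
        cs: counts * np.divide(j, total, out=np.zeros_like(j), where=total > 0)
        for cs, j in joint.items()
    }
```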

Our summary measure for goodness of fit is the p-value based on a χ² distribution for twice the difference between the maximized log likelihood of the saturated model and the maximized log likelihood of the genetic model, assuming noninformative censoring. The likelihood for the saturated model is (13) with Pr{Cancer = (a, b) | zyg} = γ_ab(zyg) if a ≤ b and γ_ba(zyg) if a > b, where Σ_a Σ_b γ_ab(zyg) = 1 for each zyg. For prostate and breast cancer, there are nine degrees of freedom, based on 19 parameters (4 for {α_t}, 14 for {β_ab}, and 1 for θ) from the genetic model and 28 parameters from the saturated model. For colorectal cancer, there are 23 degrees of freedom, based on 33 parameters (4 for {α_t}, 28 for {β_ab}, and 1 for θ) from the genetic model and 56 parameters from the saturated model.
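The goodness-of-fit p-value is the upper tail of a χ² distribution with the stated degrees of freedom. To keep this sketch free of dependencies, it uses the standard recurrence Q(x; k + 2) = Q(x; k) + (x/2)^{k/2} e^{-x/2} / Γ(k/2 + 1) for integer degrees of freedom rather than a statistics library; `scipy.stats.chi2.sf` computes the same quantity. The function names are illustrative.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability Pr(X > x) for X ~ chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2)), 1  # Q(x; 1)
    else:
        q, k = math.exp(-x / 2), 2             # Q(x; 2)
    while k < df:
        # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) exp(-x/2) / Gamma(k/2 + 1)
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def gof_pvalue(loglik_saturated, loglik_model, df):
    """Deviance test: refer 2 * (l_saturated - l_model) to chi-square(df)."""
    return chi2_sf(2.0 * (loglik_saturated - loglik_model), df)
```

With the counts above, df would be 9 for prostate and breast cancer and 23 for colorectal cancer.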

Using the maximum likelihood estimates of the fundamental parameters, we computed the following primary estimates of interest,

$$\hat{\pi} = \text{estimated prevalence of the CS genotype} = \Pr\{\text{CS} = (1,1) \mid \text{MZ}; \hat{\theta}\}, \tag{14}$$

$$\hat{F}_1 = \text{estimated fraction of cancers with CS genotype} = \frac{\hat{\pi}\hat{R}_1}{\hat{\pi}\hat{R}_1 + (1-\hat{\pi})\hat{R}_0}, \tag{15}$$

$$\hat{F}_0 = \text{estimated fraction of noncancers with CS genotype} = \frac{\hat{\pi}(1-\hat{R}_1)}{\hat{\pi}(1-\hat{R}_1) + (1-\hat{\pi})(1-\hat{R}_0)}, \tag{16}$$

$$\hat{R} = \text{relative risk of cancer for CS genotype versus no CS genotype} = \hat{R}_1/\hat{R}_0, \tag{17}$$

where

$$\hat{R}_1 = \text{estimated cumulative risk of cancer if CS genotype} = \sum_{a=1}^{4} W_a \prod_{i=1}^{a-1}(1-\hat{\alpha}_i)\,\hat{\alpha}_a, \tag{18}$$

$$\hat{R}_0 = \text{estimated cumulative risk of cancer if no CS genotype}, \tag{19}$$

the cumulative risks and fractions of cancers apply through age 79, and W_a is the probability of surviving to age interval a for the appropriate sex, obtained from population life tables. Also, for colorectal cancer we averaged the estimates over men and women.
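Equations (14)-(18) translate directly into code. In this sketch R̂0 is passed in as a number because the explicit form of (19) is not reproduced here, and the hazards and survival weights W_a in the example are hypothetical values, not estimates from this study.

```python
def summary_measures(pi, alpha, w, r0):
    """Summary estimates (15)-(18) from fitted quantities.

    pi    : estimated prevalence of the CS genotype, Pr{CS = (1,1) | MZ}, as in (14)
    alpha : discrete-time hazards for age intervals 1..4 with the CS genotype
    w     : W_a, probability of surviving to age interval a (from life tables)
    r0    : cumulative risk of cancer without the CS genotype, as in (19)
    """
    surv, r1 = 1.0, 0.0
    for a_t, w_a in zip(alpha, w):
        r1 += w_a * surv * a_t  # (18): W_a * prod_{i<a}(1 - alpha_i) * alpha_a
        surv *= 1 - a_t
    f1 = pi * r1 / (pi * r1 + (1 - pi) * r0)                    # (15)
    f0 = pi * (1 - r1) / (pi * (1 - r1) + (1 - pi) * (1 - r0))  # (16)
    return {"R1": r1, "F1": f1, "F0": f0, "RR": r1 / r0}        # (17)


# Hypothetical inputs: constant hazard 0.1, no competing mortality, R0 = 0.05.
m = summary_measures(0.01, [0.1] * 4, [1.0] * 4, 0.05)
```

Equation (15) is Bayes' rule: among people who develop cancer, the share carrying the CS genotype is the carriers' share of all cancer risk.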


Because some parameter values were near the boundary of the parameter space, we could not compute asymptotic variances using the observed information matrix. Instead, to estimate uncertainty, we used a bootstrap approach (Efron and Gong, 1983). To smooth the bootstrap distribution and avoid sampling cells with probabilities of zero, we computed multinomial bootstrap replications using expected counts under the model rather than the observed counts. Because computations were very slow, we were limited to only 50 bootstrap replications.

For some bootstrap replications, some likelihoods had two modes, one at a small value of θ and one at a large value of θ, the latter often corresponding to a very high prevalence of the CS genotype. We thought that modes corresponding to prevalences over 0.5 were unrealistic. By analogy, in standard models for two latent classes there are always two maxima, and typically one is discarded as being inappropriate. Therefore, we restricted our search over θ to values with CS genotype prevalences of <0.5. Also, when fitting autosomal dominant models, we found it important to concentrate much of the search on very small values of allele frequencies; otherwise one may miss the maximum.
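A sketch of the smoothed (parametric) bootstrap described above: counts are redrawn from a multinomial whose cell probabilities are the model-expected proportions, and with 50 replicates the 94th percentile is read off as the fourth-highest value. The `estimator` argument is a hypothetical stand-in for a full refit of the genetic model, which is what made the real computation slow; the function names are illustrative.

```python
import numpy as np


def parametric_bootstrap(expected_probs, n_total, estimator, reps=50, seed=0):
    """Multinomial bootstrap from model-expected cell probabilities.

    expected_probs : 1-D array of fitted cell probabilities (expected counts / n_total)
    estimator      : hypothetical callable mapping a count vector to a scalar estimate
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(expected_probs, dtype=float)
    return np.array([estimator(rng.multinomial(n_total, p)) for _ in range(reps)])


def kth_highest(values, k):
    """k-th highest replicate; with 50 replicates, k = 4 is the 94th percentile
    and k = 6 is the 90th percentile."""
    return np.sort(np.asarray(values))[-k]


# Toy example: the estimate is the share of counts in the first cell.
reps = parametric_bootstrap([0.2, 0.8], 100, lambda c: c[0] / c.sum())
p94 = kth_highest(reps, 4)
```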

6. Results

In light of the multiple comparisons, all the models fit adequately or very well (Table 2). Therefore, in the spirit of a sensitivity analysis, we report a range of estimates based on all models (Table 2 and Figure 1). The large value of some estimates compared with the mean of the bootstrap distribution is likely a consequence of the smoothing in the bootstrap calculations. The 94th percentile (fourth highest out of 50) for the prevalence of the CS genotype ranged from 0.01 to 0.13 for prostate cancer, 0.01 to 0.15 for breast cancer, and 0.01 to 0.51 for colorectal cancer. The closely related 94th percentile for the fraction without cancer who have the CS genotype ranged from 0.04 to 0.13 for prostate cancer, 0.01 to 0.14 for breast cancer, and less than 0.01 to 0.51 for colorectal cancer. The 94th percentile of the bootstrap replications for the estimated fraction of cancers with the CS genotype ranged from 0.16 to 0.45 for prostate cancer, 0.12 to 0.30 for breast cancer, and 0.08 to 0.27 for colorectal cancer. The large 94th percentile of 0.51 for the two prevalence-type estimates for colorectal cancer (prevalence of the CS genotype and fraction without cancer who have it) is likely due to chance, as the corresponding 90th percentiles (sixth highest out of 50) were only 0.09. Estimates of the relative risk of cancer given the CS genotype versus no CS genotype had a wide bootstrap distribution (Figure 1, bottom), with mean values ranging from 20 to 25 for prostate cancer, 12 to 18 for breast cancer, and 30 to 43 for colorectal cancer.

For most models, the estimated joint probabilities of cancer among subjects without the CS genotype were less than 0.01 in the time periods before the last, and the estimated joint probability of no cancer through the penultimate time period was at least 0.90. Also, for prostate and colorectal cancer, but not breast cancer, the estimated cancer incidence rates in subjects with the CS genotype generally increased with age.

Table 2
Main results from twin study

                              Recessive              Dominant
                              One    Many   Three    One    Many   Three
                              gene   genes  genes    gene   genes  genes
Prostate cancer
  Goodness-of-fit p-value     0.05   0.14   0.07     0.22   0.01   0.01
  Prevalence of CS genotype
    Estimate                  0.01   0.10   0.01     0.01   <0.01  0.09
    Mean                      0.01   0.01   0.01     0.02   0.04   0.04
    94th percentile           0.01   0.01   0.01     0.06   0.13   0.03
  Fraction of cancers with CS
    Estimate                  0.09   0.28   0.12     0.12   0.04   0.22
    Mean                      0.15   0.16   0.16     0.25   0.11   0.11
    94th percentile           0.20   0.22   0.22     0.45   0.17   0.16
Breast cancer
  Goodness-of-fit p-value     0.16   0.22   0.15     0.12   0.08   0.08
  Prevalence of CS genotype
    Estimate                  0.01   0.01   0.01     0.02   0.01   0.01
    Mean                      0.03   0.02   0.02     0.04   0.01   0.01
    94th percentile           0.09   0.09   0.03     0.15   0.01   0.06
  Fraction of cancers with CS
    Estimate                  0.30   0.28   0.12     0.12   0.10   0.10
    Mean                      0.10   0.11   0.10     0.13   0.08   0.12
    94th percentile           0.28   0.30   0.18     0.30   0.12   0.26
Colorectal cancer
  Goodness-of-fit p-value     0.74   0.80   0.80     0.57   0.80   0.57
  Prevalence of CS genotype
    Estimate                  <0.01  0.01   <0.01    <0.01  <0.01  <0.01
    Mean                      <0.01  <0.01  <0.01    <0.02  0.05   0.04
    94th percentile           <0.01  <0.01  0.01     0.04   0.51   0.18
  Fraction of cancers with CS
    Estimate                  0.05   0.13   0.05     0.06   0.06   0.07
    Mean                      0.05   0.05   0.06     0.10   0.07   0.09
    94th percentile           0.09   0.08   0.11     0.24   0.24   0.27

Mean and 94th percentile refer to 50 bootstrap replications. Many-gene model assumes equal allele frequencies. Three-gene model assumes allele frequencies in ratio of 1/3 : 1 : 3.

[Figure 1: histograms in three panels (estimated prevalence of the CS genotype; estimated fraction of cancers with CS genotype; relative risk for cancer for CS genotype versus no CS genotype), each with rows for prostate, breast, and colorectal cancer and columns for the recessive and dominant one-gene, many-gene, and three-gene models.]

Figure 1. Histograms of 50 bootstrap replications showing the range of results under the various models. Values greater than 0.2 are displayed as 0.2+. Many-gene model assumes equal allele frequencies. Three-gene model assumes allele frequencies in ratio of 1/3 : 1 : 3.

To investigate the effect of violating the assumption of a common gene-environment interaction for MZ and DZ twins, we conducted the following sensitivity analysis. We generated hypothetical counts based on the parameter estimates for prostate cancer, but with different cancer incidence rates for the CS genotype in MZ and DZ twins. In particular, we specified a "true" cancer incidence rate in MZ (DZ) twins by adding (subtracting) 0.05 and 0.10 to (from) the estimated rates at time periods 2 and 3, and then reversed the adjustment for MZ and DZ twins. Compared with the true values generated under these scenarios, the estimated prevalences of the CS genotype differed by less than 0.01, and the estimated fraction of subjects with the CS genotype differed by 0.06 or less.
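One reading of this perturbation scheme is sketched below: each shift (0.05 or 0.10) is applied to periods 2 and 3, added for one zygosity and subtracted for the other, and then the roles are swapped. The base rates, the clipping to [0, 1], and the function name `perturbed_rates` are hypothetical; generating counts from these rates and refitting the model are not shown.

```python
def perturbed_rates(rates, deltas, mz_up=True):
    """Hypothetical 'true' CS-genotype cancer rates for MZ and DZ twins.

    rates:  estimated rate for each time period (index 0 = period 1).
    deltas: {period_index: shift}; the shift is added for one zygosity and
            subtracted for the other, then clipped to [0, 1].
    mz_up:  True adds to MZ and subtracts from DZ; False reverses the roles.
    """
    sign = 1.0 if mz_up else -1.0
    mz, dz = list(rates), list(rates)
    for k, shift in deltas.items():
        mz[k] = min(1.0, max(0.0, rates[k] + sign * shift))
        dz[k] = min(1.0, max(0.0, rates[k] - sign * shift))
    return mz, dz


# Hypothetical estimated rates for five age periods (not the paper's values).
estimated = [0.02, 0.10, 0.25, 0.30, 0.35]
for shift in (0.05, 0.10):
    for mz_up in (True, False):
        # Periods 2 and 3 correspond to list indices 1 and 2.
        mz, dz = perturbed_rates(estimated, {1: shift, 2: shift}, mz_up)
        print(shift, mz_up, mz, dz)
```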

We also investigated possible bias arising because subjects with incident cancer before the start of follow-up were classified as having no cancer (although in the Finnish cohort, the fraction misclassified in this manner was minimal). To compensate hypothetically for the missed incident cancers, we increased the number of cancer-cancer subjects by 10% and the number of cancer-censored subjects by 20%, and re-estimated the parameters. Compared with the original estimates, the estimated prevalences of the CS genotype differed by 0.04 or less, and the estimated fraction of subjects with the CS genotype differed by 0.06 or less.
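The count adjustment in this sensitivity analysis amounts to scaling two outcome categories before refitting, as in the minimal sketch below. The category labels follow the text, but the counts and the function name `inflate_counts` are hypothetical, and whether adjusted counts were rounded before refitting is not stated here.

```python
def inflate_counts(counts, factors):
    """Scale selected twin-pair outcome counts by multiplicative factors.

    counts:  {outcome category: observed count}
    factors: {outcome category: factor}; unlisted categories are unchanged.
    Returns float counts; rounding, if wanted, is left to the caller.
    """
    return {cat: n * factors.get(cat, 1.0) for cat, n in counts.items()}


# Hypothetical counts; the category labels follow the text, the numbers do not.
observed = {"cancer-cancer": 40, "cancer-censored": 300, "censored-censored": 5000}
adjusted = inflate_counts(observed, {"cancer-cancer": 1.10, "cancer-censored": 1.20})
print(adjusted)
```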

7. Discussion

Based on the estimates and the sensitivity analysis, our primary conclusion is that the contribution of genetic susceptibility to prostate, breast, and colorectal cancers is small to moderate. This result agrees qualitatively with Lichtenstein et al. (2000) and differs from Risch (2001), who argued that a large genetic component was plausible. We think the underlying reason for our different result is that we fit a more general model than Risch (2001). In particular, we modeled the dependence among cancer incidence rates in twins without the CS genotype, whereas Risch (2001, Appendix) assumed a reduced model in which cancer incidence is independent between twins without the CS genotype. This reduced model (along with the use of summary rather than age-specific data) could explain two results in Risch (2001): that a large proportion of cancers could be attributed to genetic susceptibility if the allele frequency were high, and that the estimated penetrance would exceed 1 if the allele frequency for breast cancer were small. In contrast, under our more general model based on age-specific data, we estimated low prevalences of the CS genotype and a small fraction of cancers with the CS genotype. As a check, we also fit our models under the same reduced model as Risch (2001) and found high estimated fractions of cancers with the CS genotype for some combinations of model and cancer. Our low estimated prevalence of the CS genotype for breast and colorectal cancer is consistent with the known rare genes that predispose to breast cancer in women (Hopper, 2001) and to colorectal cancer (Calvert and Frucht, 2002). With prostate cancer, the genetic etiology is less clear (Gronberg, 2003).
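The mechanism described above can be illustrated with a deliberately simplified sketch of our own, not the paper's estimator: one time period, MZ pairs only, and outcomes of the two co-twins independent given genotype for both carriers and non-carriers, as in the reduced model. In this setting all excess MZ concordance must come from the CS genotype, so choosing the prevalence p determines the risks. With the hypothetical inputs below, a rare genotype (p = 0.001) forces the risk with the CS genotype above 1, an impossible penetrance, while a common one (p = 0.3) attributes roughly three quarters of cancers to the CS genotype. Allowing dependence between twins without the CS genotype, as the general model does, adds a term that can absorb the excess concordance, so this constraint no longer binds. The function name `reduced_model_fit` and all inputs are hypothetical.

```python
import math


def reduced_model_fit(m, c_mz, p):
    """Solve a single-period reduced model for MZ twin pairs.

    Hypothetical simplification: a CS genotype with prevalence p is shared by
    both MZ co-twins, cancer risk is r1 with it and r0 without it, and the two
    twins' outcomes are independent given genotype (the reduced model). Then
        m    = p*r1 + (1-p)*r0                        (marginal risk)
        c_mz = p*r1**2 + (1-p)*r0**2
             = m**2 + p*(1-p)*(r1 - r0)**2            (MZ concordance)
    so the excess concordance c_mz - m**2 fixes r1 - r0 once p is chosen.
    r1 > 1 or r0 < 0 signals that no valid risks exist for this p.
    """
    excess = c_mz - m * m
    if excess < 0:
        raise ValueError("MZ concordance is below m**2; r1 >= r0 is impossible")
    d = math.sqrt(excess / (p * (1 - p)))  # r1 - r0
    r0 = m - p * d
    r1 = m + (1 - p) * d
    fraction_cs = p * r1 / m  # share of cancers occurring with the CS genotype
    return r0, r1, fraction_cs


# Hypothetical inputs: 5% marginal risk, 0.5% probability of a concordant MZ pair.
for p in (0.001, 0.01, 0.3):
    r0, r1, frac = reduced_model_fit(0.05, 0.005, p)
    print(f"p={p}: r0={r0:.3f} r1={r1:.3f} fraction with CS={frac:.2f}")
```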

ACKNOWLEDGEMENTS

We thank Grant Izmirlian, Victor Kipnis, Barnett S. Kramer, Don Corle, Jian-Lun Xu, Mitchell Gail, Joe Gastwirth, Ruth Pfeiffer, Neil Risch, and Hongyu Zhao for helpful comments, and Emma Nilsson for creating the data tables from the original data files. The Swedish Twin Registry is funded by a grant from the Department of Higher Education, the Swedish Scientific Council, and ASTRA Zeneca. The Danish Twin study has been funded by research grants from the Danish Cancer Society (36/79), the National Cancer Institute (R35 CA 42581), and the U.S. National Institute on Aging (NIA-PO1-AG08761). The Finnish Twin Cohort study has been supported by grants from the Finnish Cancer Society.

RÉSUMÉ

To study the role of genetic factors in the development of cancer, we developed a new method for analyzing data on prostate, breast, and colorectal cancer from the Swedish, Danish, and Finnish registries of same-sex monozygotic (MZ) and dizygotic (DZ) twins. In the spirit of a sensitivity analysis, we modeled an autosomal dominant or recessive cancer susceptibility (CS) genotype involving a single gene, many genes with equal allele frequencies, or three genes with allele frequencies that could differ by a factor of nine. We also modeled the joint probability of cancer incidence across five age categories conditional on the presence or absence of the CS genotype. The main assumptions are (1) the joint distribution of unobserved environmental effects within a twin pair, conditional on the presence or absence of the CS genotype, is the same for MZ and DZ twins, (2) the probability of cancer conditional on the presence or absence of the CS genotype and on the unobserved environmental effects (i.e., a gene × environment interaction) is the same in MZ and DZ twins, and (3) the probability of cancer is independent between twins carrying the CS genotype. Estimation was by maximizing the likelihood over the allele frequency, using two types of EM algorithm. The models showed acceptable to good fit. Variability was estimated with a bootstrap of only 50 replicates, for reasons of feasibility. The 94th percentile of the proportion of cancers with a CS genotype ranged across genetic models from 0.16 to 0.45 for prostate cancer, from 0.12 to 0.30 for breast cancer, and from 0.08 to 0.27 for colorectal cancer. We conclude that genetic susceptibility contributes at most moderately to the incidence of prostate, breast, and colorectal cancers.

REFERENCES

Andrew, T., Hart, D. J., Snieder, H., de Lange, M., Spector, T. D., and MacGregor, A. J. (2001). Are twins and singletons comparable? A study of disease-related and lifestyle characteristics in adult women. Twin Research 4, 464-477.

Bouchard, T. J., Jr., Lykken, D. T., McGue, M., Segal, N. L., and Tellegen, A. (1990). Sources of human psychological differences: The Minnesota study of twins reared apart. Science 250, 223-228.

Calvert, P. M. and Frucht, H. (2002). The genetics of colorectal cancer. Annals of Internal Medicine 137, 603-612.

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39, 1-38.

Efron, B. and Gong, G. (1983). A leisurely look at the Bootstrap, the Jackknife, and Cross-Validation. American Statistician 37, 36-48.

Gronberg, H. (2003). Prostate cancer epidemiology. Lancet 361, 859-864.

Hopper, J. L. (2001). More breast cancer genes? Breast Cancer Research 3, 154-157.

Hristova, L. and Hakama, M. (1997). Effect of screening for cancer in the Nordic countries on deaths, cost and quality of life up to the year 2017. Acta Oncologica Supplement 9, 1-60.

Johnson, E., Banta, H. D., and Shersten, T. (2001). Health technology assessment and screening in Sweden. International Journal of Technology Assessment in Health Care 17, 380-386.

Kujala, U. M., Kaprio, J., and Koskenvuo, M. (2002). Modifiable risk factors as predictors of all-cause mortality: Role of genetics and childhood environment. American Journal of Epidemiology 156, 985-993.

Lichtenstein, P., Holm, N. V., Verkasalo, P. K., Iliadou, A., Kaprio, J., Koskenvuo, M., Pukkala, E., Skytthe, A., and Hemminki, K. (2000). Environmental and heritable factors in the causation of cancer. Analyses of cohorts of twins from Sweden, Denmark, and Finland. New England Journal of Medicine 343, 78-85.

Olsen, A. H., Jensen, A., Njor, S. J., Villadsen, E., Schwartz, W., Vejborg, I., and Lynge, E. (2003). Breast cancer incidence after the start of mammography screening in Denmark. British Journal of Cancer 88, 362-365.

Risch, N. (2001). The genetic epidemiology of cancer: Interpreting family and twin studies and their implications for molecular genetic approaches. Cancer Epidemiology, Biomarkers, and Prevention 10, 733-741.

Rostgaard, K., Vaeth, M., Host, H., Madsen, M., and Lynge, E. (2001). Age-period-cohort modelling of breast cancer incidence in the Nordic countries. Statistics in Medicine 20, 47-61.

Stunkard, A. J., Harris, J. R., Pedersen, N. L., and McClearn, G. E. (1990). The body-mass index of twins who have been reared apart. New England Journal of Medicine 322, 1483-1487.

Tazi, M. A., Faivre, J., and Benhmiche, A. M. (1996). Mass screening for colorectal cancer: What stage are we up to? Bulletin du Cancer 83, 746-749.

Wolfram Research. (2002). Mathematica, Version 4.2. Champaign, Illinois.

Received September 2003. Revised May 2004. Accepted May 2004.
