12
An Age-Stratified Poisson Model for Comparing Trends in Cancer Rates Across Overlapping Regions Yi Li *; 1 , Ram C. Tiwari 2 , and Zhaohui Zou 3 1 Department of Biostatistics, Harvard University and Dana-Farber Cancer Institute, 44 Binney Street, LW211 Boston, MA 02115, USA 2 Statistical Research and Applications Branch, National Cancer Institute, Rockvile, MD 20878, USA 3 Information Management Services, 12501 Prosperity Drive, Silver Spring, MD 20904, USA Received 5 November 2007, revised 25 April 2008, accepted 29 April 2008 Summary The annual percent change (APC) has been used as a measure to describe the trend in the age-adjusted cancer incidence or mortality rate over relatively short time intervals. The yearly data on these age- adjusted rates are available from the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute. The traditional methods to estimate the APC is to fit a linear regression of logarithm of age-adjusted rates on time using the least squares method or the weighted least squares method, and use the estimate of the slope parameter to define the APC as the percent change in the rates between two consecutive years. For comparing the APC for two regions, one uses a t-test which assumes that the two datasets on the logarithm of the age-adjusted rates are independent and normally distributed with a common variance. Two modifications of this test, when there is an overlap between the two regions or between the time intervals for the two datasets have been recently developed. The first mod- ification relaxes the assumption of the independence of the two datasets but still assumes the common variance. The second modification relaxes the assumption of the common variance also, but assumes that the variances of the age-adjusted rates are obtained using Poisson distributions for the mortality or inci- dence counts. In this paper, a unified approach to the problem of estimating the APC is undertaken by modeling the counts to follow an age-stratified Poisson regression model, and by deriving a corrected Z-test for testing the equality of two APCs. A simulation study is carried out to assess the perform- ance of the test and an application of the test to compare the trends, for a selected number of cancer sites, for two overlapping regions and with varied degree of overlapping time intervals is presented. Key words: Age-adjusted incidence/mortality rates; Age-stratified Poisson Regression; An- nual percent change (APC); Hypothesis testing; Surveillance; Trends. 1 Introduction The American Cancer Society (ACS) in its annual publication Cancer ACS 2007 (http:// www.cancer.org/) reports that in 2007 about 1.5 million new cancer cases are expected to be diag- nosed, and approximately 560,000 Americans are expected to die of cancer. Cancer is the most com- mon cause of death in US, exceeded only by heart disease, and accounts for 1 of every 4 deaths. The same report also reveals that, for a number of cancer sites (such as breast, stomach, colon and rectum, lung and bronchus and leukemia), the age-adjusted cancer mortality rates have been steadly decreasing in recent years. In addition, the National Surveillance, Epidemiology, and End Results (SEER) Pro- gram of the National Cancer Institute (NCI) periodically publishes similar reports on trends of cancer incidence at http://seer.cancer.gov/csr; see Ries et al. (2003) So much has been at stake in terms of * Corresponding author: e-mail: [email protected], Phone: +1 617 632 5134, Fax: +1 617 632 2444 608 Biometrical Journal 50 (2008) 4, 608–619 DOI: 10.1002/bimj.200710430 # 2008 WILEY-VCH Verlag GmbH &Co. KGaA, Weinheim

An Age-Stratified Poisson Model for Comparing Trends in Cancer Rates Across Overlapping Regions

  • Upload
    yi-li

  • View
    213

  • Download
    0

Embed Size (px)

Citation preview

Page 1: An Age-Stratified Poisson Model for Comparing Trends in Cancer Rates Across Overlapping Regions

An Age-Stratified Poisson Model for Comparing Trendsin Cancer Rates Across Overlapping Regions

Yi Li*; 1, Ram C. Tiwari2, and Zhaohui Zou3

1 Department of Biostatistics, Harvard University and Dana-Farber Cancer Institute, 44 Binney Street,LW211 Boston, MA 02115, USA

2 Statistical Research and Applications Branch, National Cancer Institute, Rockvile, MD 20878, USA3 Information Management Services, 12501 Prosperity Drive, Silver Spring, MD 20904, USA

Received 5 November 2007, revised 25 April 2008, accepted 29 April 2008

Summary

The annual percent change (APC) has been used as a measure to describe the trend in the age-adjustedcancer incidence or mortality rate over relatively short time intervals. The yearly data on these age-adjusted rates are available from the Surveillance, Epidemiology, and End Results (SEER) Program ofthe National Cancer Institute. The traditional methods to estimate the APC is to fit a linear regression oflogarithm of age-adjusted rates on time using the least squares method or the weighted least squaresmethod, and use the estimate of the slope parameter to define the APC as the percent change in the ratesbetween two consecutive years. For comparing the APC for two regions, one uses a t-test which assumesthat the two datasets on the logarithm of the age-adjusted rates are independent and normally distributedwith a common variance. Two modifications of this test, when there is an overlap between the tworegions or between the time intervals for the two datasets have been recently developed. The first mod-ification relaxes the assumption of the independence of the two datasets but still assumes the commonvariance. The second modification relaxes the assumption of the common variance also, but assumes thatthe variances of the age-adjusted rates are obtained using Poisson distributions for the mortality or inci-dence counts. In this paper, a unified approach to the problem of estimating the APC is undertaken bymodeling the counts to follow an age-stratified Poisson regression model, and by deriving a correctedZ-test for testing the equality of two APCs. A simulation study is carried out to assess the perform-ance of the test and an application of the test to compare the trends, for a selected number of cancersites, for two overlapping regions and with varied degree of overlapping time intervals is presented.

Key words: Age-adjusted incidence/mortality rates; Age-stratified Poisson Regression; An-nual percent change (APC); Hypothesis testing; Surveillance; Trends.

1 Introduction

The American Cancer Society (ACS) in its annual publication Cancer ACS 2007 (http://www.cancer.org/) reports that in 2007 about 1.5 million new cancer cases are expected to be diag-nosed, and approximately 560,000 Americans are expected to die of cancer. Cancer is the most com-mon cause of death in US, exceeded only by heart disease, and accounts for 1 of every 4 deaths. Thesame report also reveals that, for a number of cancer sites (such as breast, stomach, colon and rectum,lung and bronchus and leukemia), the age-adjusted cancer mortality rates have been steadly decreasingin recent years. In addition, the National Surveillance, Epidemiology, and End Results (SEER) Pro-gram of the National Cancer Institute (NCI) periodically publishes similar reports on trends of cancerincidence at http://seer.cancer.gov/csr; see Ries et al. (2003) So much has been at stake in terms of

* Corresponding author: e-mail: [email protected], Phone: +1 617 632 5134, Fax: +1 617 632 2444

608 Biometrical Journal 50 (2008) 4, 608–619 DOI: 10.1002/bimj.200710430

# 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

Page 2: An Age-Stratified Poisson Model for Comparing Trends in Cancer Rates Across Overlapping Regions

human life and cost – for example, the government agencies such as the National Health of Institutes(NIH), and many private sectors spend billions of dollars every year on cancer research, health insur-ance and medical and other costs – that there is an urgent need for new methods that produce moreaccurate and reliable estimates of measures of cancer trends.

The annual percent change (APC) has been used as a measure of cancer trends over short timeperiods, and to compare the recent cancer trends by gender or by geographic regions, one comparestheir APC values using the two-sample pooled t-test (Kleinbaum et al., 1988) that assumes that thedatasets on age-adjusted rates under the study are independent. However, a fundamental statisticaldifficulty arises when such comparisons, largely for policy making purposes, have to be made forregions or time intervals that overlap, e.g. comparing the most recent changes in trends of cancer ratesin a local area (e.g. the mortality rate of breast cancer in California) with a more global level (i.e. thenational mortality rate) over two overlapping time periods. For example, as detailed in the data analy-sis section, it is of substantial interest to compare the changes in California cancer mortality rates withthe national cancer mortality rates in the last 15 years.

Recently, Li and Tiwari (2007) and Li et al. (2007) developed Z-tests which adjust for the dependencebetween the two APCs, and are more efficient than the naive test which assumes independence. How-ever, these tests are based on the logarithmic transformation of the age-adjusted rates, and fits a simplelinear regression model of the transformed data on time using either the ordinary least squares (OLS) orthe other weighted least squares (WLS) procedures. The proposed test procedure is based on the naturalassumption that the age-specific mortality or incidence counts are results of underlying Poisson pro-cesses (Brillinger, 1986), and hence are realizations of independent Poisson random variables. The age-specific instantaneous hazards are modeled by a log-link function, thus leading to an age-stratified Pois-son model. The estimation of the parameters is then carried out using a likelihood-based approach.

The rest of the paper is organized as follows. In Section 2, we briefly review the existing tests, andderive the new test in Section 3. To compare the performance of the proposed test with respect to theabove mentioned tests, a simulation study is carried out in Section 4. In this section, we also giveapplication to breast cancer mortality data from California (CA) and the US extracted from theSEER*STAT software of the SEER Program. Section 5 ends this paper with a short discussion.

2 A Brief Review of Existing Tests

Consider two regions, and let dkji denote the number of counts (deaths or new cancer cases) from thepopulation at risk nkji observed in Region k (k ¼ 1; 2) in age-group j (j ¼ 1; . . . ; J) and at timesT1; . . . ; Tm for Region 1 and Tsþ1; . . . ; Tsþn for Region 2, where T1 � Tsþ1 < Tm � Tsþn, with0 � s < m leading to overlapping time intervals. Note that this formulation is general and allows oneregion to have fewer time points than the other. In the SEER program, it is common to choose nkji (atyear Ti) to be the mid-year population representing the total person-years in one year, with the as-sumption of “drop-outs” being uniform over the unit-intervals. The age-adjusted rates are defined as

rki ¼PJj¼1

wjdkji

nkji;

where wj > 0; j ¼ 1; . . . ; J; are the known standards for the age group j so thatPJ

j¼1 wj ¼ 1. For theSEER analysis, there are J ¼ 19 standard age-groups consisting of 0–1, 1–4, 5–9, . . . ; 85þ, and wj

are chosen to be the year 2000 population standards (Fay et al., 2006).Let yki ¼ log ðrkiÞ, be the logarithmic transformations of the age-adjusted rates. Consider the linear

regression models

yki ¼ b0k þ b1ktki þ eki; i ¼ 1; . . . ; Ik ; ð1Þfor k ¼ 1; 2, flagging Regions 1 and 2, respectively. Here eki are random errors with mean 0, and tki

corresponds to the calendar times of data collection in region k with I1 ¼ m and I2 ¼ n. More specifi-

Biometrical Journal 50 (2008) 4 609

# 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

Page 3: An Age-Stratified Poisson Model for Comparing Trends in Cancer Rates Across Overlapping Regions

cally, ðt11; . . . ; t1I1Þ ¼ ðT1; . . . ; TmÞ, while ðt21; . . . ; t2I2Þ ¼ ðTsþ1; . . . ; TsþnÞ. For the two regions, theannual percent change (APC) are defined as APCk ¼ 100ðeb1k � 1Þ ¼: 100b1k, for a small b1k, e.g. inthe order of 10�2 (Kim et al., 2000; Fay et al., 2006; Tiwari et al., 2006).

Then under the assumptions that eki are independent and have common variance s2, the two-samplepooled t-test (Kleinbaum et al., 1988) for testing the null hypothesis H0 : APC1 ¼ APC2 versus thealternative Ha : APC1 6¼ APC2 is given by

Tt ¼bb11� bb12ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi

ss2ððPI1

i¼1 ðt1i � �tt1Þ2Þ�1 þ ðPI2

i¼1 ðt2i � �tt2Þ2Þ�1Þq � tðI1þI2�4Þ ; ð2Þ

where �ttk ¼PIk

i¼1 tki=Ik for k ¼ 1; 2, and ss2 is the “pooled” unbiased estimate of s2 given by

ss2 ¼PI1

i¼1 ðy1i � yy1iÞ2 þPI2

i¼1 ðy2i � yy2iÞ2

I1 þ I2 � 4;

where yyki ¼ bb0k þ bb1k tki are the predictions for k ¼ 1; 2. Here, bb0k and bb1k are obtained from the leastsquares estimation. That is,

bb1k ¼PIk

i¼1 ðtki � �ttkÞðyki � �yykÞPiki¼1 ðtki � �ttkÞ2

; bb0k ¼ �yyk� bb1k �ttk ;

where �yyk ¼PIk

i¼1 yki=Ik.The above test is not appropriate, however, when there is an overlap between the two regions or the

two time periods. For this case, Li and Tiwari (2007) proposed the following corrected Z-test

ZCT ¼bb11� bb12

ss2 s�21 þ s�2

2 � 2s12s�21 s�2

2ðnðOÞÞ2n1n2

� �n o1=2; ð3Þ

where s2k ¼

Pi¼1 ðtki � �ttkÞ2, s12 ¼

Pmsþ1 ðTi � �tt1ÞðTi � �tt2Þ, nk ¼

Pmi¼sþ1

PJj¼1 nkji for k ¼ 1; 2,

nðOÞ ¼Pm

i¼sþ1

PJj¼1 nðOÞji and and nðOÞji are the numbers of at-risk population in the overlapping region.

Note that there is no suffix k in nðOÞji . The sign of s12 determines, whether the covariance between bb11and bb12 is positive or negative, and when there is no overlap in time intervals, s12 ¼ 0. Under the log-normal model, the corrected ZCT test was shown to follow a standard normal distribution under thenull hypothesis, and to be more efficient than the pooled t-test; see Li and Tiwari (2007).

However, one assumption in Li and Tiwari (2007) is the equal variance in both regression models,which may not be realistic, especially for rare cancers. A further refinement has been made to derivethe variance of yki by using the Poisson assumptions on the first two moments of the counts dkji, i.e.EðdkjiÞ ¼ var ðdkjiÞ. Under these assumptions, the consistent estimate of the error variance of eki is

given by v2ki ¼ 1

r2ki

PJj¼1 w2

jdkji

n2kji

, leading to the following weighted least squares test proposed by Li et al.

(2007), referred to as ZWLS:

ZWLS ¼~bb11� ~bb12

~ss�21 þ ~ss�2

2 �2 ~ss12 ~ss�21 ~ss�2

2ðnðOÞÞ2n1n2

n o1=2: ð4Þ

with

~ss21 ¼

Pmi¼1ðTi � ~tt1Þ2=v2

1i and ~ss22 ¼

Psþn

i¼sþ1ðTi � ~tt2Þ2=v2

2i ;

~ss12 ¼Pm

i¼sþ1ðTi � ~tt1Þ ðTi � ~tt2Þ

vðoÞ12i

v21iv

22i

;

610 Y. Li et al.: An Age-Stratified Poisson Model for Comparing Trends

# 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

Page 4: An Age-Stratified Poisson Model for Comparing Trends in Cancer Rates Across Overlapping Regions

where

vðoÞ12i ¼

1r1ir2i

PJj¼1

w2j

dðoÞji

ðnðoÞji Þ2;

and dðoÞji are the counts in the overlapping region and during the overlapping period. Here,

~tt1 ¼Pm

i¼1 Ti=v21iPm

i¼1 1=v21i

; ~tt2 ¼Psþn

i¼sþ1 Ti=v22iPsþn

i¼sþ1 1=v22i

;

and ~bb11;~bb12 are weighted least square estimates of b11; b12.

Under the null hypothesis, because of the normal approximation, ZWLS approximately follows astandard normal distribution. The ZWLS has been shown to be more conservative than ZCT in retainingthe size of the test, but is more powerful for the common cancer sites; see Li et al. (2007). However,there are several disadvantages of the existing methods. First, one key step of Li et al. (2007) is thenormal approximation of the age-adjusted rates. Secondly, both Li et al. (2007) and Li and Tiwari(2007) need adjustments for zero counts.

3 Age-stratified Poisson Regression Model

As the existing approaches to dealing with age-adjusted cancer rates were all based on the normalapproximation, we take a more natural route in the sequel by considering the Poisson nature of theunderlying count data and propose an age-stratified Poisson regression to describe the change trend ofincident (or death) counts on time. Based on this model, a proper test that accounts for overlapping isproposed.

Specifically, since dkji, the number counts (deaths or new cancer cases) observed in Region k(k ¼ 1; 2) in age-group j, is a count, we assume that

dkji �ind

Pois ðnkjilkjiÞ ;with

log lkji ¼ b0kj þ b1ktki ; ð5Þ

which is referred to as the Age-stratified Poisson Regression Model as the age-specific intercept b0kj isassumed for age-group j. The common slope b1k is of particular importance as it transcribes the trendsof mortality or incidence and, in particular, determines the APC value.

Again let APC1 and APC2 be the corresponding APC values for these two Poisson regressions. Anatural test for the null hypothesis H0 : APC1 ¼ APC2 versus the alternative hypothesisH1 : APC1 6¼ APC2 would be

ZPOIS ¼bb11� bb12ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi

Var fbb11� bb12gq ;

where bb11 and bb12 are the maximum likelihood estimates of b11 and b12 derived in the Appendix.

Because of the possible overlapping of Regions 1 and 2, bb11 and bb12 may be correlated. Thus the

key to the derivation of the test lies in a correct evaluation of Cov ðbb11; bb12Þ.

3.1 Derivation of the Test

To proceed, we let bk ¼ ðb1k; b0k1; . . . ; b0kJÞ0, k ¼ 1; 2, whose estimates bk can be obtained by solving

the score equations [based on (5)]

UkðbkÞ ¼ 0 ;

Biometrical Journal 50 (2008) 4 611

# 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

Page 5: An Age-Stratified Poisson Model for Comparing Trends in Cancer Rates Across Overlapping Regions

for k ¼ 1; 2, where Uk ¼ ðU1k;1;U0k;1; . . . ;U0k;JÞ0 and

U1k;1 ¼PIk

i¼1

PJj¼1

nkjitki exp ðb0kj þ b1ktkiÞ �PIk

i¼1

PJj¼1

dkjitki ;

U0k;j ¼PIk

i¼1nkji exp ðb0kj þ b1ktkiÞ �

PIk

i¼1dkji ;

for j ¼ 1; . . . ; J.As UkðbbkÞ ¼ 0, expanding it around the true value bk, and ignoring the higher order terms yields

0 � 1ffiffiffiffiIkp UkðbbkÞ ¼

1ffiffiffiffiIkp UkðbkÞ þ

1Ik

Uð1Þk ðbkÞ fffiffiffiffiIkpðbbk �bkÞg þ opð1Þ ;

where Uð1Þk ¼ @UkðbkÞ=@bk are ðJ þ 1Þ � ðJ þ 1Þ matrices with its 1-st row as

ðUð1Þ1k;11;Uð1Þ1k;01; . . . ;Uð1Þ1k;0JÞ and the ðjþ 1Þ-th row ðj ¼ 1; . . . ; JÞ as ðUð1Þ0k;j1;U

ð1Þ0k;0j1; . . . ;Uð1Þ0k;0jJÞ, for

k ¼ 1; 2. Here

Uð1Þ1k;11 ¼@U1k;1

@b1k¼PIk

i¼1

PJj¼1

nkjit2ki exp ðb0kj þ b1ktkiÞ ;

Uð1Þ1k;0j ¼@U1k;1

@b0kj¼PIk

i¼1nkjitki exp ðb0kj þ b1ktkiÞ ;

Uð1Þ0k;j1 ¼@U0k;j

@b1k¼PIk

i¼1nkjitki exp ðb0kj þ b1ktkiÞ ;

Uð1Þ0k;0jj0 ¼@U0k;j

@b0kj0¼ djj0

PIk

i¼1nkji exp ðb0kj þ b1ktkiÞ ;

and djj0 ¼ 1 if j ¼ j0 and 0 otherwise.Denote by Ak ¼ plimIk!1 � Uð1Þk ðbkÞ=Ik for k ¼ 1; 2, where plim denotes the limit in probability.

Then for large I1ð� mÞ and I2ð� nÞ, standard probabilistic arguments yield

fffiffiffiffiI1pðbb1�b1Þ;

ffiffiffiffiI2pðbb2�b2Þg�

dA�1

11ffiffiffiffiI1p U1ðb1Þ; A�1

21ffiffiffiffiI2p U2ðb2Þ

� �:

Here, �d denotes approximate equivalence in joint distribution functions. Hence,

Var ðffiffiffiffimpðbb1�b1Þ;

ffiffiffiffimpðbb1�b1ÞÞ ¼

:A�1

11m

Cov ðU1ðb1Þ;U1ðb1ÞÞ A�T1 ;

Var ðffiffiffinpðbb2�b2Þ;

ffiffiffinpðbb2�b2Þ

0Þ ¼: A�12

1n

Cov ðU2ðb2Þ;U2ðb2ÞÞ A�T2 ;

Cov ðffiffiffiffimpðbb1�b1Þ;

ffiffiffinpðbb2�b2Þ

0Þ ¼: A�11

1ffiffiffiffiffiffimnp Cov ðU1ðb1Þ;U2ðb2ÞÞ A�T

2 :

Let V ¼ 1m Cov ðU1ðb1Þ;U1ðb1ÞÞ, W ¼ 1

n Cov ðU2ðb2Þ;U2ðb2ÞÞ, S ¼ 1ffiffiffiffiffimnp Cov ðU1ðb1Þ;U2ðb2ÞÞ, and

VV ; WW ; SS be the corresponding estimates, whose derivations are given in the Appendix.Then, dCovCov ðbb1; bb1Þ ¼ mfUð1Þ1 ðbb1Þg

�1 VVfUð1Þ1 ðbb1Þg�T ;dCovCov ðbb2; bb2Þ ¼ nfUð1Þ2 ðbb2Þg

�1 WWfUð1Þ2 ðbb2Þg�T ;dCovCov ðbb1; bb2Þ ¼

ffiffiffiffiffiffimnp

fUð1Þ1 ðbb1Þg�1 SSfUð1Þ2 ðbb2Þg

�T :

612 Y. Li et al.: An Age-Stratified Poisson Model for Comparing Trends

# 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

Page 6: An Age-Stratified Poisson Model for Comparing Trends in Cancer Rates Across Overlapping Regions

These are three ðJ þ 1Þ � ðJ þ 1Þ matrices, the ð1; 1Þ entries of which are ss21 ¼dVarVar ðbb11Þ;

ss22 ¼dVarVar ðbb12Þ and ss12 ¼dCovCov ðbb11; bb12Þ, respectively. From this we compute dVarVar ðbb11� bb12Þ¼dVarVar ðbb11Þ þdVarVar ðbb12Þ � 2 dCovCov ðbb11; bb12Þ. Hence, the Z-test for comparing APC values is given by

ZPOIS ¼bb11� bb12

ðss21þ ss2

2�2 ss12Þ1=2; ð6Þ

which follows the standard normal distribution under H0 : b11 ¼ b12. The computation of bb11; bb12 isgiven in the Appendix.

3.2 ARE Comparison with the WLS test

It is of substantial interest to evaluate the gains in efficiency of the proposed test compared with the WLS

test. First note (5) implies that EðrkiÞ � EP

j wjdkji

nkji

� �¼ ðP

j wj eb0kjÞ eb1kTi . This in turn implies that

Eðlog rkiÞ ¼: log E ðrkiÞ ¼ logP

jwj eb0kj

� �þ b1ktki ; ð7Þ

when Var ðrkiÞ is small, which is often the case for the cancer incidence and mortality data (Kim et al.,2000).

A comparison between (1) and (7) reveals that models (1) and (5) approximately specify the samefirst moment of the age-adjusted cancer rates, making it possible to compare the efficiency of the testsbased on these two models via the measure of the Pitman Asymptotic Relative Efficiency. Specifi-cally, standard asymptotic analysis will yield

bb11� bb12 � Nðb11 � b12; ss21þ ss2

2�2 ss12Þ ;

while, for the WLS estimates,

~bb11� ~bb12 � N b11 � b12; ~ss�21 þ ~ss�2

2 �2 ~ss12 ~ss�21 ~ss�2

2ðnðOÞÞ2n1n2

� �:

Hence, the Pitman Asymptotic Relative Efficiency (ARE) comparing tests (6) and (4), which is theratio of the noncentralities of the above two normal distributions, is given by

ARE ¼~ss�2

1 þ ~ss�22 �2 ~ss12 ~ss�2

1 ~ss�22ðnðOÞÞ2n1n2

ss21þ ss2

2�2 ss12

:

The evaluation of (8) typically involves numerical computations.

4 SEER Mortality Data Analysis and Simulations

Li et al. (2007) demonstrated that ZWLS performs better than ZCT, via the calculation of ARE, as ZCT

relies on the common variance assumption, which may not be realistic. Hence, we focus this paper oncomparing ZPOIS with ZWLS. To comprehensively evaluate these tests, we consider several scenarios ofoverlap in two regions and in two different time intervals. Specifically, we assume that Region 1consists of Georgia (GA), South Carolina (SC), and North Carolina (NC), and that Region 2 consistsof NC, Virginia (VA) and Maryland (MD); with NC as the overlapping state between the two regions.The three different time intervals, with varying degree of overlap in the intervals, are taken to be: (a)[1980, 1989] for Region 1, and [1990, 1999] for Region 2 so that there is no overlap between the twotime intervals and, hence, s12 ¼ 0; (b) [1980, 1989] for Region 1, and [1983, 1992] for Region 2 sothat there a considerable overlap of six years between the two intervals and s12 ¼ 12:25; (c) [1980,1989] for Region 1, and [1987, 1996] for Region 2 so that there is a little overlap of three years’between the two intervals and s12 ¼ �34:75.

Biometrical Journal 50 (2008) 4 613

# 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

Page 7: An Age-Stratified Poisson Model for Comparing Trends in Cancer Rates Across Overlapping Regions

The counts, dkji were generated based on model (5) with tki taking values in the intervals corre-sponding to the two regions stated above. More specifically, the t1i take values of f0; 1; . . . ; 9g, whilethe t2i take values of f10; . . . ; 19g, f3; . . . ; 12g, and f7; . . . ; 16g, respectively for cases (a)–(c).

In order to fully specify lkji in (5), we assume that b0kj ¼ log ðdkj1=nkj1Þ � b1ktk1, where dkj1 andnkj1 are respectively the observed number of deaths and the number of at-risk population at the begin-ning of the intervals considered, and take b1k ¼ log ðAPCk=100þ 1Þ, based on the specified values ofAPCk. The age-specific counts for the overlapping state, NC, are generated from Poisson distributionswith means, nðoÞji � 1

2 ðl1ji þ l2jiÞ, where nðoÞji denotes the at-risk population in the overlapping region.When l1ji ¼ l2ji, this reduces to the situation specified by the null hypothesis. The number of at riskpopulation and the observed number of deaths were obtained from the SEER database for all malig-nant male cancers and prostate cancer. The values of APC were assumed to range from �0.3% to3.0%. For each parameter configuration, a total of 1000 simulated data were obtained. The results forthe three time-overlapping cases are summarized in Tables 1–3.

We remark that that, even though both ZWLS and ZPOIS are derived under different model assumptions,they are both valid tests for testing the equality of two APCs and hence the ARE defined in (8) is valid.The tables show that the ARE of ZPOIS with respect to ZWLS is greater than 1 for all the three cases, mean-ing that ZPOIS would be more powerful than ZWLS when the alternative hypothesis is true. The tables alsoshow that in most situations ZPOIS outperforms the ZWLS in retaining the Type I error probabilities and,hence, yields a more valid test. Also the powers of both WLS and Poisson-based tests are sensitive to thedelta values (the differences of APC values). The larger the delta values are, the more powerful the testsare. The larger delta values also lead to slightly larger AREs, though the differences are not so obvious.

It is of substantial interest to compare the changes in cancer mortality rates in California with thenational levels starting late 1980’s as a California law (Health and Safety Code, Section 103885) was

614 Y. Li et al.: An Age-Stratified Poisson Model for Comparing Trends

Table 1 Comparison of the power functions under various hypotheses betweenthe Poisson-based test ZPOIS and the weighted-least-squares based test ZWLS fortwo overlapping regions over disjoint time intervals, Region 1 (1980–1989) vs.Region 2 (1990–1999). APC1 and APC2 are the annual percent changes in Re-gions 1 and 2, respectively.

Cancer Sites APC1 APC2 ARE ZWLS ZPOIS

All Malignant 0.100 0.100 1.1490 0.059 0.050�0.300 �0.300 1.1491 0.049 0.045

0.500 0.500 1.1492 0.060 0.0491.000 1.000 1.1493 0.048 0.0533.000 3.000 1.1504 0.047 0.0570.100 0.500 1.1507 0.877 0.907�0.300 0.300 1.1509 0.998 1.000

1.000 2.000 1.1515 1.000 1.0001.000 3.000 1.1518 1.000 1.000

Prostate 0.100 0.100 1.1610 0.043 0.045�0.300 �0.300 1.1611 0.041 0.047

0.500 0.500 1.1612 0.051 0.0411.000 1.000 1.1613 0.046 0.0513.000 3.000 1.1614 0.045 0.0510.100 0.500 1.1617 0.182 0.212�0.300 0.300 1.1619 0.357 0.40

1.000 2.000 1.1622 0.773 0.8301.000 3.000 1.1634 1.000 1.000

# 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

Page 8: An Age-Stratified Poisson Model for Comparing Trends in Cancer Rates Across Overlapping Regions

passed then, which mandated the reporting of malignancies diagnosed throughout the state. For thispurpose, we applied the proposed methodology to compare the annual percent change (APC) in theage-adjusted mortality rates for the United States (US) for the period from 1988–2002 to that ofCalifornia (CA) for the period from 1990 to 2004. We fitted the weighted linear models as well as theage-stratified Poisson model, and applied both tests to compare the age-adjusted mortality rates offemale breast cancer in CA for the 16-year period from 1989–2004 to that of US for the 16-yearperiod from 1987–2002, for which the national mortality data were available. The observed values ofthe log-transformed annual age-adjusted rates and fitted regression lines from the ZPOIS test procedureare plotted in Figure 1. The parameter estimates and the values of the test statistics are summarized inTable 4. The results indicated the mortality rates of Breast cancer for California and the US havedecreased. Both tests reject the null hypothesis of equality of the two APCs, indicating that the annualpercent change (APC) of California, is significantly different from the national level. However, thep-value for ZPOIS is much smaller than that of ZWLS, rendering more evidence again the null hypothesis.

5 Discussion

In this paper, we have considered an important problem where comparisons have to be made forregions or time intervals that overlap. As opposed to the existing work, e.g Li and Tiwari (2007) andLi et al. (2007), this project advances this area in two distinct ways. First, the developed test does notrely on the normal approximation of the cancer rate, but directly model the counts to follow a Poissonregression model. The parameters are then estimated using their maximum likelihood estimates, andthe Z-test is derived for testing the equality of two APCs. Secondly, the developed Poisson regressionmodel can easily accommodate 0 count data (for the rare cancers), as opposed to the log normal

Biometrical Journal 50 (2008) 4 615

Table 2 Comparison of the power functions between the Poisson based ZPOIS

and the weighted-least-squares based ZWLS for two overlapping regions overroughly the same time intervals, Region 1 (1980–1989) vs. Region 2 (1983–1992). APC1 and APC2 are the annual percent changes in Regions 1 and 2,respectively.

Cancer Sites APC1 APC2 ARE ZWLS ZPOIS

All Malignant 0.100 0.100 1.1710 0.055 0.057�0.300 �0.300 1.1710 0.056 0.057

0.500 0.500 1.1711 0.048 0.0501.000 1.000 1.1712 0.047 0.0493.000 3.000 1.1723 0.051 0.0560.100 0.500 1.1726 0.872 0.908�0.300 0.300 1.1729 0.994 0.997

1.000 2.000 1.1733 1.000 1.0001.000 3.000 1.1737 1.000 1.000

Prostate 0.100 0.100 1.1820 0.043 0.047�0.300 �0.300 1.1821 0.043 0.043

0.500 0.500 1.1822 0.046 0.0411.000 1.000 1.1823 0.035 0.0533.000 3.000 1.1823 0.034 0.0550.100 0.500 1.1825 0.170 0.197�0.300 0.300 1.1827 0.341 0.393

1.000 2.000 1.1832 0.742 0.8021.000 3.000 1.1835 1.000 1.000

# 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

Page 9: An Age-Stratified Poisson Model for Comparing Trends in Cancer Rates Across Overlapping Regions

model (Li and Tiwari, 2007; Li et al., 2007) which needs to involve extra zero-corrected adjustment.We have applied the developed methodology to the analysis of the major cancer sites from the SEERProgram and have found that the corrected Z-test renders more power than the existing tests. A Baye-sian Poisson regression would be a useful approach. However, choice of priors is always difficult and

616 Y. Li et al.: An Age-Stratified Poisson Model for Comparing Trends

Table 3 Comparison of the power functions between the Poisson based ZPOIS

and the weighted-least-squares based ZWLS for two overlapping regions overslightly overlapping time intervals, Region 1 (1980–1989) vs. Region 2 (1987–1996). APC1 and APC2 are the annual percent changes in Regions 1 and 2,respectively.

Cancer Sites APC1 APC2 ARE ZWLS ZPOIS

All Malignant 0.100 0.100 1.1350 0.044 0.044�0.300 �0.300 1.1350 0.044 0.046

0.500 0.500 1.1351 0.046 0.0511.000 1.000 1.1351 0.045 0.0513.000 3.000 1.1352 0.043 0.0470.100 0.500 1.1354 0.834 0.891�0.300 0.300 1.1355 0.994 0.994

1.000 2.000 1.1362 1.000 1.0001.000 3.000 1.1367 1.000 1.000

Prostate 0.100 0.100 1.1410 0.045 0.051�0.300 �0.300 1.1410 0.043 0.049

0.500 0.500 1.1412 0.060 0.0511.000 1.000 1.1423 0.061 0.0553.000 3.000 1.1426 0.052 0.0520.100 0.500 1.1428 0.170 0.184�0.300 0.300 1.1431 0.319 0.361

1.000 2.000 1.1436 0.706 0.7591.000 3.000 1.1439 0.998 1.000

Figure 1 Observed and fitted log-transformed age-adjusted breastcancer mortality rates in CA [1989–2004] and US [1987–2002].

# 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

Page 10: An Age-Stratified Poisson Model for Comparing Trends in Cancer Rates Across Overlapping Regions

computation may not be so straightforward compared to this current work, wherein analytical solutionshave been derived. Hence, we envision that the proposed method would be preferable because of simpleinterpretation of the model parameters, natural choice of the model and computational readiness.

In our technical development, we have modelled the logarithm transformation of the age-adjusted ratesas a linear regression on time in (5) and have indeed explicitly assumed parallelism across age groups.That is, the growth curves of the cancer rates for various age groups share the same slope, which carriesthe information for the APC. Indeed, linearity parallelism for the cancer rates could be a debatable issuein cancer surveillance, which is likely to be violated for some cancers. One alternative, along the line ofgeneralized mixed models, is to assume a random slope (as opposed to a constant slope) across agegroups. This ongoing work will be reported in a subsequent communication.

Appendix A: Derivation of VV; WW; SS

Write

V ¼ V11 V12

V12 V22

� �ðJþ1Þ�ðJþ1Þ

; W ¼ W11 W12

W12 W22

� �ðJþ1Þ�ðJþ1Þ

;

S ¼ S11 S12

S12 S22

� �ðJþ1Þ�ðJþ1Þ

where

V11 ¼1m

Var ðU11;1Þ ¼1m

Pmi¼1

PJj¼1

T2i Var ðd1jiÞ ¼

1m

Pmi¼1

PJj¼1

T2i Eðd1jiÞ :

Hence a consistent estimate is VV11 ¼ 1m

Pmi¼1

PJj¼1 T2

i d1ji:Similarly,

VV12 ¼ ðVV12;1; . . . ; VV12;JÞwhere

VV12;j ¼dCovCov ðU11;1;U01;jÞ ¼1m

Pmi¼1

Tid1ji :

Also, V22 ¼ ðV22;jj0 Þ with VV22;jj0 ¼dCovCov ðU01;j;U01;j0 Þ ¼ 0; j 6¼ j0; 1m

Pmi¼1 d1ji; j ¼ j0:

Next compute the estimate of W:

W11 ¼1n

Var ðU12;1Þ ¼1n

Psþn

i¼sþ1

PJj¼1

T2i Var ðd2jiÞ ¼

1n

Psþn

i¼sþ1

PJj¼1

T2i Eðd2jiÞ

Biometrical Journal 50 (2008) 4 617

Table 4 Comparing APC of Breast Cancer Mortality Be-tween CA and the US.

CA US

APCWLS �2.33 �1.94SEWLS 0.084 0.027ZWLS �4.95 (p ¼ 0.000000757)APCPOIS �2.29 �1.84SEPOIS 0.083 0.026ZPOIS �5.62 (p ¼ 0.000000019593)

# 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

Page 11: An Age-Stratified Poisson Model for Comparing Trends in Cancer Rates Across Overlapping Regions

so that WW11 ¼ 1n

Psþni¼sþ1

PJj¼1 T2

i d2ji Similarly, WW12 ¼ ðWW12;1; . . . ; WW12;JÞ where WW12;j ¼dCovCov ðU12;1;U02;jÞ ¼ 1

n

Pni¼sþ1 Tid2ji. Also, WW22 ¼ ðWW22;jj0 Þ with WW22;jj0 ¼ 0; j 6¼ j0; ¼ 1

n

Psþni¼sþ1 d2ji; j ¼ j0. Finally,

the estimate of S is computed as follows:

S11 ¼1ffiffiffiffiffiffimnp Cov ðU11;1;U12;1Þ

¼ 1ffiffiffiffiffiffimnp Cov

Pmi¼1

PJj¼1

Tid1ji;Psþn

i¼sþ1

PJj¼1

Tid2ij

!

¼ 1ffiffiffiffiffiffimnp Cov

Pmi¼1

PJj¼1

TiðdðNOÞ1ji þ dðOÞji Þ;

Psþn

i¼sþ1

PJj¼1

TiðdðNOÞ2ji þ dðOÞji Þ

!

where dkji ¼ dðNOÞjki þ dðOÞji and the superscripts “NO” and “O” denote the non-overlapping and overlap-

ping regions, respectively. Thus, SS11 ¼ 1ffiffiffiffiffimnp

Pmi¼sþ1

PJj¼1 T2

i dðOÞji . Let SS12 ¼ ðSS12;1; . . . ; SS12;JÞ whereSS12;j ¼ 1ffiffiffiffiffi

mnp

Pmi¼sþ1 Tid

ðOÞji : Also, SS22 ¼ ðSS22;jj0 Þ with SS22;j;j0 ¼ 0; j 6¼ j0; 1ffiffiffiffiffi

mnp

Pmi¼sþ1 dðOÞji ; j ¼ j0:

Appendix B: Computation of the MLEs

Note that the MLEs of b0jk and b1k satisfy:

eb0jk ¼P

i dkjiPi eb1kTi nkji

;

~UUðb1kÞ ¼Pj;i

nkjiTi

Pi djkiP

i eb1kTi nkji

� �eb1kTi �

Pj;i

dkjiTi

¼P

jdkj

Pi nkjiTi eb1kTiPi eb1kTi nkji

� ��P

idk:iTi

¼P

jdkj½Akjð1ÞA�1

kj ð0Þ� �P

idk:iTi ¼ 0 ; k ¼ 1; 2 ;

where dkj: ¼P

i dkji; dk:i ¼P

j dkji; and AkjðaÞ ¼P

i nkjiTai eb1kTi for a ¼ 0; 1; 2.

Since ~UUðb1kÞ is a monotonic function of b1k, there is a unique solution to this equation. Let bb1k bethe solution. This is the MLE of b1k. We can use a Newton–Raphson method to obtain bb1k as follows.Let bb1kðlÞ be the estimate at the lth iteration, then the estimate at the ðlþ 1Þ-th iteration is given bybb1kðlþ 1Þ ¼ bb1kðlÞ � ½ ~UU

ð1Þðbb1kðlÞÞ��1 ~UUðbb1kðlÞÞ; where ~UU

ð1ÞðbÞ is the first (partial) derivative of ~UUðbÞwith respect to b evaluated at b ¼ b. We stop iterating when j bb1kðlþ 1Þ � bb1kðlÞj < e for some pre-specified value of e. Note that

~UUð1Þðb1kÞ ¼

Pj;i

njkiT2i eb1kTi

Pi dkjiP

i eb1kTi nkji

� ��Pj;i

njkiti eb1kti

Pi nkjiti eb1kti

ðP

i eb1kti nkjiÞ2

¼P

j

ðP

i nkjit2i eb1ktiÞ ð

Pi dkjiÞP

i eb1kti nkji

� ��P

j

ðP

i nkjiti eb1ktiÞ2

ðP

i eb1kti nkjiÞ2

¼P

jdkj:

Pi nkjit2

i eb1ktiPi eb1kti nkji

� ��P

j

ðP

i nkjiti eb1ktiÞ2

ðP

i eb1kti nkjiÞ2

¼P

jdkj:½Akjð2Þ ðAkjð0ÞÞ�1� �

Pj½ðAkjð1ÞÞ2ðAkjð0ÞÞ�1� ; k ¼ 1; 2 :

618 Y. Li et al.: An Age-Stratified Poisson Model for Comparing Trends

# 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

Page 12: An Age-Stratified Poisson Model for Comparing Trends in Cancer Rates Across Overlapping Regions

Substituting the MLE bb1k in place for b1k in eb0kj ¼P

i dkjiPi eb1k ti nkji

gives the MLEbb0kj ¼ log ð

Pi dkjiÞ � log ð

Pi ebb1k ti nkjiÞ.

Acknowledgements The authors thank the editor, an AE and two referees for their insightful suggestion, whichled to a better version of this manuscript.

Conflict of interests statementThe authors have declared no conflict of interest.

References

American Cancer Society (2007). Cancer Facts & Figures. Atlanta, Georgia.Brillinger, D. R. (1986). The natural variability of vital rates and associated statistics (with discussion). Bio-

metrics 42, 693–734.Fay, M., Tiwari, R., Feuer, E., and Zou, Z. (2006). Estimating Average Annual Percent Change for Disease Rates

without Assuming Constant Change. Biometrics 62, 847–854.Kim, H., Fay, M., Feuer, E., and Midthune, D. (2000). Permutation tests for joinpoint regression with applications

to cancer rates. Statistics in Medicine 19, 335–351.Kleinbaum, D., Kupper, L., and Muller, K. (1988). Applied Regression Analysis and Other Multivariable Meth-

ods. PWS-Kent, Boston, Mass., 2nd edition.Li, Y. and Tiwari, R. (2007). Comparing Trends in Age-Adjusted Cancer Rates Across Overlapping Regions or

Time Intervals for the NCI SEER Program. Technical Report, http://www.bepress.com/harvardbiostat/pa-per71.

Li, Y., Tiwari, R., Walters, K., and Zou, Z. (2007). A Weighted-Least-Squares Estimation Approach to ComparingTrends in Age-Adjusted Cancer Rates Across Overlapping Regions. Technical Report, http://biowww.dfci.harvard.edu/�yili/apc2.pdf

Ries, L., Eisner, M., Kosary, C., Hankey, B., Miller, B., Clegg, L., Mariotto, A., Feuer, E., and Edwards, B.(2003). SEER Cancer Statistics Review, 1975–2002, National Cancer Institute. Bethesda, MD, http://seer.cancer.gov/csr/1975-2002/.

Tiwari, R., Clegg, L., and Zou, Z. (2006). Efficient interval estimation for age-adjusted cancer rates. StatisticalMethods in Medical Research 15, 547–569.

Biometrical Journal 50 (2008) 4 619

# 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com