
An application of capture-recapture methods to the estimation of completeness of cancer registration


J Clin Epidemiol Vol. 41, No. 5, pp. 495-501, 1988. 0895-4356/88 $3.00 + 0.00. Printed in Great Britain. All rights reserved. Copyright © 1988 Pergamon Press plc

AN APPLICATION OF CAPTURE-RECAPTURE METHODS TO THE ESTIMATION OF

COMPLETENESS OF CANCER REGISTRATION

SYLVIA C. ROBLES,* LORAINE D. MARRETT,† E. AILEEN CLARKE and HARVEY A. RISCH

Division of Epidemiology and Statistics, Ontario Cancer Treatment and Research Foundation, and Department of Preventive Medicine and Biostatistics and NCIC Epidemiology Unit, Department of Preventive Medicine and Biostatistics, University of Toronto, Ontario, Canada

(Received in revised form 13 July 1987)

Abstract. Completeness of cancer registration has not been consistently ascertained across different registries. This report describes how capture-recapture methods have been used to estimate completeness at the Ontario Cancer Registry. The method was applied in two ways: first, using three data sources in a modeling approach; and second, using two data sources and standard, simple capture-recapture methods. The modeling approach is more flexible, since several variables that influence cancer registration can be considered and can be used to identify reporting patterns of different data sources. In the present analysis, estimates of completeness of the registry as a whole were remarkably similar using either two or three data sources, and site-specific comparisons differed by at most 7%. Because of the advantages of capture-recapture methods (estimation of level of completeness, possible comparability of estimates across different registries, and versatility to consider other determinants of cancer registration), a plea for greater use of these methods in cancer registration is made.

Cancer incidence Cancer registration Capture-recapture methods

INTRODUCTION

The usefulness of a population-based cancer registry depends to a large extent on the quality of its data. One of the most important aspects of quality is completeness of registration, that is, the degree to which reportable incident cases of cancer in the population of interest are actually recorded in the registry. However, there appears to be no standard measure of completeness. Since most cancer registries use more than one data source to ascertain as many cases of cancer as possible, capture-recapture methods may be used to estimate the number of incident cases in the population, and hence to assess completeness of registration. These methods have the advantage of using data routinely collected by the registry; consequently, no special survey is necessary. Capture-recapture methods have been widely used in zoology to estimate the size of animal populations [1]. In short, sequential independent samples of animals are captured at different stations; the animals are tagged and then allowed to mix with those still untagged; and estimation of the size of the population is based on the number of animals caught in successive samplings and the proportion of those caught that are tagged. Methods similar to this have also been used to estimate the prevalence of certain disorders in human populations [2-6], the number of births and deaths in developing countries [7,8], and the number of false negatives in medical screening [9].

In the present work, the use of capture-recapture methods for assessing completeness of registration is illustrated by application to the Ontario Cancer Registry (OCR). A modeling generalization of the capture-recapture formulas is applied to the three primary data sources of the OCR. Since some registries may have fewer sources available, results are compared to those obtained using the two sources which are routinely employed by many population-based cancer registries.

*Present address: Department of Preventive Medicine, University of Costa Rica, Ciudad Universitaria "Rodrigo Facio," San Jose, Costa Rica.

†Correspondence should be addressed to: Dr L. D. Marrett, Division of Epidemiology and Statistics, Ontario Cancer Treatment and Research Foundation, 7 Overlea Boulevard, Toronto, Ontario, Canada M4H 1A8.

METHODS

The Ontario Cancer Registry

The OCR is a population-based cancer registry serving the entire province of Ontario, with a population of 8.6 million in 1981. This registry is based on a passive reporting system and registers cases using three primary sources of information: (1) hospital discharge data and reports from cancer treatment centres; (2) pathology reports from hospital pathology laboratories; and (3) death certificate data from the Office of the Registrar General. All source records belonging to the same person are brought together by a probabilistic computerized record linkage system, which uses a version of the Generalized Iterative Record Linkage System developed by Statistics Canada in conjunction with the Epidemiology Unit of the National Cancer Institute of Canada [10]. The data are also linked to previous years' incidence in order to clear the file of prevalent cases.

After the linkage is accomplished, primary site(s), histology, date of diagnosis, method of diagnosis, and residence are determined through the application of a set of computer rules based on medical logic. Date of diagnosis is taken to be the earliest date recorded on the source documents linked together for an individual tumour. A patient may appear more than once in the registry if more than one primary malignant neoplasm is present. Information about the methods used by the OCR to generate incidence data has been presented in greater detail elsewhere [11].

Capture-recapture methods

The population whose size is to be estimated here consists of all incident cases of cancer diagnosed in 1982 among Ontario residents. The year 1982 was chosen so that at the time of study, at least 12 months had elapsed from all 1982 diagnoses, thereby allowing time for notification and receipt of supporting data at the registry. Thus, 1983 source reports relating to 1982 diagnoses are included in the analysis. Individuals are uniquely identified in the registry by their names and other personal information. To use the methods, ascertained cases are classified according to whether they are present or absent in each of the data sources. This classification can be represented by a contingency table with $2^d$ cells, where $d$ is the number of data sources used by the registry. The observed frequencies thus correspond to the number of individuals with each possible combination of data sources. One of the cells is empty, representing the unknown number of unregistered cases, i.e. those for whom all data sources are absent.

To model the observed frequencies, dichotomous variables are set up to indicate presence or absence in each source. These indicators and their interaction terms allow the estimation of at most $2^d - 1$ parameters. By choosing a regression model which adequately represents the data with fewer parameters, a more precise estimate of the total population size is obtained. A model is selected based on goodness of fit, including the distribution of residuals, the residual "deviance" (-2 times the log-likelihood for Poisson models or residual sum of squares for linear ones), and if necessary, the choice of an adequate transformation of the observed frequencies so that residuals at large and small observations are of comparable size. After the model is selected, the parameter estimates are used to estimate the number of unregistered individuals, and thence the total number of cancer cases. In the present work, the GLIM statistical program [12] is employed for the model fitting using the three data sources of the OCR.

In addition, to simulate the situation of many registries, the three data sources are collapsed into two: hospital discharges and pathology reports (corresponding to registration reports), and death certificates. Standard two-source capture-recapture methods are then applied, as given in Bishop et al. [13].

For these analyses, primary sites of malignant neoplasms have been combined into 16 groups according to anatomical location and reporting patterns. Percent completeness is estimated separately for each of the 16 site-groups, and for the total of all sites. Formulas used in both the two- and three-source methods are shown in the Appendix.

[Figure 1: diagram of the three sources of registration; source labels: Death certificates, Hospital discharges, Pathology reports.]

Fig. 1. Distribution of sources of registration for the Ontario Cancer Registry, 1982. Percent of the total number of new registrations (n = 31,786) is shown under the number for each combination of sources.

RESULTS

In total, 31,786 new cases of cancer were registered in Ontario in 1982 (Fig. 1). A large proportion of these cases appeared on both hospital discharge and pathology reports. Very few cases had a death certificate only or a pathology report along with a death certificate.
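The cross-classification summarized in Fig. 1 can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `tabulate_sources`, the one-letter source codes, and the six linked records are hypothetical and are not OCR data or the registry's software.

```python
from collections import Counter
from itertools import product

# Hypothetical one-letter codes: hospital discharges, pathology reports,
# death certificates.
SOURCES = ("h", "p", "d")


def tabulate_sources(cases):
    """Count cases for each present/absent combination of sources.

    `cases` is an iterable of sets naming the sources that reported a case.
    The result is keyed by tuples of 0/1 flags in SOURCES order. The all-zero
    cell is None because the number of unregistered cases is unobserved.
    """
    counts = Counter(tuple(int(s in case) for s in SOURCES) for case in cases)
    table = {}
    for cell in product((0, 1), repeat=len(SOURCES)):
        table[cell] = counts.get(cell, 0) if any(cell) else None
    return table


# Hypothetical linked records (one set of reporting sources per case).
cases = [{"h", "p"}, {"h", "p"}, {"h"}, {"p"}, {"h", "p", "d"}, {"h", "d"}]
table = tabulate_sources(cases)
for cell, count in sorted(table.items()):
    print(dict(zip(SOURCES, cell)), count)
```

With three sources the table has eight cells, seven of which are observed; the eighth is the quantity the capture-recapture methods estimate.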

Both linear and log-linear models were evaluated, and a linear one with natural logarithmic transformation of (1 + observed frequency) appeared to best balance the plot of residuals. In order to select relevant terms for inclusion in the model, first, a "full" model was fitted including indicator terms for the 16 site-groups, the three data sources, and all two-way interactions.

[Figure 2: scatter plot; vertical axis: Standardized Residual (-2 to 2); horizontal axis: Fitted Value (0 to 8).]

Fig. 2. Standardized residuals plotted according to fitted value, from estimates of model 4 (Table 1). The residuals (observed minus fitted values) and fitted values are on the logarithmic scale (see text); to standardize, the residuals were divided by the square root of the residual mean square (the deviance divided by its degrees of freedom). Ninety-five percent of the standardized residuals should lie within plus or minus 1.96.

Next, the "basic" model with only terms for site was fitted. Then, all two-way interactions between pairs of data sources and between data sources and site were examined for removal from the full model. The proportion of deviance explained by each model relative to the difference in deviance between the full model and the basic model was estimated and the final model selected. Results for a number of the models that were tested are shown in Table 1.

Table 1. Selected models tested in assessing completeness of cancer registration, Ontario, 1982

Model | Terms | Deviance | Degrees of freedom | Percent of deviance explained
Model 1 ("basic") | s | 320.996 | 96 | 0
Model 2 ("data sources") | s+h+d+p | 83.417 | 93 | 78
Model 3 | s+h+d+p+h.d+d.p | 70.085 | 91 | 82
Model 4 | s+h+d+p+s.d*+h.d+d.p | 44.780 | 76 | 90
Model 5 | s+h+d+p+s.d*+s.p*+h.d+d.p | 32.805 | 61 | 94
Model 6 ("full") | s+h+d+p+s.d*+s.p*+h.d+d.p+s.h*+h.p | 15.116 | 45 | 100

s = site (16 indicators). h = hospital discharges (yes/no). d = death certificates (yes/no). p = pathology reports (yes/no).
*Any interaction term with site results in 15 additional interaction terms.
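The last column of Table 1 follows from the deviances alone: each model's reduction in deviance from the basic model is expressed as a percentage of the reduction achieved by the full model. The short sketch below recomputes that column and the residual mean square used to standardize the residuals in Fig. 2; the function and variable names are illustrative, and only the deviances and degrees of freedom are taken from the table.

```python
import math

# (deviance, residual degrees of freedom) as listed in Table 1.
MODELS = {
    1: (320.996, 96),  # "basic": site terms only
    2: (83.417, 93),
    3: (70.085, 91),
    4: (44.780, 76),
    5: (32.805, 61),
    6: (15.116, 45),   # "full"
}


def percent_explained(deviance, basic=MODELS[1][0], full=MODELS[6][0]):
    """Deviance reduction relative to the basic-to-full difference, in percent."""
    return 100.0 * (basic - deviance) / (basic - full)


explained = {m: round(percent_explained(dev)) for m, (dev, _) in MODELS.items()}

# Residual mean square of model 4; its square root is the divisor used to
# standardize the residuals plotted in Fig. 2.
deviance4, df4 = MODELS[4]
residual_mean_square = deviance4 / df4
standardizing_divisor = math.sqrt(residual_mean_square)

print(explained)
print(round(residual_mean_square, 4), round(standardizing_divisor, 4))
```

The rounded percentages reproduce the published column (0, 78, 82, 90, 94, 100), which also confirms that the degrees of freedom drop by 15 each time a site interaction is added.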

Table 2. Estimated completeness of cancer registration by site, Ontario Cancer Registry, 1982

Site (ICD-9 codes) | Total number of registered cases | Percent completeness (95% confidence interval), three sources* | Percent completeness (95% confidence interval), two sources†
Lip, oral cavity, and upper respiratory tract (140-149, 160, 161) | 1500 | 92.15 (80.??-??.18) | 95.00 (92.3?-97.87)
Gastrointestinal tract (150-154) | 6141 | [illegible] (88.77-98.53) | 95.81 (94.97-96.65)
Deep-seated digestive organs (155-159) | 1340 | 98.24 (94.94-99.43) | 98.37 (97.56-99.19)
Trachea, bronchus, lung and pleura (162, 163, 165) | 4556 | 97.69 (93.56-99.21) | 98.46 (98.01-98.91)
Soft tissues, bone and cartilage (164, 170, 171) | 376 | 92.84 (81.39-97.57) | 93.55 (87.90-99.98)
Malignant melanoma of skin (172) | 838 | 91.26 (78.13-96.87) | 98.30 (94.99-101.80)
Breast (174, 175) | 3903 | 94.75 (86.14-98.14) | 90.61 (87.61-93.82)
Female genital organs (179-184) | 2292 | 94.29 (85.??-97.98) | 92.86 (90.49-95.36)
Prostate (185) | 2387 | 93.52 (83.23-97.69) | 90.97 (87.34-94.92)
Male genital organs (186, 187) | 219 | 93.86 (83.46-98.08) | 100.00 (‡)
Urinary tract (188, 189) | 2220 | 94.22 (84.85-97.96) | 96.82 (95.05-98.65)
Nervous system (190-192) | 754 | 96.78 (91.01-98.95) | 97.61 (96.01-99.26)
Endocrine glands (193, 194) | 311 | 94.20 (84.42-98.12) | 94.73 (87.75-102.90)
Lymphatic/hematopoietic tissue (200-208) | 3744 | 94.46 (85.45-98.03) | 95.00 (93.53-96.52)
Secondary sites (196-198) | 464 | 95.65 (88.05-98.59) | 100.00 (‡)
Ill-defined sites (195, 199) | 741 | 96.05 (89.16-98.69) | 92.54 (89.59-95.68)
All sites (140-208, excluding 173) | 31786 | 95.15 (92.19-98.30) | 95.87 (95.47-96.27)

*Based on regression model 4 of Table 1 (see Appendix, part (b)).
†Based on standard two-source capture-recapture methods (Appendix, part (a)).
‡For those sites with 100% completeness, the standard error is 0, since no cases were reported only on a death certificate.
? marks a digit that is illegible in this copy.

Although model 5 explained more of the deviance than the other models, the large number of terms in the model resulted in higher variances of the parameter estimates. Model 4 explained only slightly less deviance with substantially fewer terms. In addition, the cloud of residuals was less sparse than that of model 3 and was without systematic distribution (Fig. 2). Therefore, estimates (and their standard errors) of the number of missing cases were based on model 4.

Estimated percent completeness by site for the two- and three-source situations are shown in Table 2. Completeness is remarkably similar for all sites together, 95.15% using three sources and 95.87% using two sources. Parenthetically, we note that the three-source estimate ranged only between 94.91 and 96.06% among models 2-5 in Table 1. The estimates of completeness do vary by site-group, from a low of about 91% (for both breast and prostate using two sources and for malignant melanoma using three), to a high of 100% for both male genital organs and secondary sites using two sources. Some differences in completeness are present for specific sites as computed from the two or three sources. In particular, reporting of cutaneous malignant melanoma is estimated to be over 98% complete using two sources and only 91% when three sources are employed: this may be due to the small number of patients reported by death certificate only and the relatively high proportion reported only by pathology. Estimates of completeness for the other sites were close, usually within 4%. Two sites, male genital organs and secondary sites, showed 100% two-source completeness because all cases reported through death certificates were also reported by other sources.
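The size of the site-specific differences can be checked directly from the point estimates in Table 2. The sketch below is illustrative only: the dictionary layout and names are assumptions, the figures are the published estimates, and the gastrointestinal tract is left out because its three-source estimate is not legible in this copy.

```python
# (three-source, two-source) percent completeness as listed in Table 2.
ESTIMATES = {
    "Lip, oral cavity, upper respiratory tract": (92.15, 95.00),
    "Deep-seated digestive organs": (98.24, 98.37),
    "Trachea, bronchus, lung and pleura": (97.69, 98.46),
    "Soft tissues, bone and cartilage": (92.84, 93.55),
    "Malignant melanoma of skin": (91.26, 98.30),
    "Breast": (94.75, 90.61),
    "Female genital organs": (94.29, 92.86),
    "Prostate": (93.52, 90.97),
    "Male genital organs": (93.86, 100.00),
    "Urinary tract": (94.22, 96.82),
    "Nervous system": (96.78, 97.61),
    "Endocrine glands": (94.20, 94.73),
    "Lymphatic/hematopoietic tissue": (94.46, 95.00),
    "Secondary sites": (95.65, 100.00),
    "Ill-defined sites": (96.05, 92.54),
}

# Absolute difference between the two methods, in percentage points.
diffs = {site: round(abs(two - three), 2) for site, (three, two) in ESTIMATES.items()}
largest = max(diffs, key=diffs.get)
within_four = sum(d <= 4 for d in diffs.values())

print(largest, diffs[largest])
print(within_four, "of", len(diffs), "site-groups differ by 4 points or less")
```

The largest gap is for malignant melanoma, at about 7 percentage points, which matches the "at most 7%" stated in the abstract.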


DISCUSSION

The simple two-source capture-recapture method has been used previously to estimate completeness of population-based registries. For example, Benn et al. [14] considered notifications from a radiotherapy centre and hospital discharges as two sources to estimate the number of cases missing from a cancer registry. However, they based their estimates only on cases not known to be dead, assuming that registration for dead cases was complete. The method has also been used to assess completeness of birth and death registries in developing countries [7,8]. There are, however, difficulties with the two-source approach. First, the assumption of independence between the two data sources is critical, since either positive or negative dependence may bias the estimate of the size of the target population. Second, if there are no cases reported by one of the source combinations (e.g. if no cancers of the male genital organs are registered solely through a death certificate, as was the case in the OCR in 1982) the method leads to an estimate of 100% completeness, even though registration may well not be complete. This point is discussed in more detail below.

The two-source method is appealing because of its mathematical simplicity, so that in situations where more than two data sources are available, the merging of sources with higher degrees of dependence has sometimes been recommended [2]. In our comparative analysis, hospital and pathology reports were combined to simulate the situation of many population-based cancer registries which rely on tumour reports and death certificates as their two primary data sources. However, for cancer registries with more than two data sources, merging is not optimal, since information is lost. Also, since relationships among data sources may vary by site or other relevant covariables, the merging of two specific data sources may be appropriate for one site but not for another. This is particularly important if estimates of site-specific completeness are required.

The OCR actually includes more than two data sources which are provided to the registry independently; in order to fully utilize these data to estimate completeness, the standard two-source capture-recapture method was generalized through use of modeling. The modeling approach is versatile and can be applied in many varied situations with adaptations according to the specifics of each registry. It not only permits use of the information contained in more than two data sources, but also can take into account variations in reporting patterns by site and dependencies between sources. The model has the potential to be further expanded to include parameters for other factors, such as age and sex, which may be related to reporting patterns. Thus, subgroups of the population may be identified where efforts to improve reporting could be directed.

These methods, particularly modeling, have application beyond the area of cancer registration, since they can be applied to any data base that uses more than one data source and that serves a defined population. An example is the estimation of completeness of reporting of infectious diseases, where physician notifications and laboratory reports are received. Cancer registries have existed for several decades, but in recent years more population-based registries of other diseases are being created. These data bases produce descriptive data, generating new hypotheses on which further research is based. Therefore, estimating completeness is important.

One of the limitations of using these methods for estimating completeness of cancer registration is that they do not detect over-reporting. Moreover, the estimates can be affected by over-reporting, particularly if it is different across data sources. Therefore, the quality of the linkage and the routine checks on the quality of the data are essential so that the estimates of completeness are valid. Since capture-recapture methods have not been frequently used in this way in the past, estimates of completeness need to be validated in settings where independent estimates of completeness are available as a result of special studies.

It is impossible to truly validate our estimates of completeness without conducting an independent study with different methods. However, comment can be made on the reasonableness of the ordering of the site-specific estimates. Patients with highly fatal cancers would be expected to be more completely reported than those having tumours with relatively benign courses, both because many fatal cancers would be reported through the additional, death-certificate source, and because they would be more likely to have multiple reports from the same source (e.g. hospital admissions), thereby increasing the probability of registration. Thus, it is consistent with our expectations that site-groups in 1982 with the highest ratio of deaths to diagnoses (an index known as "deaths in period," or "DIP" [15]), namely deep-seated digestive organs; ill-defined sites; trachea, bronchus, lung and pleura; and nervous system, also have the highest estimates of completeness, while sites with low DIP (male genital organs, melanoma of skin, endocrine glands, prostate, and urinary tract) have low estimates. In fact, the Spearman rank correlation coefficient between the three-source estimates and DIP for the 15 site-groups (omitting secondary sites, since these cannot be coded as the cause of death) is 0.86, which is highly significant (p < 0.01). The two-source estimates do not correlate well with DIP, and thus suggest that for the site-specific estimates, this method using the two particular sources chosen may be less valid.
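For readers who wish to repeat this kind of check on their own registry, a Spearman rank correlation can be computed as the Pearson correlation of ranks, with tied values given their average rank. The sketch below is a generic illustration: the function names are assumptions, and the DIP values are hypothetical placeholders because the site-specific DIP figures are not given in this paper.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical DIP ratios paired with three-source estimates for four sites.
dip = [0.95, 0.20, 0.85, 0.30]
completeness = [98.24, 91.26, 97.69, 93.52]
print(spearman(dip, completeness))
```

A coefficient near 1 indicates that sites with higher fatality are ranked as more completely registered, which is the pattern the authors report for the three-source estimates.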

In our results, the estimates of completeness for the registry as a whole and for specific sites were similar according to both methods, except for malignant melanoma, where the estimate from the two-source approach was higher. This difference is probably due to merging of hospital and pathology reports to create two sources, which may not be desirable for this site. For melanoma, as well as for the two sites estimated to be 100% complete, few or no cases were registered only through a death certificate. This occurs if survival is good and few cases die from the disease, or if the neoplasm is rare and therefore has a low diagnostic suspicion at death, or if the diagnosis cannot be coded as a cause of death (i.e. cause-of-death coding rules do not permit use of secondary site codes).

At present, the lack of a direct measure of completeness makes it difficult to assess the quality of cancer-registry data and therefore to evaluate incidence rates and trends [16, 17]. Traditionally, two measures have been used as indices of the quality of cancer registration and also as indicators of completeness: the proportion of cases registered for which no information is available other than a statement on the death certificate that the cause of death was cancer, or "death certificates only" ("DCO"); and the ratio of the number of deaths attributed to a given cancer during a particular period of time to the number of incident cases diagnosed with the same cancer in that period, DIP. These indirect measures of completeness can even have different definitions across registries. For example, some registries routinely contact the attending physician or health-care institution for cases first known through a death certificate, in hopes of transferring them out of the DCO category, while other registries just include such cases as DCO without further inquiry. DIP is subject to variations in incidence, mortality, and fatality rates [18]. In addition, both DCO and DIP are subject to inaccuracies in the certified cause of death. The extent and importance of these inaccuracies with respect to a diagnosis of cancer has been documented by several studies [19-22]. Therefore, percent DCO and DIP from different registries may not be comparable. Furthermore, DCO and DIP measure something other than what has been measured in this study. DCO and DIP show areas where active registration may be incomplete, but do not provide an estimate of the number of unregistered cancer cases.

Due to the limitations of the foregoing indices of quality of registration as regards completeness, some population-based cancer registries have conducted their own special studies. These studies can be divided into three types: (1) those that use an external source of comparison [24-30]; (2) those reabstracting cancer data from one of the sources used by the registry [31]; and (3) those that use data already in the registry to construct indices of completeness [14,32,33]. External and independent sources of comparison are not always available or reliable. The second method, reabstraction, provides more information on accuracy of registration than on completeness and it can be very expensive. The most cost-efficient way of estimating completeness is by using registry data. From a review of these special studies all that can be concluded is that each registry tailored the assessment of completeness to its own particular problems and resources.

There is, therefore, a great need for a direct and standard measure of completeness of cancer registration, especially given the current proliferation of population-based cancer registries. Application of capture-recapture methods, using either the simple two-source or more complex modeling approach, could greatly improve the current situation and provide much needed estimates of completeness.

Acknowledgement. Dr Loraine Marrett is supported by a National Health Research Scholar award from Health and Welfare Canada.

REFERENCES

1. Cormack RM. The statistics of capture-recapture methods. Oceanogr Mar Biol Ann Rev 1968; 6: 455-506.
2. Wittes JT, Colton T, Sidel VW. Capture-recapture methods for assessing the completeness of case ascertainment when using multiple information sources. J Chron Dis 1974; 27: 25-36.
3. Neugebauer R. Application of a capture-recapture method (the Bernoulli census) to historical epidemiology. Am J Epidemiol 1984; 120: 626-634.
4. Chapman CJ. Ethnic differences in the incidence of cleft lip and/or cleft palate in Auckland, 1960-1976. NZ Med J 1983; 96: 327-329.
5. Hook EB, Albright SG, Cross PK. Use of Bernoulli census and log-linear methods for estimating the prevalence of spina bifida in livebirths and the completeness of vital records reports in New York State. Am J Epidemiol 1980; 112: 750-758.
6. Hewitt D, Milner J. Prevalence of hemophilia in Ontario, 1966. Can Med Assoc J 1970; 102: 174-177.
7. Chandra Sekar C, Deming WE. On a method for estimating birth and death rates and the extent of registration. J Am Stat Assoc 1949; 144: 101-115.
8. Ansley JC. The design of an experimental procedure for obtaining accurate vital statistics. In: International Population Conference, New York, 1961. London: International Union for the Scientific Study of Population; 1963: Vol. II, pp. 372-375.
9. Goldberg JD, Wittes JT. The estimation of false negatives in medical screening. Biometrics 1978; 34: 77-86.
10. Howe GR, Lindsay J. A generalized iterative record linkage computer system for use in medical follow-up studies. Comput Biomed Res 1981; 14: 327-340.
11. Clarke EA, Kreiger N, Marrett LD, Soskolne CL. Cancer incidence, mortality and treatment in Ontario. In: Cancer in Ontario. Toronto: Ontario Cancer Treatment and Research Foundation; 1984: 157-186.
12. Baker RJ, Nelder JA. The GLIM System, Rel. 3. Oxford: Numerical Algorithms Group; 1978.
13. Bishop YMM, Fienberg SE, Holland PW. Discrete Multivariate Analysis: Theory and Practice. Cambridge, Mass.: MIT Press; 1975.
14. Benn RT, Leck I, Nwene UP. Estimation of completeness of cancer registration. Int J Epidemiol 1982; 11: 362-367.
15. Waterhouse J, Muir C, Shanmugaratnam K, Powell J. Cancer Incidence in Five Continents. Lyon: IARC Scientific Publication No. 42, 1982: Vol. IV.
16. Doll R, Peto R. The Causes of Cancer. Quantitative Estimates of Avoidable Risks of Cancer in the United States Today. Oxford: Oxford University Press; 1981: 1270-1274.
17. Devesa SS, Pollack ES, Young JL. Assessing the validity of observed cancer incidence trends. Am J Epidemiol 1984; 119: 274-291.
18. Waterhouse J. The evolution of Cancer Incidence in Five Continents. Curr Probl Cancer 1985; 10: 22-30.
19. Percy C, Stanek E, Gloeckler L. Accuracy of cancer death certificates and its effects on cancer mortality statistics. Am J Public Health 1981; 71: 242-250.
20. Barclay THC, Phillips AJ. The accuracy of cancer diagnosis on death certificates. Cancer 1962; 15: 5-9.
21. Engel L, Strauchen J, Chiazze L, Heid M. Accuracy of death certification in an autopsied population with specific attention to malignant neoplasms and vascular diseases. Am J Epidemiol 1984; 119: 274-291.
22. Mattson B, Rutqvist LE, Wallgren A. Comparison between diagnoses in the Stockholm regional cancer register and certified underlying cause of death. Acta Radiol (Oncol) 1985; 24: 219-226.
23. Waterhouse J, Muir C, Correa P, Powell J. Cancer Incidence in Five Continents. Lyon: IARC Scientific Publication No. 15, 1976; Vol. III.
24. Chiazze L. Morbidity survey and case register estimates of cancer incidence. Natl Cancer Inst Monogr 1966; 19: 373-384.
25. Shenker S, Spitzer W, Hill G. The investigation of a cancer epidemic: Which numerator data? Can J Public Health 1984; 74: 157-160.
26. Larsson S. Completeness and reliability of lung cancer registration in the Swedish cancer registry. Acta Microbiol Scand (Sec. A) 1971; 79: 389-398.
27. Nwene U, Smith A. Assessing completeness of cancer registration in the northwestern region of England by a method of independent comparison. Br J Cancer 1982; 46: 635-639.
28. Mattsson B, Rutqvist LE, Wallgren A. Undernotification of diagnosed cancer cases to the Stockholm cancer registry. Int J Epidemiol 1985; 14: 64-69.
29. Leck I, Birch JM, Marsden HB, Steward JK. Methods of classifying and ascertaining children's tumours. Br J Cancer 1976; 34: 69-82.
30. Koch M. How complete is cancer registration in Alberta? Can Med Assoc J 1982; 127: 139-140.
31. West RR. Accuracy of cancer registration. Br J Prev Soc Med 1976; 30: 187-192.
32. Freedman LS. Variation in the level of reporting by hospitals to a regional cancer registry. Br J Cancer 1978; 37: 861-865.
33. Sherman G. Cancer Incidence in Canada: Completeness and ecological correlations. Doctoral thesis, Division of Community Health, University of Toronto. 1981.

APPENDIX

Notation:

$n_{ri}$ = number of individuals ascertained in data source $r$ at site $i$;
$t_i$ = total number of individuals ascertained at site $i$;
$N_i$ = estimated total number of individuals at site $i$;
$M_i$ = estimated number of missing individuals at site $i$;
$\beta_i$ = parameter estimate for site $i$ (obtained from model fitting; no grand mean in model);
$v_{ij}$ = estimated covariance of site parameters $\beta_i$ and $\beta_j$ (from model fitting).

(a) Two-source (capture-recapture) approach

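With $n_{1i}$ and $n_{2i}$ the numbers of individuals ascertained by sources 1 and 2 at site $i$, and $t_i$ the total number ascertained at that site,

$$N_i = \frac{n_{1i}\, n_{2i}}{n_{1i} + n_{2i} - t_i}$$

is the maximum-likelihood estimator of the size of the target population. Its variance is given by:

$$\mathrm{Var}(N_i) = \frac{n_{1i}\, n_{2i}\,(t_i - n_{1i})(t_i - n_{2i})}{(n_{1i} + n_{2i} - t_i)^3}.$$

Confidence limits are obtained for $N_i$ and then transformed to percent completeness.

The estimated number of missing cases at each site is $M_i = N_i - t_i$.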
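The two-source calculation can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the normal-approximation limits of plus or minus 1.96 standard errors, the cubed denominator in the variance (the standard form in Bishop et al. [13]), and the example counts are all assumptions made here, not figures from the OCR.

```python
import math


def two_source(n1, n2, t, z=1.96):
    """Two-source capture-recapture estimate for one site.

    n1, n2: cases ascertained by source 1 and source 2.
    t: total distinct cases ascertained by either source.
    Returns the estimated population size, its variance, the estimated number
    of missing cases, and percent completeness with approximate limits.
    """
    both = n1 + n2 - t  # cases ascertained by both sources
    if both <= 0:
        raise ValueError("no overlap between sources; estimator undefined")
    n_hat = n1 * n2 / both
    var = n1 * n2 * (t - n1) * (t - n2) / both ** 3
    se = math.sqrt(var)
    low_n, high_n = n_hat - z * se, n_hat + z * se
    # Limits on N map inversely onto completeness, so the upper limit can
    # exceed 100% when the lower limit on N falls below t.
    return {
        "n_hat": n_hat,
        "var": var,
        "missing": n_hat - t,
        "completeness": 100.0 * t / n_hat,
        "completeness_low": 100.0 * t / high_n,
        "completeness_high": 100.0 * t / low_n,
    }


# Hypothetical counts: 900 registration reports, 300 death certificates,
# 1000 distinct cases in total.
example = two_source(900, 300, 1000)
# Hypothetical site where no case is known from a death certificate alone.
no_death_only = two_source(500, 120, 500)
print(example)
print(no_death_only)
```

The second call shows the behaviour noted for male genital organs and secondary sites: when every death-certificate case is also known from another source, the estimate equals the number registered, completeness is 100%, and the variance is 0.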

(b) Modelling approach

Under the logarithmic transformation used, the number of missing cases at site $i$ is estimated by:

$$M_i = \exp(\beta_i) - 1.$$

Confidence limits are obtained for $M_i$ from $v_{ii}$ and then transformed to percent completeness. The total number of cases missing from the registry ($M$) is estimated by summing the $M_i$ over all sites; the variance of $M$ is given by:

$$\mathrm{Var}(M) = 2 \sum_{i>j} (1 + M_i)(1 + M_j)\, v_{ij} + \sum_{i} (1 + M_i)^2\, v_{ii}.$$

Confidence limits are obtained for $M$ and then transformed to percent completeness.
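The modeling formulas can be illustrated in the same way. In the sketch below the site parameters, their covariance matrix, and the registered total are hypothetical, and the interval for the total uses a normal approximation on M as a simplifying assumption; the paper does not state the exact construction of its limits.

```python
import math


def missing_from_model(beta, cov, registered, z=1.96):
    """Estimate missing cases from fitted site parameters.

    beta: site parameter estimates on the log(1 + frequency) scale.
    cov: covariance matrix of those estimates (list of lists, symmetric).
    registered: total number of registered cases across the sites.
    """
    missing = [math.exp(b) - 1.0 for b in beta]
    total = sum(missing)
    k = len(beta)
    # Delta-method variance; the full double sum over a symmetric matrix
    # equals twice the i > j terms plus the diagonal terms.
    var = sum(
        (1.0 + missing[i]) * (1.0 + missing[j]) * cov[i][j]
        for i in range(k)
        for j in range(k)
    )
    se = math.sqrt(var)
    low_m, high_m = max(total - z * se, 0.0), total + z * se
    return {
        "missing_by_site": missing,
        "missing_total": total,
        "var": var,
        "completeness": 100.0 * registered / (registered + total),
        "completeness_low": 100.0 * registered / (registered + high_m),
        "completeness_high": 100.0 * registered / (registered + low_m),
    }


# Hypothetical two-site example: parameters chosen so that M = 50 and 20.
beta = [math.log(51.0), math.log(21.0)]
cov = [[0.04, 0.01], [0.01, 0.09]]
result = missing_from_model(beta, cov, registered=1400)
print(result)
```

Because each site's contribution is weighted by (1 + M_i), the sites with the most estimated missing cases dominate the variance of the total.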