CENTRAL CANCER REGISTRY DATA MANAGEMENT SYSTEM

Thetford, N.A.*, McKernon, R.F.*, Flannery, J.T.**, and Weiss, T.***

*Connecticut Cancer Epidemiology Unit, New Haven, CT
**Connecticut Tumor Registry, Department of Health, Hartford, CT
***National Cancer Institute, Biometry Branch, CSS, Bethesda, MD

Abstract

This paper describes a data management system which has been developed for the Connecticut Tumor Registry to optimize the processing and control of information within the registry and to minimize the manual procedures required. The main features of the system are document control, in which a computer-assigned number follows a tumor report from the time it enters the registry; automated patient linkage, which uses several demographic variables to match incoming patient records against existing records with the same surname phonetic code and sex; and extensive intra- and inter-field editing of all data elements. A number of reports are generated to enable the registry staff to evaluate the quality of the data being captured and to assist the hospital-based registries in maintaining their files and in performing patient follow-up. To the extent possible, record formats, edit criteria and text associated with given data values are contained in files external to the main programs to facilitate modifications without the need for extensive program alterations.

Introduction

The Connecticut Tumor Registry (CTR), containing records on all cases of cancer diagnosed in Connecticut residents since 1935, is a unique resource. As the oldest population-based registry in the nation, it is invaluable in the study of cancer epidemiology and has attained national and international prominence in this area. The CTR is a cornerstone in most of the research projects involving cancer and its control within the State, and its data play a significant role in the National Cancer Program as well.

As interest in cancer epidemiology has grown, the need for reliable data on cancer incidence and survival has placed increasing demands on the registry, necessitating a comprehensive data management system fully integrated into all aspects of registry operations. The system described below meets the registry's needs by providing support in the following areas:

1. Control of documents during and after processing.

2. Complete intra- and inter-field editing of the contents of all data elements.

3. Automation of routine registry procedures.

4. A means for adding or modifying data elements and their corresponding edit criteria.

5. Information to assess the quality and adequacy of the data being collected.

6. Reports and forms needed by the hospital registries to maintain their files and perform patient follow-up.

7. Maintenance of the data in a suitable file structure to permit access to individual hospital admission reports and consolidated tumor data in addition to the entire patient record set.

Traditionally, registries have used computers to handle many of the routine clerical tasks but have shied away from attempting to automate tasks which require significant thought processes. This system incorporates important advances in several areas previously handled exclusively on a manual basis: linkage of new records with existing patients, geographic coding of place of residence, determination of multiple tumors, and consolidation of demographic, diagnostic and treatment data from several reports of the same tumor.

The system, as currently implemented, utilizes eleven input document types and comprises four main programs (Pre-Edit, Linkage, Geocode, and Edit/Update) and some twenty auxiliary programs for data preparation, file analysis, and reporting. In addition to a Master File and a Linkage File, there are 24 other files containing transactions, report records, edit criteria, record formats, linkage weights, and report labels. The system is programmed primarily in American National Standard COBOL and is designed to operate on an IBM 370 series OS/VS computer.

Design and programming of the system have been performed under the auspices of five institutions (California Department of Health, Connecticut Department of Health Services, SEER program of the National Cancer Institute, the University of California at Berkeley and Yale University), and have been supported in part by the following contracts with the National Cancer Institute: NO1 CP 33235, NO1 CP 33353 and NO1 CP 61002.


CH1480-3/79/0000-0804$00.75 © 1979 IEEE


Input Documents

Tumor Record Abstract

The Tumor Record Abstract is prepared by the hospital registrars for most of the hospitals in Connecticut and by registry staff for State, Federal, and out-of-state institutions. It is attached to the supporting documentation (history and physical, operative notes, pathology reports, etc.) and submitted to the CTR on a periodic basis, usually monthly. A copy is retained by the hospital for its files.

The Tumor Record Abstract, since it is prepared after the discharge of the patient, contains all the information known by the hospital at that time, i.e., patient identification, demographic, diagnostic, and treatment information.

Upon receipt by the registry, the Tumor Record is assigned a computer-generated document number label and then microfilmed. After completion of medical coding, the record is keyed and added to the Master Tumor File after passing through various edits, linkage and geocoding. The Registry Number assigned by the Linkage Program is used as a key during processing and for accessing the backup microfilm rolls in the event reconstruction of the case is ever necessary.

Death Certificate Record

Death Certificate records containing a report of a malignant neoplasm are supplied by the Vital Statistics Section on a monthly basis. The Death Certificate information is used to provide death clearance information for patients currently on the registry's files and to provide a case-finding mechanism for cancer cases within the State which were not previously reported to the registry. In addition, the registry obtains Death Certificates from other states, as well as Connecticut, for patients who have been reported dead by other sources.

The Death Certificate information is added to the patient's records on the Master Tumor File on a monthly basis and a Death Certificate facsimile is printed for each hospital which reported the patient. Paper copies of the Death Certificate are filed with other portions of the patient's records in the registry files.

Follow-up

One month prior to the anniversary of last contact, the system generates a set of Follow-up Cards, sorted by following hospital, for each eligible case. The Follow-up Card is used for obtaining the date of last contact and vital status only. If additional information on diagnoses, staging or treatment is available, a Supplemental Follow-up Form is filled out. Both forms contain information identifying the patient and the tumor along with the hospital code, following physician's name and the date last seen as reported to the registry. The Follow-up Card is pre-punched with the patient's registry number and tumor identification and requires only the keying of date last seen and vital status code to be processed.

The Follow-up Cards are destroyed after verification that processing has been completed correctly.
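
The selection of cases for Follow-up Cards can be sketched as follows. This is a minimal illustration in Python rather than the system's COBOL. The `Case` fields, the `alive` flag standing in for the eligibility criteria (which the paper does not spell out), and the whole-month comparison are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Case:
    registry_number: str
    following_hospital: str
    last_contact: date
    alive: bool  # assumed stand-in for the unspecified eligibility criteria


def follow_up_cards_due(cases, run_date):
    """Select cases whose last-contact anniversary falls in the month after run_date."""
    target_month = run_date.month % 12 + 1  # December wraps around to January
    due = [
        c for c in cases
        if c.alive
        and c.last_contact < run_date
        and c.last_contact.month == target_month
    ]
    # Cards are produced in order of following hospital.
    return sorted(due, key=lambda c: (c.following_hospital, c.registry_number))
```

A monthly run would call `follow_up_cards_due(cases, run_date)` once and print one card per returned case.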

Supplemental Follow-up

The Supplemental Follow-up Form is used for indicating the additional information on diagnosis, staging or treatment which cannot be included on the Follow-up Card. In conjunction with the Follow-up Card, it enables the following hospital to provide the registry with a complete picture of the patient's status at the time of follow-up.

The Supplemental Follow-up Form is filed with the patient's records after verification that processing has been completed successfully.

Master File Change Document

The Master File Change Document is used to change the contents of the records on the master file. It contains the patient's registry number, the key of the affected record, the items to be changed, and the new contents of the items.

After processing has been completed and the changes posted to the appropriate Tumor Record in the paper files, the Master File Change Documents are destroyed.

Suspense File Change Document

The Suspense File Change Document is used to change the contents of records on the Suspense File. It can be used to modify, delete, list and/or release a Suspense File record and contains the document identification number of the affected Suspense File record, the action to be taken, and, for changes, the items to be changed and the new contents of the items.

After processing has been completed and the changes posted to the appropriate Tumor Record in the paper files, the Suspense File Change Documents are destroyed.

Linkage File Change Document

The Linkage File Change Document is used to change the contents of records on the Linkage File. It contains the surname, registry number, and sex of the affected record, the items to be changed, and the new contents of the items.

After processing has been completed and the changes posted to the appropriate Tumor Record in the paper files, the Linkage File Change Documents are destroyed.

Add AKA Document

The Add AKA Document is used to add a Linkage File record representing an alias or maiden surname of a patient currently in the registry's files. Incoming transactions which match either the primary surname or any of the aliases will be assigned the same registry number. The Add AKA Document contains the registry number, primary surname, and sex of the patient plus the new alias or maiden surname, first name, middle initial, and as much of the date and place of birth, Social Security Number, and address pertaining to the alias as are available.

After processing has been completed, the Add AKA Documents are filed with the patient's Tumor Record.

Tumor Consolidation Form

The Tumor Consolidation Form is used to specify the combination of hospital and follow-up records to be associated with each independent neoplasm for a patient and the consolidated diagnostic and treatment data, when necessary, for each independent neoplasm. A Tumor Consolidation Form would generally be completed when a problem in performing consolidation is noted by the system or the registry staff determines that the decisions on tumor consolidation by the system are incorrect.

The associated transaction causes any previously existing tumor diagnostic and treatment records to be replaced on the Master Tumor File by the newly defined tumor diagnostic and treatment records. The hospital diagnostic and treatment records and follow-up records are linked to the appropriate tumor diagnostic and treatment records.

The Tumor Consolidation Forms are filed by registry number along with the other records for a patient. If a Tumor Consolidation Form is already on file for the patient, it is replaced by the new report and the old one is destroyed.

Document Control Deletion Document

The Document Control Deletion Document is used to delete entries on the Document Control File which represent unused document numbers originally created by the Document Number Label Program. It contains the initial and, if needed, final document numbers (with check digits) of the document control records to be deleted.

The Document Control Deletion Documents are destroyed after processing is completed.

Master File Delete/List Document

The Master File Delete/List Document is used by the registry staff to delete or list records on the Master Tumor File. Selected records or an entire patient record set may be deleted. The Master File Delete/List Document contains the patient's registry number, the key of the record to be acted upon, and an action code to indicate the action desired.

The Master File Delete/List Documents are destroyed after processing has been completed.

Programs

Transaction Standardization Program

There are eleven different record types on the raw input file to the Transaction Standardization Program. Each record type consists of one to five card images which are consolidated and converted into the appropriate internal transaction format for further processing.

After all input records have been processed, the resultant transactions are sorted by document type.
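
The consolidation of card images can be illustrated with a short Python sketch. The card layout used here (document type in columns 1-2, document number in columns 3-10, card sequence in column 11, data in the remainder) is hypothetical; the paper does not give the actual record formats, which are held in external files.

```python
from collections import defaultdict


def standardize(card_images):
    """Merge the card images of each document into one transaction."""
    groups = defaultdict(list)
    for card in card_images:
        # Assumed layout: cols 1-2 document type, cols 3-10 document number,
        # col 11 card sequence (1-5), remaining columns data.
        doc_type, doc_number, sequence = card[0:2], card[2:10], int(card[10])
        groups[(doc_type, doc_number)].append((sequence, card[11:].rstrip()))

    transactions = []
    for (doc_type, doc_number), parts in groups.items():
        if len(parts) > 5:
            raise ValueError(f"document {doc_number}: more than five card images")
        # Concatenate the card data in sequence order.
        data = "".join(text for _, text in sorted(parts))
        transactions.append(
            {"doc_type": doc_type, "doc_number": doc_number, "data": data}
        )

    # The resulting transactions are sorted by document type.
    transactions.sort(key=lambda t: (t["doc_type"], t["doc_number"]))
    return transactions
```

Grouping on the document type and number together lets cards arrive in any order, at the cost of holding the whole input in memory; a batch system of the period would more likely rely on a sorted input file.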

Document Control Programs

Critical documents entering the registry are controlled by the use of a unique document number which becomes an integral part of the resultant transaction record. The document numbers are produced on adhesive labels by the Document Number Label Program which also makes an entry in the Document Control File for each generated number. The labels are affixed to the applicable documents and, in conjunction with the document type, become the key by which the record is identified in the system. In addition, they serve as a key to the microfilm reel containing the images of the input documents.

The system recognizes the receipt of a controlled document when it enters the Pre-Edit Program, where the document number on the record is compared to the document numbers contained in the Document Control File. If the document numbers match, the record is allowed to proceed and the Document Control File is updated accordingly. If the input document number is not found on the Document Control File due to the record already having been entered or the document number never having been generated, the record is written to the Transaction Error Report File.

Deletion of generated but unused document numbers is performed in the Pre-Edit Program by means of Document Control Deletion Documents. A listing of active document numbers, arranged in groups according to their time on the Document Control File, is produced by the Document Control File Analysis Program.

The final stage of document control is accomplished by the generation of Registry Number Labels during the Master Tumor File Edit/Update processing. These labels also contain the admission and document numbers. The registry and admission numbers become the permanent key to the record while the document number reverts to being a key to the backup microfilm rolls.
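
The check-off of controlled documents can be pictured with a small sketch. It models the Document Control File as an in-memory set of outstanding numbers, which is an assumption made for illustration; the function names and the seven-digit number format are likewise hypothetical, and check digits are omitted because the paper does not describe how they are computed.

```python
def issue_numbers(outstanding, first, count):
    """Generate document numbers for labels and record them as outstanding."""
    numbers = [f"{n:07d}" for n in range(first, first + count)]
    outstanding.update(numbers)
    return numbers


def check_off(outstanding, transactions):
    """Accept each transaction whose document number is still outstanding."""
    accepted, errors = [], []
    for txn in transactions:
        number = txn["doc_number"]
        if number in outstanding:
            outstanding.discard(number)  # the number can be used only once
            accepted.append(txn)
        else:
            # Either already entered or never generated.
            errors.append(txn)
    return accepted, errors
```

Removing a number on first receipt is what makes a second submission of the same document fall into the error path, matching the two failure cases described above.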

Pre-Edit Program

The Pre-Edit Program processes the input Transaction File, the Suspense File and the Document Control File, and writes records to the Transaction Error Report File, the Linkage File Update File, the Quality Control Report File, and two output Transaction Files.


The program checks off incoming controlled documents against the contents of the Document Control File and performs complete intra-field editing on all input records and those released from the Suspense File. In addition, it checks for the presence and validity of any information needed for subsequent processing steps, e.g., key information on change documents and linkage information on Tumor Records. The program also generates a surname phonetic code for use by the Linkage Program.

The Pre-Edit Program performs quality control functions by selecting a specified percentage of processed Tumor Records for reabstracting and providing data on coder performance, follow-up results and queries for additional information.

Transactions which are found to contain errors are written to the Transaction Error Report File with the appropriate error flags set. Tumor Records and Death Certificate Records are also written to the Suspense File for subsequent correction by Suspense File Change Documents.

Transactions which pass all the edits are written to the appropriate output file depending on the next stage of processing needed. Tumor Records and Death Certificates needing linkage and/or geocoding are written to a transaction file to be processed by the Linkage and/or Geocode Programs. Linkage File Change Documents and Add AKA Documents are written to the Linkage File Update File. All remaining transactions are written to a transaction file to be processed by the Master File Edit/Update Program.
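
The routing rules in the two preceding paragraphs can be summarized in a short Python sketch. The document-type labels, the `errors`, `needs_linkage` and `needs_geocoding` fields, and the destination names are illustrative assumptions, not identifiers from the system.

```python
LINKED_TYPES = {"tumor_record", "death_certificate"}
LINKAGE_UPDATE_TYPES = {"linkage_file_change", "add_aka"}


def route(txn):
    """Return the list of output files a pre-edited transaction is written to."""
    if txn["errors"]:
        destinations = ["transaction_error_report_file"]
        # Only these two document types are held for later correction.
        if txn["doc_type"] in LINKED_TYPES:
            destinations.append("suspense_file")
        return destinations
    if txn["doc_type"] in LINKED_TYPES and (
        txn.get("needs_linkage") or txn.get("needs_geocoding")
    ):
        return ["linkage_geocode_transaction_file"]
    if txn["doc_type"] in LINKAGE_UPDATE_TYPES:
        return ["linkage_file_update_file"]
    return ["master_file_transaction_file"]
```

For example, a clean Follow-up transaction goes straight to the Master File transaction file, while a Tumor Record with an edit error is both reported and held on the Suspense File.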

The Pre-Edit Program produces four reports summarizing the results of each run:

1. A Status Report containing tabulations by document type and name, indicating the number of transactions processed, the number of errors found, and the destination of the records passing the edits.

2. A Suspense File Change Document Application Report detailing the results of the actions directed to the Suspense File.

3. An Unknown Document Report listing the contents of the first eighty characters of transaction records which were not defined to the system.

4. A System Error Report which, if it occurs, indicates an unresolvable problem was encountered during processing.

Transaction Error Report Program

The Transaction Error Report Program reads the Transaction Error Report File produced by either the Pre-Edit Program or the Master File Edit/Update Program and generates a set of reports detailing the errors detected. Each document type is handled with a separate report format; however, the formats are similar and generally consist of the patient's identifying information followed by a list of the data elements in error along with their contents. Errors which are applicable to the document rather than to a specific data element are printed after the specific data element errors. Due to their complexity, Tumor Consolidation Forms are listed in their entirety, arranged by tumor groups.

Suspense File Change Document Application Report Program

The Suspense File Change Document (SFCD) Application Report Program reads the SFCD Application Report File and produces reports detailing the results of SFCDs processed by the Pre-Edit Program. For SFCDs applying changes to records on the Suspense File, the report indicates the patient's identifying information and then lists the specific data elements changed along with their new contents. For SFCDs requesting a release, list, or delete action, the basic patient identifying information from the affected records is listed.

Address Standardization Program

Input to this program consists of a transaction file containing all the Tumor Records and Death Certificates which need linkage and/or geocoding. The address, if it exists, must be standardized to facilitate accurate matching by the Linkage and Geocode Programs. The program standardizes street abbreviations and attempts to convert the town name to one of the 169 standard towns currently defined in Connecticut.
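
As a rough illustration of this kind of standardization, the following Python sketch applies two lookup tables. The tables shown are tiny, hypothetical stand-ins: the real program would carry all 169 town names and whatever abbreviation and alias tables the registry maintains, none of which are given in the paper.

```python
# Hypothetical tables; the production tables are not described in the paper.
STREET_ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "AV": "AVE", "ROAD": "RD"}
TOWN_ALIASES = {"NEWHAVEN": "NEW HAVEN", "HTFD": "HARTFORD"}
STANDARD_TOWNS = {"NEW HAVEN", "HARTFORD"}  # would hold all 169 towns


def standardize_address(street, town):
    """Return (standardized street, standard town name or None)."""
    words = street.upper().replace(".", " ").replace(",", " ").split()
    street_out = " ".join(STREET_ABBREVIATIONS.get(w, w) for w in words)

    town_key = " ".join(town.upper().replace(".", " ").split())
    town_key = TOWN_ALIASES.get(town_key, town_key)
    # A town that cannot be converted is left unresolved rather than guessed.
    town_out = town_key if town_key in STANDARD_TOWNS else None
    return street_out, town_out
```

Returning `None` for an unrecognized town keeps the failure visible to later steps instead of passing along a spelling that could never match.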

Linkage Program

The Linkage Program processes a transaction file, sorted on the surname phonetic code, sex, and record type, which contains Tumor Records (without registry numbers) and Death Certificates. It writes matched records to a transaction file for use by the Master File Edit/Update Program, adds records which are possible or multiple matches to the Suspense File, and adds Linkage File Change Documents to the Linkage File Update File. The program performs two basic functions: assigning registry numbers and performing death clearance.

Assigning Registry Numbers. Before being included in the Master Tumor File, a Tumor Record must be assigned a registry number. Since registry numbers are assigned to patients rather than tumors, the Linkage Program attempts to decide whether or not each incoming transaction corresponds to a patient already known to the system. If so, the Linkage Program assigns the registry number previously assigned to that patient. If, on the other hand, the program decides that the record represents a patient not previously known to the system, the program assigns a new registry number to it. The decision is made by a matching algorithm which employs a file of weights derived from a statistical analysis of a sample of the population from which the Tumor Records are obtained.


If, for some records, the program cannot decide whether or not the record corresponds to a previously known patient, i.e., the record may appear to match two or more patients, the program writes the transaction to the Suspense File and informs the user, through a report, that no decision can be made.

Death Clearance. The Linkage Program examines Death Certificate Records from the Vital Statistics Section to determine whether patients known to the system have died. The matching algorithm used is similar to the one used above with one major difference: when assigning registry numbers to Tumor Records, the program refers multiple matches to the user for manual resolution; however, when doing death clearance, the program attempts to maximize the number of matches and chooses the Death Certificate with the highest match weight.
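
The paper does not give the weights, the fields compared, or the decision thresholds, so the following Python sketch should be read only as one plausible shape for a weight-based matcher. It assumes additive agreement and disagreement weights per field, two thresholds separating positive, possible and non-matches, and comparison restricted to records sharing the surname phonetic code and sex. All field names and numeric values are invented for illustration.

```python
# Hypothetical (agreement, disagreement) weights per compared field.
WEIGHTS = {
    "first_name": (4.0, -2.0),
    "birth_year": (5.0, -3.0),
    "ssn": (8.0, -4.0),
    "town": (2.0, -1.0),
}


def match_weight(incoming, candidate):
    """Sum field weights; a field missing on either side contributes nothing."""
    total = 0.0
    for field, (agree, disagree) in WEIGHTS.items():
        a, b = incoming.get(field), candidate.get(field)
        if a is None or b is None:
            continue
        total += agree if a == b else disagree
    return total


def candidates(incoming, linkage_file):
    """Restrict comparison to the same surname phonetic code and sex."""
    return [
        r for r in linkage_file
        if r["phonetic"] == incoming["phonetic"] and r["sex"] == incoming["sex"]
    ]


def assign_registry_number(incoming, linkage_file, upper=10.0, lower=4.0):
    """Return ('matched', number), ('suspense', None) or ('new', None)."""
    scored = [(match_weight(incoming, r), r) for r in candidates(incoming, linkage_file)]
    positive = [r for w, r in scored if w >= upper]
    possible = [r for w, r in scored if lower <= w < upper]
    if len(positive) == 1 and not possible:
        return "matched", positive[0]["registry_number"]
    if positive or possible:
        return "suspense", None  # possible or multiple matches go to manual review
    return "new", None


def death_clearance(certificate, linkage_file, lower=4.0):
    """Take the highest-weight candidate instead of referring ties to the user."""
    scored = [(match_weight(certificate, r), r) for r in candidates(certificate, linkage_file)]
    scored = [(w, r) for w, r in scored if w >= lower]
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[0])[1]["registry_number"]
```

The two functions differ only in how they treat ambiguity, which mirrors the distinction drawn above: registry number assignment defers to the user, while death clearance accepts the best-scoring candidate.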

The following reports are produced by the Linkage Program:

1. Positive matches between Linkage File records and Tumor Records or Death Certificates.

2. Possible matches between Linkage File records and Tumor Records or Death Certificates.

3. Matches within the set of incoming Tumor Records.

4. A summary report giving record counts and statistical information.

Suspense File Analysis Program

The Suspense File Analysis Program scans the Suspense File and produces a listing of all the records on the file grouped by retention time periods. The listing may be sorted by document number if desired. The listing contains patient identifying information, the date the record was added to the Suspense File, the program which put it there, and the physical location on the Suspense File. A summary report containing a cross-tabulation of document type by retention time is also produced.
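
The summary cross-tabulation can be sketched in a few lines of Python. The retention periods used here (up to 30 days, 31 to 90 days, over 90 days) and the record fields are assumptions; the paper does not state the actual period boundaries.

```python
from collections import Counter
from datetime import date

# Hypothetical retention periods: (upper limit in days, label).
PERIODS = [(30, "0-30 days"), (90, "31-90 days")]


def retention_period(date_added, today):
    """Classify a record by how long it has been on the Suspense File."""
    age = (today - date_added).days
    for limit, label in PERIODS:
        if age <= limit:
            return label
    return "over 90 days"


def cross_tabulate(records, today):
    """Count records by (document type, retention period)."""
    return Counter(
        (r["doc_type"], retention_period(r["date_added"], today)) for r in records
    )
```

Each key of the returned counter corresponds to one cell of the summary report.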

Geocode Programs

These programs, obtained from the U.S. Census Bureau [1,2], form a generalized system for geocoding addresses. Tables and parameters are supplied to the programs to tailor their performance to this application.

The system consists of Pre-Processor, Matcher, and Post-Processor programs supplemented by a standard sort operation. The purpose of the Pre-Processor Program is to create a street address in a standardized format for use by the Matcher Program. This is done by interpreting, through a series of tables, the address information in each input record and formatting it into a 75-character match key. Records for which the address cannot be interpreted are written to a reject file.

The accepted records from the Pre-Processor are sorted on the match key and matched against a geocode reference file [3] by the Matcher Program. When a match occurs, the census tract is transferred from the reference file record to the input record. The Matcher Program produces a listing of all input records for which it was unable to find a match.

The Post-Processor Program reunites the transactions rejected by the Pre-Processor Program with those which have gone through the Matcher Program to create a unified transaction file which is passed to the next system module (Master File Edit/Update Program; see System Flowchart).

The census tract is moved into the standard transaction record and the match key is stripped off. The program produces a list of rejected records in a format to facilitate manual geocoding.
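
The three-stage flow can be pictured with the following Python sketch. It is not the Census Bureau software: the layout of the 75-character key (20 characters of town, 45 of street name, 10 of house number), the exact-key dictionary lookup, and the merging of the Matcher and Post-Processor listings into a single list are all simplifying assumptions made here.

```python
import re


def make_match_key(address, town):
    """Pre-Processor step: build a fixed-width key, or None if uninterpretable."""
    m = re.fullmatch(r"\s*(\d{1,10})\s+(\S.*?)\s*", address)
    if m is None:
        return None  # would be written to the reject file
    number, street = m.groups()
    street_part = " ".join(street.upper().split())[:45]
    town_part = " ".join(town.upper().split())[:20]
    return f"{town_part:<20}{street_part:<45}{int(number):>10}"


def geocode(transactions, reference):
    """Matcher and Post-Processor steps combined.

    `reference` maps match keys to census tracts. Returns the unified
    transaction list and the records left for manual geocoding.
    """
    unified, manual = [], []
    for txn in transactions:
        key = make_match_key(txn["address"], txn["town"])
        tract = reference.get(key) if key is not None else None
        # The match key itself is not carried into the output record.
        unified.append({**txn, "census_tract": tract})
        if tract is None:
            manual.append(txn)
    return unified, manual
```

Every input transaction appears in the unified output whether or not it was geocoded, which reflects the Post-Processor's role of reuniting rejected and matched records.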

Master File Edit/Update Program

The Master File Edit/Update Program maintains the Master Tumor File by processing the incoming transactions against the Master Tumor File from the previous cycle.

The Master File Edit/Update Program updates the Master Tumor File by adding new information from Tumor Record Abstracts, Death Certificates, and Follow-ups to a patient record set; by deleting or listing patient record sets; and by applying the changes indicated on Master File Change Documents. For changes to a patient record set that also affect the Linkage File, a Linkage File Change Document is generated. The program ensures the consistency of the data recorded in a patient record set by performing inter-field edits within each record in a patient record set and by performing inter-record edits on all records present in a patient record set. To ensure that the most accurate information is available for analysis, the program attempts to consolidate data reported from several sources. The consolidated data are maintained in the patient header record and tumor diagnostic and treatment records for the patient. If the program is unable to consolidate the information due to inconsistencies in the data, a manual resolution is requested. The necessary changes are made by either a Master File Change Document or a Tumor Consolidation Form.

The Master File Edit/Update Program generates four files:

1. A new Active Master Tumor File.

2. A Consolidation Report File consisting of information on all problems found by the program during consolidation processing along with all the decisions made by the program.

3. An Operational Report File consisting of the data to be included in the current set of operational reports.

4. A Transaction Error Report File containing the records rejected by the program due to errors along with information on the nature of the errors.

Any Linkage File Change Documents generated by the program are added to the Linkage File Update File.

Consolidation Report Program

The Consolidation Report Program processes the Consolidation Report File to generate the reports communicating the results of the processing of the current set of transactions against the Master File. The following reports are produced:

1. Inter-record edit problems.

2. Inter-field edit problems.

3. Results and problems encountered in consolidating patient and tumor information.

4. List of deletions.

5. List of rejected transactions.

6. List of patient record sets requested by registry personnel.

Operational Report Program

The Operational Report Program processes the Operational Report File to generate the lists and turnaround forms used by the registry in its normal operations. The program produces the following sets of reports:

1. Follow-up Cards and forms with appropriate control lists.

2. Lists of patients for whom reporting is incomplete, e.g., hospital follow-back lists.

3. Death Certificate reports, e.g., Death Certificate facsimiles, and requests to the Vital Statistics Section for copies of the Death Certificates.

4. Supplement to the alphabetic patient directory.

5. Reports to the hospital, e.g., accession lists, notification of follow-up information from other sources, and tabulating cards.

Linkage File Update Program

The Linkage File Update Program maintains theLinkage File. By means of a parameter on a systemcontrol card, the user may run the program ineither an update mode or a creation mode.

The standard mode of operation is the updatemode. In this mode, the program reads transac-tions from the Linkage File Update File which wascreated by the Pre-edit Program and added to bythe Linkage Program and the Master File Edit/Update Program. Depending upon the transactiontype which is read, the program performs one ofthe following functions:

1. Deletes a specific record from a patient'srecord set.

2. Deletes all records for a patient.

3. Adds a record for a new patient or an AKA rec-ord for an existing patient.

4. Changes the value of a data item in a specificrecord or in all records of patient's record set.

5. Changes the sex or surname phonetic code on apatient's records, i.e., modify the record key.

6. Lists a patient's record set.

Checks are performed to make certain that the key which links a patient's Master Tumor File records and Linkage File record set is not lost and that the pointers which connect all the records in a patient's record set are maintained.
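The transaction handling described above can be illustrated with a small dispatcher. This is a minimal sketch in Python, not the registry's implementation: the in-memory dictionary standing in for the Linkage File, the transaction codes (`ADD`, `DEL_ONE`, `DEL_ALL`, `LIST`), the field names, and the sample phonetic codes are all hypothetical, and only a subset of the six functions is shown. The refusal to delete a primary record on its own follows the rule stated in the Linkage File description.

```python
from dataclasses import dataclass


@dataclass
class LinkageRecord:
    phonetic: str        # surname phonetic code (format assumed)
    sex: str
    registry_no: str
    primary: bool = False


def apply_transaction(linkage, txn):
    """Apply one transaction to `linkage` (dict: registry_no -> list of records).

    Returns a short status string suitable for an update report.
    """
    kind, reg = txn["type"], txn["registry_no"]
    records = linkage.get(reg, [])
    if kind == "ADD":
        # The first record known for a patient becomes the primary record.
        rec = LinkageRecord(txn["phonetic"], txn["sex"], reg, primary=not records)
        linkage.setdefault(reg, []).append(rec)
        return "added"
    if not records:
        return "error: unknown patient"
    if kind == "DEL_ALL":
        del linkage[reg]
        return "deleted all"
    if kind == "DEL_ONE":
        target = [r for r in records if r.phonetic == txn["phonetic"]]
        if not target:
            return "error: no such record"
        if target[0].primary:
            # Deleting the primary record alone would lose the link
            # to the patient's Master Tumor File records.
            return "error: primary record"
        records.remove(target[0])
        return "deleted one"
    if kind == "LIST":
        return ", ".join(r.phonetic for r in records)
    return "error: unknown transaction type"
```

Returning a status string for every transaction, including rejected ones, mirrors the program's report of records deleted, added or changed together with a summary of errors.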

The program is run in the creation mode when the Linkage File is first generated. In this situation, an input file must be provided which contains only add AKA transactions.

The Linkage File Update Program produces a report of the linkage records deleted, added or changed along with any requested lists of patient record sets and a summary of errors encountered in the input transactions.

Extract Program

The Extract Program reads the Master Tumor File to produce the Extract File by transforming a patient's record set into a set of records summarizing each admission for a neoplasm. If problems are known to exist in the patient's record set on the Master Tumor File, the appropriate indicators are set since these problems may preclude the use of the data for analysis.
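The transformation from a hierarchical record set to one summary record per admission might look as follows. This is an illustrative sketch only; the dictionary layout and every field name (`tumors`, `admissions`, `problems`, `problem_flag`) are assumptions made for the example, not the registry's record format.

```python
def extract_admissions(patient):
    """Flatten one patient record set into one summary row per hospital admission."""
    rows = []
    for tumor in patient["tumors"]:
        for adm in tumor["admissions"]:
            rows.append({
                "registry_no": patient["registry_no"],
                "tumor_id": tumor["tumor_id"],
                "admission_no": adm["admission_no"],
                "hospital": adm["hospital"],
                # Known problems are carried onto every row so that
                # analysts can exclude the affected records.
                "problem_flag": bool(patient.get("problems")),
            })
    return rows
```

Carrying the indicator on each output row, rather than dropping problem records, leaves the decision about their use to the analyst.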

Data Collection Monitoring Program

The Data Collection Monitoring Program processes the Extract File and produces a set of reports for use by the registry staff in their evaluation of the data collection effort. The reports contain both cumulative (year-to-date) and current experience.

Reports produced by the Data Collection Monitoring Program are:

1. Case reporting by hospitals to date as compared to the total reported in the previous year.

2. Death Certificates not received more than six months after death.

3. Follow-up status of all cases under current follow-up.

4. Quality of case reporting, including the percentage of cases with microscopic confirmation, radiological reports, and laboratory data.
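As an illustration of the quality measures in the last report, the percentage of cases carrying a given kind of confirmation can be computed directly from extract rows. The sketch below assumes each row is a dictionary with boolean fields; the field name `microscopic` is hypothetical.

```python
def percent_with(rows, flag):
    """Percentage of extract rows in which the boolean field `flag` is set."""
    if not rows:
        return 0.0
    return 100.0 * sum(1 for r in rows if r.get(flag)) / len(rows)
```

The same function serves for radiological reports and laboratory data by passing a different field name.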

Inactive Case Migration Program

Annually, the Inactive Case Migration Program generates new Active and Inactive Master Tumor Files from the current files. The program migrates a patient's record set from the Active Master Tumor File onto the Inactive Master Tumor File if the patient meets the criteria for becoming inactive.

The registry staff provides a set of registry numbers for those patient record sets which are to be migrated from the Inactive Master Tumor File onto the Active Master Tumor File. This set of registry numbers corresponds to those patients on the Inactive Master Tumor File for whom one or more transactions have been received during the year.

The program generates the following reports:

1. List of registry and microfilm roll numbers for those patients whose record sets are to be migrated to the Inactive Master Tumor File.

2. List of registry numbers for those patients whose records are to be migrated back to the Active Master Tumor File.

3. List of registry numbers for patients whose records were to be migrated from the Inactive Master Tumor File to the Active Master Tumor File but whose records could not be found.

4. Tabulations on the status of the Active and Inactive Master Tumor Files.

5. Alphabetic list of all patients on the Inactive Master Tumor File (on microfiche).

6. Alphabetic list of all patients on the Active Master Tumor File (on microfiche).
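The two-way migration and the "could not be found" list can be sketched as below. This is a hedged illustration: the files are modeled as dictionaries keyed by registry number, and the inactivity criteria, which the paper does not spell out, are passed in as a caller-supplied predicate (`is_inactive`, a hypothetical name).

```python
def migrate(active, inactive, is_inactive, reactivate_ids):
    """Build new Active and Inactive files from the current ones.

    Returns (new_active, new_inactive, not_found), where not_found lists the
    requested registry numbers that were absent from the Inactive file.
    """
    new_active, new_inactive, not_found = {}, dict(inactive), []
    for reg, record_set in active.items():
        if is_inactive(record_set):
            new_inactive[reg] = record_set
        else:
            new_active[reg] = record_set
    for reg in reactivate_ids:
        if reg in inactive:
            new_active[reg] = new_inactive.pop(reg)
        else:
            not_found.append(reg)
    return new_active, new_inactive, not_found
```

Building new files instead of updating in place matches the description of the program as generating new Active and Inactive files from the current ones.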

Weights File Generation Programs

The generation of a Weights File for use by the Linkage Program requires four programs and is done initially when the system is installed and thereafter whenever the statistical staff at the registry decide that there have been significant changes in the transcription error rate or omission rate of any item used in the linkage process.

The Error Rates Analysis Program processes a file representing a random sample of existing matched records in the Master File and, for each data item used in the linkage process, computes a set of error probabilities [4].

A file containing an array of these error probabilities is the only output from the Error Rates Analysis Program.

The Frequency Analysis Program computes frequency distributions for the demographic information used in matching and various statistics needed for the weights computation. It processes a file representing a random sample of the population from which the Tumor Records are being obtained and produces a work file containing the distributions and statistics mentioned above.

The Weights Creation Program processes the files generated by the previous two programs in conjunction with a set of table sizes provided by the registry staff. The table sizes are the number of surnames in the surname table, the number of first names in the first-name table, and the number of birthplaces in the birthplace table. The program computes the weight assigned to each elementary agreement and disagreement configuration used in matching and produces a file and a listing of the weights assigned.
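The paper does not give the registry's formulas, but since the error probabilities are attributed to Fellegi and Sunter [4], the computation can be illustrated with the usual log-ratio weights from that framework. In the sketch below, `m` is estimated from a sample of known matched pairs and `u` from the value frequencies in a population sample. The function names and the simple whole-field estimates are assumptions for illustration, not the programs' actual method.

```python
import math
from collections import Counter


def agreement_rate(matched_pairs):
    """m: share of known-matched pairs whose values agree (1 - error rate)."""
    agree = sum(1 for a, b in matched_pairs if a == b)
    return agree / len(matched_pairs)


def chance_agreement(values):
    """u: probability that two random members of the population share a value."""
    counts = Counter(values)
    n = len(values)
    return sum((c / n) ** 2 for c in counts.values())


def weights(m, u):
    """Return (agreement_weight, disagreement_weight) in bits.

    Both m and u must lie strictly between 0 and 1.
    """
    return math.log2(m / u), math.log2((1 - m) / (1 - u))
```

With a 10 percent error rate among matched pairs (m = 0.9) and a two-valued item split evenly in the population (u = 0.5), agreement contributes about +0.85 bits and disagreement about -2.32 bits. An item with many rare values has a much smaller u and therefore a much larger agreement weight.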

The Weights Loading Program combines the file produced by the Weights Creation Program and a file containing the default match thresholds (as determined by the registry staff) into the Weights File. The Weights File contains six records:

1. A record containing all the weights.

2. A record consisting of the surname table.

3. A record consisting of the first-name table.

4. A record containing the hash table used by the surname search algorithm.

5. A record consisting of a birthplace table.

6. A record containing the default match thresholds.

Table Maintenance Program

The creation and maintenance of the Edit Control, Label Set and Format Description Files is performed by the Table Maintenance Program. Input to the program is a set of commands describing the valid data elements, their possible values and corresponding textual labels, and their location within each record type. The commands and associated data may be contained in a file created using a text editor such as WYLBUR or TSO, or they may be punched on a deck of cards. The program permits the registry staff to alter the existing edit criteria, record format, and label values without the need for programmer intervention in most cases.

System Files

Transaction Files

The various records and forms submitted by the registry staff are maintained in the system on a variety of transaction files. These files are differentiated by the subset of transactions which each contains.

Linkage File Update File

The Linkage File Update File contains the transactions used by the Linkage File Update Program to update the Linkage File. The transactions consist initially of Linkage File Change Documents (LFCDs) which are added to the file by the Pre-Edit Program, the Linkage Program, and the Master File Edit/Update Program. Prior to processing, the individual LFCDs are broken down into smaller transactions representing the individual actions to be effected against the Linkage File. The possible actions which can be performed are: delete one or all records for a patient, add a record, change a record component, and list the patient's record set.

Linkage File

The Linkage File is maintained by the Linkage File Update Program and is used by the Linkage Program to match incoming transactions with existing patients.

The Linkage File uses an Indexed Sequential Access Method (ISAM) for efficiency. The record key consists of the concatenation of the surname phonetic code, sex, and registry number.

A single patient may have several records on the Linkage File, each having the same sex and registry number but with differing surname phonetic codes. One of the records (the first known to the system) is designated as the primary record for the patient. This record has the same surname phonetic code as found on the Master Tumor File for the patient. The primary record cannot be deleted from the Linkage File unless all records for the patient are removed from the system. All the records for a given patient are associated by a closed chain of pointers.
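The composite key and the closed pointer chain can be sketched as follows. The field widths (4, 1, and 8 characters), the sample phonetic codes, and the `next` pointer name are assumptions chosen for the example; the paper does not specify them.

```python
def make_key(phonetic, sex, registry_no):
    """Concatenate phonetic code, sex, and registry number into one fixed-width key."""
    return f"{phonetic:<4}{sex}{registry_no:>8}"


def chain(records, start_key):
    """Follow 'next' pointers from start_key until the chain closes on itself.

    `records` maps key -> {"next": key of the patient's next record}.
    Raises ValueError if the chain loops without returning to its start.
    """
    keys, key = [], start_key
    while True:
        keys.append(key)
        key = records[key]["next"]
        if key == start_key:
            return keys
        if key in keys:
            raise ValueError("pointer chain does not close on its start")
```

Because the key begins with the phonetic code and sex, a keyed read on that prefix retrieves every candidate sharing both values. Because the chain is closed, all of a patient's records can be reached starting from any one of them.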

Edit Control Files

The Pre-Edit Program utilizes data contained in the Edit Control Files to determine the validity of the contents of incoming transactions. The Edit Control Files consist of the Data Element Edit File and the Single Digit Edit File.

The Data Element Edit File contains information on the valid codes permitted for each data element in the system. In addition, where applicable, it supplies the entry number to the Single Digit Edit Table or to one of the special edit routines in the Pre-Edit Program.

The Single Digit Edit File contains information on the valid codes (other than blank) permitted for each single digit data element. The possible codes are space, 0 to 9, dash (-), and ampersand (&). This file is used by the Pre-Edit Program to create the Single Digit Edit Table, access to which is through an entry number contained in the Data Element Edit Table (created from the Data Element Edit File).
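The two-level lookup, from data element to entry number to set of valid codes, can be sketched as below. The element names and the code sets are invented for illustration; only the indirection through an entry number comes from the description above.

```python
# Hypothetical Data Element Edit Table: element name -> entry number.
DATA_ELEMENT_EDIT = {"sex": 0, "laterality": 1}

# Hypothetical Single Digit Edit Table: entry number -> valid one-character codes.
SINGLE_DIGIT_EDIT = [frozenset("12"), frozenset("01234-")]


def single_digit_ok(element, value):
    """Check a one-character value against the codes allowed for its element."""
    entry = DATA_ELEMENT_EDIT[element]
    return len(value) == 1 and value in SINGLE_DIGIT_EDIT[entry]
```

Keeping the valid codes in tables rather than in program logic is what allows edit criteria to be changed without altering the main programs.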

Label Set Files

The Label Set Files contain the valid codes and labels for those data elements for which a textual representation is required at some point in the system, e.g., town code and name, hospital code and name, and codes and descriptions for causes of death.

For ease in processing, there are two Label Set Files: one for single-digit data elements only and one for all other data elements needing labels.

Format Description Files

The Pre-Edit Program, the Transaction Error Report Program, the SFCD Application Report Program, the Suspense File Analysis Program, and the Death Certificate Translation and Merge Program all use the Format Description Files to obtain the data needed for determining data element location and length.

The files contain information on the data elements present on each transaction record, including their starting position and length. The Input Format Description File defines the formats of the records on the transaction file input to the Pre-Edit Program. The Internal Format Description File defines the formats of the transaction file records throughout the rest of the system.
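A format description of this kind amounts to a table of (starting position, length) pairs that drives field extraction from a fixed-width record. The sketch below assumes 1-based starting positions and uses made-up element names and widths.

```python
# Hypothetical format description: element -> (starting position, length),
# with positions counted from 1.
FORMAT = {"registry_no": (1, 8), "sex": (9, 1), "town": (10, 3)}


def parse_record(line, fmt):
    """Slice a fixed-width record into named fields using a format description."""
    return {
        name: line[start - 1:start - 1 + length]
        for name, (start, length) in fmt.items()
    }
```

For example, `parse_record("00012345M093", FORMAT)` yields `{"registry_no": "00012345", "sex": "M", "town": "093"}`. Changing a record layout then means editing the table, not the programs that read it.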

Weights File

The Weights File contains records indicating the weight to be assigned to each of the match criteria used in the linkage process. The values of the weights are derived from an analysis of the demographic characteristics of the registry's population base.

Master Tumor File

The Master Tumor File is used by the system as the repository for demographic, admission, diagnostic, treatment, and follow-up information about a patient reported to the registry. It is maintained on two separate physical files for processing efficiency. The Inactive Master Tumor File consists of patient record sets for those patients who have been determined to be inactive. The Active Master Tumor File consists of all other patient record sets, i.e., the patient record sets of those patients for whom new information can reasonably be expected to be received by the registry.

The Active Master Tumor File is maintained by the Master Tumor File Edit/Update Program. The Inactive Master Tumor File is maintained by the Inactive Case Migration Program. The Extract Program uses the data on both Master Tumor Files to generate the Extract File.

The Master Tumor File is a hierarchical file consisting of patient record sets. A patient record set is defined as all the records on file for a patient. The records which may be included in a patient record set are:

1. Patient Header Record -- contains patient key information, demographic data which should not change over time, and the Death Certificate information received from the Vital Statistics Section.

2. Tumor Diagnostic and Treatment Records -- contain the consolidated tumor diagnostic and treatment data for a neoplasm.

3. Hospital Admission and Treatment Records -- contain the admission, diagnostic, and treatment information for a particular hospitalization and neoplasm.

4. Follow-up Record -- contains the follow-up information for a neoplasm.

The hierarchical relationship among the records in a patient record set is illustrated by:

Patient Header Record
    Tumor Diagnostic and Treatment Records
        Hospital Admission and Treatment Records
        Hospital Admission and Treatment Records
        Follow-up Record
    Tumor Diagnostic and Treatment Records
        Hospital Admission and Treatment Records
        Follow-up Record

Every patient record set contains a Patient Header Record; the other records do not need to be present. Tumor Diagnostic and Treatment Records are present only if the independent neoplasm was reported from multiple sources. Hospital Admission and Treatment Records are present for each reported hospitalization. A Follow-up Record is present for each independent neoplasm that is being actively followed or for which follow-up information has been received. The minimal patient record set consists of either a Patient Header Record containing Death Certificate information or a Patient Header Record and a Hospital Admission Record.

On the Master Tumor File, patient record sets are sequenced by registry number. To link all records pertaining to the same neoplasm, a tumor identification number is appended to the Tumor Diagnostic and Treatment Records and to the Hospital Admission and Treatment Records. An admission number is also appended to Hospital Admission and Treatment Records so that the registry staff is able to refer to individual records.
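The record set hierarchy and the minimal-record-set rule can be modeled with a few nested types. This is a sketch under assumptions: the class and field names are hypothetical, record contents are reduced to dictionaries, and grouping the follow-up record under its tumor is an interpretation of the description, not a documented layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Admission:
    tumor_id: int          # links the admission to its neoplasm
    admission_no: int      # lets staff refer to an individual record
    hospital: str


@dataclass
class Tumor:
    tumor_id: int
    consolidated: Optional[dict] = None   # present only with multiple sources
    admissions: List[Admission] = field(default_factory=list)
    follow_up: Optional[dict] = None


@dataclass
class PatientRecordSet:
    registry_no: str
    header: dict
    tumors: List[Tumor] = field(default_factory=list)

    def is_minimal_valid(self):
        """True if the set has Death Certificate data or at least one admission."""
        has_dc = bool(self.header.get("death_certificate"))
        has_adm = any(t.admissions for t in self.tumors)
        return has_dc or has_adm
```

A record set consisting of only a header without Death Certificate information fails the check, which corresponds to the minimal record set defined above.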

Consolidation Report File

The Consolidation Report File consists of records containing information on the results of consolidation operations performed on a patient's record set. The Consolidation Report Program processes this file to generate the consolidation reports.

Operational Report File

The Operational Report File consists of records containing the information to be included on the operational reports. Each record contains an indication of the reports on which it is to be included along with the actual data to be printed. The Operational Report Program organizes this file and prints the reports.
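Because each record names the reports on which it belongs, organizing the file reduces to grouping records by report. A minimal sketch, assuming each record is a dictionary with a `reports` list and a `data` payload (both hypothetical names):

```python
from collections import defaultdict


def organize(records):
    """Group report-file records by each report on which they should appear."""
    by_report = defaultdict(list)
    for rec in records:
        for report in rec["reports"]:
            by_report[report].append(rec["data"])
    return dict(by_report)
```

A record destined for several reports is written once and fanned out at print time.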

Extract File

The Extract File contains one record for each hospital admission on the Master Tumor File and is produced as often as needed by the Extract Program. It is used for statistical purposes by the registry staff and other researchers and is structured for ease of use with standard software packages such as Datatext, TPL, and SAS. The file is also read by the Data Collection Monitoring Program to produce reports on the current status of tumor reporting and adequacy of quality control.

References

(1) U.S. Bureau of the Census, Census Use Study: ADMATCH Users Manual, Washington, D.C., 1970.

(2) U.S. Bureau of the Census, Census Use Study: OS ADMATCH: An Address Matching System, Washington, D.C., 1970.

(3) U.S. Bureau of the Census, Census Use Study: DIME: A Geographic Base File System, Washington, D.C., 1972.

(4) I.P. Fellegi and A.B. Sunter, A Theory for Record Linkage, JASA, December 1969, pp. 1183-1210.

Figure 1: System Flowchart
