THE CONCOR (CONSISTENCY AND CORRECTION) EDIT AND IMPUTATION SYSTEM: ITS ADEQUACY AND STATE OF COMPLETION A Report Prepared By: FREDERIC J. GRANT During The Period: JANUARY 7-31, 1980 Under The Auspices Of The: AMERICAN PUBLIC HEALTH ASSOCIATION Supported By The: U.S. AGENCY FOR INTERNATIONAL DEVELOPMENT OFFICE OF POPULATION, AID/DSPEC-C-0053 AUTHORIZATION: Ltr. POP/FPS: 12/3/79 Assgn. No. 582-012


THE CONCOR (CONSISTENCY AND CORRECTION)

EDIT AND IMPUTATION SYSTEM:

ITS ADEQUACY AND STATE OF COMPLETION

A Report Prepared By: FREDERIC J. GRANT

During The Period: JANUARY 7-31, 1980

Under The Auspices Of The: AMERICAN PUBLIC HEALTH ASSOCIATION

Supported By The: U.S. AGENCY FOR INTERNATIONAL DEVELOPMENT, OFFICE OF POPULATION, AID/DSPEC-C-0053

AUTHORIZATION: Ltr. POP/FPS: 12/3/79, Assgn. No. 582-012

PREFACE

Since June 1979, a major redesign of the COBOL CONCOR edit and imputation system has been undertaken by the International Statistical Programs Center (ISPC) of the U.S. Department of Commerce, Bureau of the Census. A one-day program held October 10, 1979, previewed enhancements which were planned for implementation in the system. Based upon the information furnished at that workshop, I undertook an interim review of the state of completion of the COBOL CONCOR package. The result of that review was a working document entitled "Report on the Developing COBOL CONCOR Edit and Imputation System." At the time of that writing, the system was not sufficiently complete to gauge definitively its adequacy for exportation to developing countries. This current publication, "COBOL CONCOR 1980: Its Adequacy and State of Completion," while substantial in its own regard, can best be understood in light of that previous report.

On January 7-18, 1980, I attended a workshop designed to provide participants with an in-depth explanation of the full range of capabilities the new COBOL CONCOR supports. During this time I was able to learn the new CONCOR language and conduct tests bearing on the adequacy and completeness of the system. Results from these test programs comprise parts of many of the Appendices.

On January 18, 1980, I was debriefed at the Office of Population, Agency for International Development, Rosslyn, Virginia, on the specific areas which compose the body of this report. Any comments of a critical nature about CONCOR must be preceded by a statement attesting to the competence and dedication of the ISPC staff, who have done an extraordinary job in redesigning and rewriting many of the programs comprising the system since October 10, 1979.

Though my experience with systems analysis and design utilizing the COBOL programming language encompasses three years, local circumstances and specialization are important considerations. The discussions of this report are based on my overall experience in the data processing field and how I think it applies to the development of CONCOR. At the time of the writing of this report, I am the Senior Systems Analyst and the Director of Data Base Administration for the Georgia World Congress Institute, a state-operated nonprofit research organization located in Atlanta, Georgia.


CONTENTS

PREFACE

EXECUTIVE SUMMARY

I BACKGROUND

II THE ADEQUACY OF CONCOR

III PROPOSED CHANGES TO THE LANGUAGE STRUCTURE

IV THE ADEQUACY OF COBOL CONCOR DOCUMENTATION

V CONCLUSION

APPENDICES

Appendix A - Bucen Enhancement Proposal
Appendix B - Evaluative Criteria
Appendix C - Workshop Itinerary
Appendix D - Participants
Appendix E - CONCOR Evaluation Form
Appendix F - New/Old Command Comparisons
Appendix G - ISPC Future Enhancement List
Appendix H - CONCOR System Internal Variables
Appendix I - CONCOR-EDITOR Execution Statistics
Appendix J - Diagnostic Message Guide Example


EXECUTIVE SUMMARY

The December 1979 COBOL CONCOR (Version 2) is a much improved software package. All commands appear to be functional; however, the system should be exhaustively tested by an independent agency prior to its general release. This agency should also precisely determine the system's relative speed and core requirements. While the system (exclusive of documentation) could immediately be utilized in a situation of extreme need, some CONCOR language coding inconsistencies detract from the learnability and exportability of the package and should be corrected. Additionally, there are other modifications or adjustments which would enhance the overall utility and productivity of the language in census and survey applications.

System documentation continues to be a problem. There is no User's Guide. The Systems Manual, though well constituted informationally, should be thoroughly reorganized in accordance with the guidelines set forth in this report.

The staff of ISPC exhibited competence and professionalism in the conduct of the two-week workshop of January 7-18, 1980. ISPC generally is aware of both the potential and the shortcomings of the CONCOR project. The current CONCOR version represents a significant redesign of the overall system. As a package, it is in a state where its completion is within reach.

I BACKGROUND

CONCOR (an acronym of Consistency and Correction) is best characterized as a software tool designed to expedite the processing of data files during the edit and imputation phase of population censuses and surveys. As a metacompiler written in the COBOL language, the system reads and verifies CONCOR language statements to produce an executable EDITOR program. The objective of this process is the creation of an error-free file which can be used at a later time for tabulation purposes.

Since its release as Version 1 on December 30, 1978, numerous elements of the COBOL CONCOR system have undergone continual change and redefinition. In fact, the system has not been permitted to stand still for any period of time, nor has it been exhaustively tested. In June of 1979, ISPC suspended the further distribution of the COBOL CONCOR system. This decision was based principally upon reports of the package's unsatisfactory performance at workshops held in Panama and Thailand. ISPC, upon its own initiative, developed a proposal to overhaul CONCOR and its accompanying documentation. This proposal, contained in Appendix A, represents an ambitious undertaking. While not all of the desired changes and capabilities could be implemented, Version 2 of December 1979 represents a significant managerial effort. The questions now are whether COBOL CONCOR Version 2 will be a demonstrably adequate software package -- a package capable of exportation to developing countries -- a package requiring no further modification. The purpose of this report is to address these critical issues. In connection with this, Appendix B sets forth the specific criteria around which such a discussion must evolve. As this is not intended to be a compendium, some of these broader issues will be treated immediately; the following chapters and appendices will qualify the exact nature of system alterations already undertaken, as well as further adjustments believed to be essential in realizing the goals of the system's philosophy.

Workshop

During the period of January 7-18, 1980, a workshop was held under the sponsorship of the ISPC to demonstrate the capabilities of the latest revision of the COBOL CONCOR software package. A schedule of events of this workshop is contained in Appendix C. This workshop was intended to provide participants with the opportunity to program in the CONCOR language and thereby to test aspects of the system as individually appropriate. A listing of the participants and the international organizations they represented is contained in Appendix D. During the concluding days of the workshop, each participant was asked by ISPC to provide a written evaluation of the now-called December 1979 version of CONCOR. This evaluation form, Appendix E, also includes space for comments concerning the competence of the system documentation, as well as any additional comments, including those regarding the organization and clarity of workshop presentations. It is assumed that in the near future summaries of these comments will be available to interested agencies.

EDITOR is the new name of the EXECUTOR module of previous language versions.

A complete history of the development of COBOL CONCOR can be found in both the December 1978 Version 1 and December 1979 Version 2 systems manuals, as well as in previous consulting reports.

While virtually all instructional aspects of this two-week workshop were conducted in a highly professional manner -- a manner which revealed a high degree of coordination among staff members in their efforts -- there are several areas which future workshops may improve upon:

1. All publications should be assembled in their entirety and proofread prior to distribution.

2. A complete CONCOR language program example and accompanying I/O documents should be provided at the onset of the workshop for reference.

3. Numerous short application programming problems involving all CONCOR language divisions should be utilized in place of a single lengthy problem.

It is noted that this workshop was not intended to teach the CONCOR language; had it been, the organization and presentation of materials probably would have been different. It is believed that the two-week period was sufficient to provide participants a familiarity with the use of the new CONCOR features, especially in light of the fact that workshop participants were permitted to work weekends and beyond normal working hours at their discretion. Though funding was not generally available, it is known that several workshop members chose to extend their stay in Washington to continue testing the CONCOR package or to work on projects which they could attempt to install immediately on their home computers. At the conclusion of the workshop, participants were permitted to take with them an installation tape of CONCOR as well as all the other materials they had acquired during the course of the project.


II THE ADEQUACY OF CONCOR

CONCOR has been described by its designers as an adequate package. Adequacy as an evaluative criterion is often relative to need and should not be confused with readiness as an issue. The CONCOR system, exclusive of documentation, is sufficiently complete that in a situation of extreme need it could be used as a data-cleaning tool in the editing and imputation phase of census processing. Less extreme circumstances would impose reticence on such an endorsement. Though non-exhaustive tests indicate that CONCOR appears to be capable of performing all of the commands as implemented, because of the rapidity with which the system was rewritten it is thought that there has not been enough time to test all aspects of the project fully. Therefore, prior to its general dissemination, it is recommended that an independent agency conduct exhaustive tests to certify the integrity of the system programs. The importance of this certification cannot be overstated in light of previous workshop experiences. Concurrent with this testing process, the same agency should determine the relative speed and size of the system under actual production circumstances and further determine CONCOR's ease of installation. Later sections of this discussion set forth additional testing recommendations.

It is generally recognized that of all the data-cleaning tools available for exportation, CONCOR is potentially the most powerful, especially with the addition of its new commands as outlined in Appendix F. While its utility is not in doubt, one must ask how much more useful CONCOR could be if modified, and whether this additional utility would be worth the costs involved. The modifications (excluding documentation) to COBOL CONCOR appropriate for consideration at this time are threefold:

1. Adjustments to the elements of the system which are internally inconsistent or awkward, to facilitate its learnability and usability among developing country programmers.

   a. Implementation of the division headings DICTIONARY-DIVISION, EXECUTION-DIVISION, and REPORT-DIVISION. De-emphasis of section headings.

   b. Improvement of the consistency among data identifiers: allow alphanumeric variables to be coded without mandatory comparison strings throughout the DATA-DIVISION and to be of the same length as numeric variables. Permit numeric identifiers to be of a length equal to that of NEW-DATA identifiers. Permit the coding of single-dimension row and column vectors in the same manner as multi-dimensional arrays.

2. Implementation of selective commands and internal variables to facilitate the production environment use of CONCOR in census applications. These include:

   a. LOAD/UNLOAD arrays: commands which would automatically save and replace hot-decked values from batch to batch.

   b. TOTAL-QUESTIONNAIRE-COUNT/-RECORD-COUNT internal variables independent of AREA-CONTROL.

3. Other modifications.

   a. Default values for the max-storage parameter set in a realistic range.

   b. Allowance of more variables for survey applications.

Some of these modifications are part of what ISPC calls its "wish list" for the future development of CONCOR. This document has been included in this report as Appendix G. It is arguable that these features are essential to the completion of the CONCOR package. While it is beyond the scope of this report to draw a conclusion in this area, the enhancements outlined above are ones that would make the language more internally consistent and thereby easier to learn and apply in a census data production environment. These modifications are not arbitrary or cosmetic but are a direct result of hands-on programming experience in the language, as well as observations and discussions with other workshop participants. While it is probably impossible ever to be satisfied with the overall structure of any programming language, the issue of completeness must be resolved relative to the objectives for developing the COBOL CONCOR system in the first place. An explicit statement of these objectives has been absent in all systems documentation to date.


III PROPOSED CHANGES TO THE LANGUAGE STRUCTURE

Based upon the assumption that it is the intent of sponsoring agencies to optimize the COBOL CONCOR package -- a goal which is believed currently obtainable -- an understanding of the nature of these changes and how they would impact users is essential. Appendix F sets forth in a comparative manner the differences between the old December 1978 and the new December 1979 editions of CONCOR. Studying this appendix makes it obvious that while the new version of the language is clearly superior to the old in nearly every aspect, the basic and overall structure of the language is essentially unchanged. Compartmentalization of aspects of the language into divisions represents a significant ideological enhancement to the language. Indeed, development of programs by divisions proved to be an extremely useful way of understanding the nature of the editing work to be performed. However, note that while the END-DIVISION command is essential to the language, the division headings DICTIONARY-DIVISION, EXECUTION-DIVISION, and REPORT-DIVISION were not implemented and must therefore be preceded by a period so that they are treated as comment lines in the program listing. It is inconsistent to implement END-DIVISION commands while not implementing the division headings. It is believed that this division structuring is important enough to the overall organization of a CONCOR source language program that it should be implemented prior to general distribution. The section headers shown on the figures in Appendix F, however, are another matter. They are cumbersome and were generally not coded by workshop participants, and they could be deleted from this version of the language altogether with little loss in organizational understanding. The CONCOR language is sufficiently powerful to stand on its own as a distinct product and is not meant to be a COBOL imitation; its present degree of development and specialization does not warrant the structural drag of additional section identifiers. The probable intent of the original CONCOR project was to develop a package which was uncomplicated and not unwieldy to use. The question of implementing division and section names, while seemingly cosmetic, can have real impact on the perceived ease of learning and using the language.

Figure 1 illustrates common mistakes programmers make coding numeric and alphanumeric variables in the DATA-DIVISION. These mistakes are the result of the inconsistent variable formats. For instance, in the numeric data definition statement it is permissible to specify N9-23, where N signifies numeric, 9 signifies the length of the item, and 23 specifies the starting position in the record. In NEW-DATA, however, it is possible to code an item with a maximum length of 18. While on the surface this inconsistency would seem harmless, user variables defined in NEW-DATA as N18 could be moved inadvertently to output record fields defined by a data definition statement N9-23; such an action would result in a data error. Under certain circumstances it would be highly desirable to output these larger length values. A similar circumstance exists between the numeric and alphanumeric data coding conventions. While the maximum length of the numeric is permitted to be 9 in the data definition statement (18 in NEW-DATA), the maximum alphanumeric variable is permitted to be only 4 characters in length. In the current systems manual it is recommended that alphanumeric coding be utilized in the QUESTIONNAIRE-CONTROL and RECORD-CONTROL statements, where each input data item must be of the same data type, as shown in the example. When alphanumeric data variables are used in these control statements, their construction is identical to that of numeric items. However, when used elsewhere in the DATA-DIVISION, alphanumeric variables are required to specify one of three possible comparison values, as shown. There are a number of production instances when it never would be necessary or even desirable to recode alphanumeric data. However, as CONCOR attempts to force data into a totally numeric format upon output, there is currently no way to preserve these values if desired.

FIGURE 1

    DICTIONARY-DIVISION
    DICTIONARY-NAME DATA-CODING-EXAMPLE
    INPUT-FILE
    OUTPUT-FILE
    AREA-CONTROL N2-2 N2-4 N3-6 N2-9 N9-23
    QUESTIONNAIRE-CONTROL A4-2 A3-6 A2-9 A3-11 A3-14
    RECORD-CONTROL A1-1
    DEFINE-RECORD
    H01-TYPE-OF-HOUSING-UNIT N1-17
    H02-MATERIAL-OF-ROOF N1-19 10 9
    H03-TOTAL-PERSONS-IN-UNIT N8-40 NOT-NUMERIC BLANK
    H04-STATE-OF-UNIT-CODE A4-50 0 U 1 D
    DEFINE-RECORD
    P01-SEX 1-13 W F
    NEW-DATA
    N01-SAVE-TYPE-OF-HOUSING-UNIT
    N02-SAVE-TYPE-OF-ROOF 1
    N03-COUNT-TOTAL-IN-UNITS 10 0
    N04-AGGREGATE-INCOME 18 0
    END-DIVISION

Explanations

N2-4: This is an example of an external numeric input data item (N) with a length of 2 bytes starting in column 4 of the input record. The maximum length of this type of variable outside of NEW-DATA is 9. When coded in NEW-DATA, 18 is permitted.

A4-2: This is an example of an external alphanumeric input data item (A) with a length of 4 bytes starting in column 2 of the input record. This construction for alphanumeric variables is valid only in the control statements. Additionally, it can never be over 4 bytes in length. When alphanumeric data fields are defined within record types, the EDITOR program requires that the comparison strings always be specified. A maximum of 3 is permitted. The purpose of these strings is to force a recode of the data to a numeric value. If no match is found, EDITOR automatically assigns a unique negative value to the field.
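The length inconsistency described above can be made concrete with a short sketch. It is written in Python rather than CONCOR, and the names `FieldSpec`, `parse_field_spec`, and `fits` are hypothetical; only the `N9-23` notation and the limits of 9, 18, and 4 come from the text. It assumes a data definition is always type letter, length, hyphen, starting column.

```python
import re
from dataclasses import dataclass

# Limits as stated in the report; the constant names are illustrative only.
MAX_RECORD_LENGTH = {"N": 9, "A": 4}  # data definition statements
MAX_NEW_DATA_LENGTH = 18              # numeric NEW-DATA variables


@dataclass(frozen=True)
class FieldSpec:
    kind: str    # "N" numeric or "A" alphanumeric
    length: int  # field length in bytes
    start: int   # 1-based starting column in the record


def parse_field_spec(spec: str) -> FieldSpec:
    """Parse a definition such as 'N9-23' and enforce the record-level limits."""
    match = re.fullmatch(r"([NA])(\d+)-(\d+)", spec)
    if match is None:
        raise ValueError(f"malformed field definition: {spec!r}")
    field = FieldSpec(match.group(1), int(match.group(2)), int(match.group(3)))
    if not 1 <= field.length <= MAX_RECORD_LENGTH[field.kind]:
        raise ValueError(f"length {field.length} not permitted for type {field.kind}")
    if field.start < 1:
        raise ValueError("starting column must be 1 or greater")
    return field


def fits(new_data_length: int, target: FieldSpec) -> bool:
    """Report whether a NEW-DATA numeric of the given length fits the output field."""
    if not 1 <= new_data_length <= MAX_NEW_DATA_LENGTH:
        raise ValueError("NEW-DATA length must be between 1 and 18")
    return new_data_length <= target.length
```

Under these assumptions, `fits(18, parse_field_spec("N9-23"))` is false, which is the inadvertent move the text warns about, and a definition such as `1-13`, which lacks its type letter, is rejected outright.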

An unwieldy alternative, which may be acceptable under some circumstances, would be the expansion of the number of comparison strings from three to a more realistic number. The limitation of this compromise is that a full twenty-six comparison identifiers would be required in order to accommodate data which utilized the entire alphabet. A better solution, however, would be to make the general format of the alphanumeric variables identical to that of numeric identifiers, i.e., A9-23, and to permit alphanumeric values so defined to pass unaltered through the CONCOR system.
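The forced recode and its three-string ceiling can be sketched as follows. This is an illustration in Python, not CONCOR's implementation: `make_recoder` is a hypothetical name, the pairing of `U` with 0 and `D` with 1 is one reading of the H04 line in Figure 1, and the particular negative codes handed to unmatched values are an assumption, since the report says only that a unique negative value is assigned.

```python
MAX_COMPARISON_STRINGS = 3  # limit described in the report


def make_recoder(comparisons):
    """Build a recode function from (comparison string, numeric code) pairs."""
    if len(comparisons) > MAX_COMPARISON_STRINGS:
        raise ValueError("at most 3 comparison strings are permitted")
    table = dict(comparisons)
    unmatched = {}  # negative codes handed out so far (assumed scheme)

    def recode(value):
        if value in table:
            return table[value]
        # No match: assign a negative code, one per distinct unmatched value.
        if value not in unmatched:
            unmatched[value] = -(len(unmatched) + 1)
        return unmatched[value]

    return recode


recode_state = make_recoder([("U", 0), ("D", 1)])
```

Here `recode_state("U")` gives 0, `recode_state("D")` gives 1, and any other string comes back negative, so the original character value is lost on output. A field coded with all twenty-six letters cannot be declared at all under the three-string limit, which is the case the paragraph above raises.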

Another data-naming convention which caused several errors, and which could be corrected, concerns the array data definitional statements. While arrays of two and more dimensions are handled in a superior manner by the CONCOR program, single-dimension arrays pose a problem in coding, as shown in Figure 2. It is suggested that the command imperatives be changed to permit the coding of both rows and columns in single-dimension arrays, i.e., to allow a single row vector as well as a single column vector, in order to maintain the consistency of the array data definitional statements.
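The suggested change amounts to applying one declaration rule to every array, whatever its shape. The sketch below, in Python, shows such a uniform rule; `ArrayDecl` and `declare_array` are hypothetical names, the parameter order (identifier, number of dimensions, extents, magnitude, start-up values) follows the description in Figure 2, and reading "magnitude" as a number of digits is an assumption.

```python
from dataclasses import dataclass
from math import prod

MAX_DIMENSIONS = 5  # the report states arrays may have up to five dimensions


@dataclass(frozen=True)
class ArrayDecl:
    name: str
    shape: tuple
    magnitude: int
    values: tuple


def declare_array(name, n_dims, extents, magnitude, values):
    """Validate an array declaration using the same rule for vectors and arrays."""
    if not 1 <= n_dims <= MAX_DIMENSIONS:
        raise ValueError("number of dimensions must be between 1 and 5")
    if len(extents) != n_dims:
        raise ValueError("one extent is required per dimension")
    if any(extent < 1 for extent in extents):
        raise ValueError("every extent must be at least 1")
    if len(values) != prod(extents):
        raise ValueError("number of start-up values does not match the shape")
    # Assumption: magnitude is the maximum number of digits in an element.
    if any(abs(value) >= 10 ** magnitude for value in values):
        raise ValueError("a start-up value exceeds the declared magnitude")
    return ArrayDecl(name, tuple(extents), magnitude, tuple(values))


ages = [16, 18, 21, 23]
row = declare_array("A06", 2, (1, 4), 2, ages)     # single row vector
column = declare_array("A06", 2, (4, 1), 2, ages)  # single column vector
vector = declare_array("A06", 1, (4,), 2, ages)    # plain one-dimensional form
```

Because an extent of 1 is legal under this rule, a row vector and a column vector are declared exactly like any other two-dimensional array, and the separate vector syntax is no longer needed.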

A major requirement of COBOL CONCOR file processing is that all related data records must be physically contiguous on the input file. The implication of this requirement is that files may require preprocessing prior to actual data editing. (This preprocessing is usually a sort routine upon a selected CONTROL-AREA key.) While this type of processing merely introduces a new step in file processing, a major limitation becomes apparent when a large number of discrete data files of the same census or survey questionnaire are to be processed. This limitation is the introduction of manual steps to save the most recently imputed values, i.e., to prevent the program from starting with cold values on each batch run. If a command such as LOAD/UNLOAD ARRAYS were incorporated into the language (an enhancement not believed to be difficult to implement), manual processing between batches would be reduced to a minimum and the maximum benefits of the hot deck methodology would be realized. It is envisioned that such a command would automatically ensure the transfer of the appropriately designated hot values. Automatic processing of this nature, if done correctly, can greatly reduce the time required to clean multivolume files, for once CONCOR language statements have been compiled, linked, and stored as an object module on the system, no other compilations should be required for processing questionnaire files of the same type. Theoretically, a single well-written CONCOR program is all that would be required to process an entire census run.

While it is possible at this time to save the arrays that are used in the imputation processes on a separate write-file, it is not now possible to load those values back automatically into an object program and to resume processing immediately on another volume. It is believed that such an automatic feature of the language would cut down the manual processing time significantly enough that it warrants inclusion in the package prior to its general distribution.

FIGURE 2

    A05-DIFF-BETWEEN-AGE-OF-FEMALE-BY-RELATION 2 4 4

    Columns: AGE OF HUSBAND (12-17, 18-24, 25-35, 36+)
    Rows:    RELATION (HEAD, CHILD, OTHER, NONRELATIVE)
    (The sixteen cold deck values are not legible in this copy.)

Comments: The ARRAY-DATA command statement provides the means to declare array identifiers with up to five dimensions. Current documentation is not as explicit about the rules of this command as is desirable. The parameters of the command should function as follows:

    user-identifier, number of dimensions (D), number of rows (R), number of columns (C), magnitude of element (M), initial start-up values

In the example, A05 is a two-dimensional array with 4 rows, 4 columns, a default magnitude of 9, and cold deck values as labeled.

    A06-DIFF-BETWEEN-AGE-OF-PERSON-AND-MOTHER 1 1 4
    16 18 21 23

This coding generates the error messages below:

    (WARNING DD-207) COMMAND TERMINATOR NOT FOUND, ASSUMED PRESENT
    (ERROR DD-9lI) DIMENSION OF USER-SPECIFIED ARRAY IS LESS THAN THE MINIMUM VALUE PERMITTED (2)
    PREVIOUS DIAGNOSTIC AT LINE 563

As shown by the array variable A06, CONCOR's treatment of vectors is not consistent with the above multidimensional array scheme; i.e., A06 must be coded as follows (example of how vectors must currently be coded to be correct):

    A06-DIFF-BETWEEN-AGE-OF-PERSON-AND-MOTHER 1 4 2
    16 18 21 23

    user-identifier, 1 dimension, number of elements in vector, magnitude of element, initial start-up values

A simple modification to this command would permit the coding of both row and column vectors and make this command less error prone.
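The benefit of carrying hot deck values from one batch to the next can be illustrated with a minimal sketch. It is written in Python and is not CONCOR's mechanism: `unload_arrays`, `load_arrays`, and `process_batch` are hypothetical names, the JSON save file is an arbitrary choice, and the single `age_by_sex` array is an invented stand-in for whatever imputation arrays a real program would declare.

```python
import copy
import json
import os


def unload_arrays(hot_deck, path):
    """Save the current hot deck arrays at the end of a batch."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(hot_deck, handle)


def load_arrays(path, cold_deck):
    """Restore saved arrays if present; otherwise start from the cold deck."""
    if not os.path.exists(path):
        return copy.deepcopy(cold_deck)
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def process_batch(records, hot_deck):
    """Impute a missing age (None) from the last valid age seen for that sex."""
    cleaned = []
    for sex, age in records:
        if age is None:
            age = hot_deck["age_by_sex"][sex]   # impute from the hot deck
        else:
            hot_deck["age_by_sex"][sex] = age   # refresh the hot deck
        cleaned.append((sex, age))
    return cleaned
```

A run over several volumes would call `load_arrays` before each batch and `unload_arrays` after it, so that the second volume begins with the values left by the first rather than with the cold deck.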

Appendix H contrasts the internal identifiers of the old and new language versions. Without such identifiers, a user would have little information about the status of input as it is processed by EDITOR. As noted in the appendix, most internal pointers are reset upon each break in the CONTROL-AREA, provided a CONTROL-AREA has been defined. The limitation here is that there are obvious instances when terminating processing on the basis of run counts would be advantageous even though a CONTROL-AREA has been specified, e.g., when debugging CONCOR programs or comparing input files. Therefore, another set of pointers should be implemented for this purpose and made available for programmer reference.
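The distinction between counters that reset at each control break and counters that run for the whole file can be sketched briefly. The Python below is illustrative only; `RunCounters`, `process`, and the `stop_after` parameter are hypothetical and do not correspond to actual CONCOR internal variables.

```python
class RunCounters:
    """Keeps an area-level count (reset on control break) and a run-level count."""

    def __init__(self):
        self.current_area = None
        self.area_questionnaires = 0   # resets when the control area changes
        self.total_questionnaires = 0  # never resets during the run

    def count(self, area_key):
        if area_key != self.current_area:
            self.current_area = area_key
            self.area_questionnaires = 0
        self.area_questionnaires += 1
        self.total_questionnaires += 1


def process(area_keys, stop_after=None):
    """Count questionnaires, optionally stopping on the run-level total."""
    counters = RunCounters()
    for key in area_keys:
        if stop_after is not None and counters.total_questionnaires >= stop_after:
            break
        counters.count(key)
    return counters
```

With only the area-level counter available, a test such as "stop after the first four questionnaires" cannot be expressed once a control area is defined; the run-level counter is what makes `stop_after` possible.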

One clearly disturbing development which needs to be pursued during in-depth testing of the system concerns the MAX-STORAGE parameter of the DEFINE-RECORD statement. As shown in Figure 3, when MAX-STORAGE was set equal to the maximum value, a COBOL program was generated which required 1000K of core to run. The MAX-STORAGE value of 999 is clearly not realistic under most processing circumstances. This example drives home several important points about CONCOR. The core requirements of CONCOR-generated programs can be influenced significantly by the amount or nature of programmer-specified I/O operations. In fact, it is possible to generate a program of a size which most machines in foreign countries could not process. It is recommended that tests determine a realistic maximum value restriction for implementation to prevent problems in this area.

The final area of recommended modification concerns the newly implemented REPORT-DIVISION. The purpose of the REPORT-DIVISION is to enable a user to specify certain CONCOR language statements which will generate statistical reports. These reports contain statistics generated by EDITOR as specified by the GENERATE-EDIT-STATISTICS command of the EXECUTION-DIVISION. All of the reports produced are organized according to the data fields defined by the AREA-CONTROL command of the DATA-DICTIONARY. If the AREA-CONTROL command is not defined in the DATA-DICTIONARY, then all the statistics are summarized at the total run level. If a control area field is defined, then all statistics will be summarized for each unique CONTROL AREA as encountered by the EDITOR program on the input file; statistics at the total run level will not be available. This in part relates back to previous discussions citing the need for new internal identifiers. Report listings may contain the values of entire records or entire questionnaires, depending upon the keyword used in the report generation commands. The problem centers upon the homogeneity of CONCOR printouts during a production run.

It is virtually impossible to distinguish reports on the basis of the volumes they were run against. Some means should be provided to allow users to label the reports generated in this division uniquely and purposefully. Indeed, the very name REPORT-DIVISION suggests that such a command is implicit and appropriate. Such a LABEL-REPORT or REPORT-FILE command, along with file information from the system, should not be difficult to implement.
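Both reporting gaps discussed above, the missing run-level totals and the missing report label, can be shown together in a small sketch. It is written in Python and is purely illustrative: `edit_statistics_report`, the line layout, and the message names in the usage note are hypothetical and are not CONCOR output formats.

```python
from collections import Counter


def edit_statistics_report(errors, label):
    """Summarize (area, message) pairs by control area and for the whole run."""
    per_area = {}
    run_total = Counter()
    for area, message in errors:
        per_area.setdefault(area, Counter())[message] += 1
        run_total[message] += 1

    lines = [f"REPORT LABEL: {label}"]  # user-supplied label, e.g. a volume name
    for area, counts in per_area.items():  # areas in the order encountered
        for message, n in sorted(counts.items()):
            lines.append(f"AREA {area}  {message}  {n}")
    for message, n in sorted(run_total.items()):
        lines.append(f"TOTAL RUN  {message}  {n}")
    return lines
```

Called with a label such as `"VOLUME 3"`, the first line of the listing identifies the volume the report was run against, and the closing `TOTAL RUN` lines supply the run-level summary even though a control area is in effect.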

FIGURE 3

    CONCOR (CONSISTENCY AND CORRECTION) EDIT AND IMPUTATION SYSTEM
    USER DICTIONARY DIVISION - SOURCE LISTING

    LINE NUMBER
     72   DEFINE-RECORD RECORD-NAME= HOUSING-RECORD
     73       MAX-STORAGE= 999
     74       RECORD-TYPE= (value not legible in this copy)

    268   DEFINE-RECORD RECORD-NAME= POPULATION-RECORD
    269       MAX-STORAGE= 999
    270       RECORD-TYPE= (value not legible in this copy)

    Job step statistics for STEP LOAD:

    CPU 0 MIN 01.65 SEC    VIRT 1000K
    STEP START TIME - 10:01:55    STEP END TIME - 10:02:57
    REGION (REQUESTED - 1000K, USED - 1000K)    I/O COUNT 211
    TAPES - 0    DISKS - 0
    CARDS READ - 0    CARDS PUNCHED - 0    LINES PRINTED - 22


Concluding Remarks of System Modifications

Version 1 of the COBOL CONCOR language was notorious for abending without the generation of diagnostic messages. Appendix I sets forth a test of some of these historical problems. In each instance the system provided messages and source program line references which enabled immediate identification of the processing difficulties. Moreover, tests of the various commands revealed the generally high performance of the language in its newest version. Under these circumstances, the recommended modifications should be considered as ones which will uniformly allow CONCOR to realize the potential of its design, enhance its utility, and, moreover, endure as a software product.


IV THE ADEQUACY OF COBOL CONCOR DOCUMENTATION

The current documentation of CONCOR consists of a newly developed, relatively voluminous systems manual, a Diagnostic Message Guide, and some loose-leaf instructions concerning the installation of the system on IBM computers. Of these three documents, the Diagnostic Message Guide is clearly complete and adequate and needs no changes except the addition of message number tabs to enable a programmer to locate the text of a message number more quickly. An example of the superior format of this guide appears as Appendix J.

Conspicuously absent from the list of systems documents is a User's Guide, as originally specified in the ISPC enhancement proposal.

A thorough assessment of the Systems Manual has been undertaken and indicates that, with some reorganization and editing, it could be made a more useful and usable document, as it is well constituted informationally. In its current form, the Systems Manual can best be described as a working document, a document which was not intended to teach the CONCOR language but was aimed at an audience already familiar with previous CONCOR releases. Further, it is not a systems manual per se, as some of its components and appendixes could more appropriately compose a user's guide. For example, excerpts from the POPSTAN publications which review the basic editing concepts and contain the POPSTAN example problem should be deleted from the Systems Manual and would be more contextually effective in the User's Guide. The EXECUTION-DIVISION chapter of the Systems Manual would especially benefit from general editing and reorganization. (Specifically, the contents of pages 90-92 should be moved to page 63 -- these pages explain the placement and purpose of the three permissible routines of the CONCOR language. Had the author been granted more time, he could have outlined additional specific modifications to bring existing documentation to a level of adequacy.) However, some guidelines stand out:

1. Installation instructions and considerations should compose a chapter of the Systems Manual rather than a separate document. (Hand-out materials covering installation were not adequate.) Further, a documentation scheme setting forth how CONCOR was installed on each computer, and any modifications required during installation, could be serially added to and become a permanent part of the installation's systems manual. If CONCOR is going to be used, and if agencies are going to support it actively as a package, installation limitations must be known and well documented. The prescribed format for this documentation should be set forth, and any modifications to CONCOR should be continuously updated on these sheets so that a history of changes to the language will be readily accessible to supporting agencies. If CONCOR is to survive as a standardized system, editing results must be capable of being replicated among installations. Such a form could contain other information as well, concerning the type of computer, amount of core, and nature of utilities supporting users. Upon installation, a copy of this form could be sent to the U.S. agency which will ultimately be responsible for supporting the CONCOR package.

2. A complete COBOL CONCOR program should appear in an appendix for reference.

3. The development of the User's Guide should include an intensive review of the editing concepts involved in processing census data files, beyond the POPSTAN materials.

4. An explanation of the CONCOR benchmark program should appear in the User's Guide and the Systems Manual. The running of a supplied benchmark program should be a standard installation protocol used to test all operational aspects of a new installation.

5. The development of a CONCOR pocket card. This tool, which has been proven useful in teaching and assisting programmers in utilizing programming languages, lays out all commands and options on a single small card. An example of such a pocket card is the IBM Assembler Card. Such a small document is useful in clarifying the language and setting forth structural rules without continual reference to full-size manuals.

V CONCLUSION

In a situation of extreme need CONCOR could probably be utilized, as it seems capable of performing its editing functions. Its usefulness as a data cleaning tool is generally undisputed. The importance of having CONCOR exhaustively tested by an independent agency cannot be over-emphasized in light of prior failures of the system to perform adequately. Additionally, core requirements and processing speed should be precisely determined. One must be encouraged by the amount of progress which has been made in a relatively short amount of time towards the completion of the CONCOR package. While it is clearly beyond the scope of this report to estimate the cost and the amount of time required to implement the highly desirable changes to the CONCOR language and documentation outlined herein, and then exhaustively test the system, it is believed that this process could be completed within a single quarter. (The changes are of such a nature that no structural redefinitions of a systematic nature would be required.) These enhancements would make the COBOL CONCOR package more complete, consistent, and reliable in the field. ISPC has competently moved the system to the point that its completion is within reach.

Given the probable revision of the current Systems Reference Manual and the development of a suitable Users Guide document, it is likely that this version of CONCOR could be taught to experienced programmers (unfamiliar with previous versions of the language) in as little as four weeks. And while one is impressed by the potential power and overall simplicity of the COBOL CONCOR language, it is difficult to envision individuals with no real computer programming experience utilizing its full capabilities. Hence the importance of including a review of basic editing concepts in the documentation is underlined.

Further, as documentation is often critical to the perceived ease of use of computer-based systems, it will be essential to have these various manuals examined independently prior to their general use.

As previously mentioned, Appendix G presents features which the ISPC feels could have a positive impact on CONCOR. Depending upon the outcome of the testing recommended in previous chapters and the availability of funding, supporting agencies may desire to look again at an assembler (ALC) version of the language.

Overall, COBOL CONCOR remains an intermediate product yet to realize its potential, which is thought considerable. Software development of this nature is characteristically of a long-term, evolutionary design. In light of this reality, agencies should be prepared to evaluate such projects' success or failure in terms other than that of immediate utility. It is the philosophy underlying the system, not the system itself, which will ultimately be exported.

APPENDIX A

BUCEN IN-HOUSE REVIEW ENHANCEMENT PROPOSAL

1. Easy to use interrecord referencing

2. Improved output file capabilities

   A. provide overflow protection on WRITE command

   B. provide OUTPUT command to create corrected questionnaire file that matches IP file data dictionary

3. Improved edit statistics reported (LISTERR)

   A. provide automatic (user-specified) area break

   B. provide options for compilation and displaying edit statistics at various levels

   C. provide automatic (user-specified) tolerance checking of error rates by area

   D. automatically capture IDs of areas failing tolerance check

4. Clean up known bugs in code

5. Comprehensive testing

6. Clean up and enhance documentation

   A. reference manual: more examples, error message guide

   B. installation guide

   C. systems manual
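
Items 3C and 3D of the proposal (tolerance checking of error rates by area, and capturing the IDs of areas that fail) can be illustrated with a short sketch. This is not CONCOR syntax and not the Bureau's implementation. It is a minimal Python illustration that assumes the error rate of an area is the number of records in error divided by the number of records edited; the function name and the data layout are hypothetical.

```python
def areas_failing_tolerance(area_counts, tolerance):
    """Return the IDs of areas whose error rate exceeds the tolerance.

    area_counts maps an area ID to (records_edited, records_in_error).
    Both the mapping and the rate definition are assumptions made for
    this illustration only.
    """
    failing = []
    for area_id, (edited, in_error) in sorted(area_counts.items()):
        if edited == 0:
            continue  # nothing was edited, so there is no rate to check
        rate = in_error / edited
        if rate > tolerance:
            failing.append(area_id)
    return failing


if __name__ == "__main__":
    counts = {"001": (200, 4), "002": (150, 30), "003": (0, 0)}
    # Area 002 has a 20 percent error rate and is the only one above 5 percent.
    print(areas_failing_tolerance(counts, tolerance=0.05))
```

Keeping the failing IDs as a plain list mirrors item 3D: the list could be printed in a report or used to route the rejected areas for re-processing.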

APPENDIX B

EVALUATIVE CRITERIA

UNITED STATES GOVERNMENT

Memorandum

TO: DSPOPFPSD, Robert H Haladay

DATE: December 3, 1979

FROM: DSPOPDEIO, Liliane Floge

SUBJECT: Criteria for Evaluation of BuCen COBOL CONCOR, September 30th Version, as presented at January 1980 Workshop

The following points should be considered when evaluating CONCOR, the COBOL editing system, during participation at the January 1980 workshop:

1. Are the Installation Manual, the System Manual, and the Users Manual ready for use? Are these manuals adequate for their purposes? Are the manuals clear and consistent? Can they be understood by subject-matter personnel as well as programmers?

2. Is the editing system capable of implementing all the features as described in the documentation?

3. Does the system appear to be [...]ely debugged and tested and ready for release to developing countries?

4. Does the system appear to be fast enough to edit a census in a reasonable amount of time?

5. What size core does the system require?

6. Comment in general on the completeness, usefulness, efficiency, and ease of use of the system by developing country programmers and other census personnel.

cc Su-a-- e Olds APiA SDi OrWJulio Ortuizar luotolaCelta Systems

APPENDIX C

WORKSHOP ITINERARY

CONCOR Workshop Schedule, January 7-18, 1980

U.S. Bureau of the Census, International Statistical Programs Center

Scuderi Building, Room 209, 4235 28th Avenue, Marlow Heights, Maryland

Monday, January 7

9:30 am - 10:00     Welcoming Remarks
                    Overview of Workshop

10:00 - 10:30       Introduction to CONCOR
                    - Purpose and function
                    - History of development
                    - General computer requirements

10:30 - 10:45       Break

10:45 - 12:00       Editing Concepts
                    - Ways to interrogate data
                    - Ways to correct data
                    - Editing housing and population data
                    - POPSTAN
                    - Advantages of CONCOR

12:00 - 1:15 pm     Break

1:15 - 2:00         System Description
                    - Constraints in design of CONCOR
                    - Basic subsystems of CONCOR
                    - User interactions with system
                    - Examples of outputs produced

2:00 - 2:30         User Program Organization
                    - Divisions
                    - Sections
                    - Routines
                    - Commands

2:30 - 2:45         Break

2:45 - 3:25         Command Language Description
                    - Types of statements
                    - Format
                    - Syntax

Tuesday, January 8

Dictionary Division Command Statements

9:30 am - 10:30     - Punctuation
                    - Input data referencing
                    - Working data declarations
                    - Internal data representation and storage

10:30 - 10:45       Break

10:45 - 12:00       Dictionary-Attributes-Section
                    - Dictionary-Name
                    File-Section
                    - Input-File
                    - Output-File
                    - Write-File
                    - Error-File

12:00 - 1:15 pm     Break

1:15 pm - 2:15      Input-Record-Section
                    - Define-Record
                    Working-Data-Section
                    - New-Data
                    - Array-Data

2:15 - 2:30         Break

2:30 - 3:25         Dictionary Examples
                    - Minimum dictionary structure
                    - Maximum dictionary structure
                    - Hand out dictionary problem

Wednesday, January 9

9:30 am - 10:30     Discussion of Participants' Dictionary problems
                    Free work time

10:30 - 10:45       Break

10:45 - 12:00       Execution Division Command Statements
                    - Punctuation
                    - Subscripting
                    - Internal Identifiers
                    - Report-Control-Section
                      - Generate-Edit-Statistics
                      - Count-Imputes
                      - Examples

12:00 - 1:15 pm     Break

1:15 pm - 2:15      Execution Division Command Statements (continued)
                    - Routines of Edit-Specifications-Section
                      - Prolog-Routine
                      - Filter-Routine
                      - Epilog-Routine
                    - Types and functions of edit specification commands
                    - Range
                    - Assert

2:15 - 2:30         Break

2:30 - 3:25         - Pass/Fail clauses
                    - List

Thursday, January 10

9:30 am - 10:30     Discussion of Problems
                    Free work time

10:30 - 10:45       Break

10:45 - 12:00       Edit Specification Command Statements (continued)
                    - Allocate
                    - Update
                    - Let

12:00 - 1:15 pm     Break

1:15 pm - 2:15      - If
                    - Until/Exit
                    - Stop

2:15 - 2:30         Break

2:30 - 3:25         - Drecode
                    - Grecode

Friday, January 11

9:30 am - 10:30     Edit Specification Command Statements (continued)
                    - Output
                    - Write

10:30 - 10:45       Break

10:45 - 12:00       Report Division Command Statements
                    - Display-Control-Section
                      - Display-Edit-Statistics
                    - Tolerance-Control-Section
                      - Error-Rate-Check
                      - Reject-File
                    - Report Examples

12:00 - 1:15 pm     Break

1:15 pm - 3:25      Hand out Edit Specifications as problem
                    Free work time

Monday, January 14

9:30 am - 10:30     Discuss procedures for running problems on computer

10:30 - 10:45       Break

10:45 - 12:00       Component Programs of the CONCOR system

12:00 - 1:15 pm     Break

Tuesday, January 15

9:30 am - 3:25 pm   Free work time

Wednesday, January 16

9:30 am - 12:00     Free work time

12:00 - 1:15 pm     Break

1:15 pm - 2:15      How to Install CONCOR on IBM 360/370 OS

2:15 - 2:30         Break

2:30 - 3:25         Free work time

Thursday, January 17

9:30 am - 3:25      Free work time

1:15 pm - 2:15      Future CONCOR Design Considerations
                    - for census processing
                    - for survey processing
                    - manual correction system

2:15 - 2:30         Break

2:30 - 2:45         Evaluation Guidelines
                    - Hand out evaluation forms

2:45 - 3:25         Free work time

Friday, January 18

9:30 am - 10:30     Free work time

10:30 - 10:45       Break

10:45 - 12:00       Group Discussion on CONCOR
                    - submission of written evaluations
                    Distribution of IBM installation tapes
                    Presentation of Certificates to Participants

12:00 - 1:15 pm     Break

1:15 - 3:25         Free work time

APPENDIX D

PARTICIPANTS

CONCOR WORKSHOP, January 7-18, 1980, Marlow Heights, MD

PARTICIPANTS

KENYA

James Midianga, Government Computer Centre, Central Bureau of Statistics, PO Box 30266, Nairobi, Kenya

PANAMA

Omar Rivera Rios, Data Processing Specialist, Contraloria General de la Republica de Panama, Apartado Postal 5213, Panama 5, Panama

PHILIPPINES

Guida T Capellan, OIC, Computer Programming Division, National Census and Statistics Office, Solicarel Bldg 1 Magsaysay Blvd, Sta Mesa, Manila, Philippines

THAILAND

Angsumal Sunalai, National Statistical Office, Bangkok 1, Thailand

EGYPT

Farag Sedky Mourad Ghaleb, CAPMAS (Central Agency for Public Mobilization and Statistics), Nasr City, Cairo, Egypt

SAUDI ARABIA

Shargi R Al-Shargi, National Computer Center, PO Box 2534, Riyadh, Saudi Arabia

Ahmed S Al-Ghamdi, Dept Director of National Computer Center, PO Box 2534, Riyadh, Saudi Arabia

OTHER

Robert W O'Neal, USREPJECOR, APO New York, NY 09038

John N Adams, Data Processing Advisor, DACCA, Department of State, Washington, DC 20520, or c/o UNDP, PO Box 224, Dacca, Bangladesh

Joe Quasney, US Census Bureau, Washington, DC 20233

Howard Brunsman, 5715 N Ninth St, Arlington, VA 22205

Larry Shiller and Julio Ortuzar, Delta Systems Consultants Inc, 264 Alhambra Circle, Coral Gables, FL 33134

Nicholas Ourusoff, Burpee Hill Road, New London, NH 03257 (United Nations)

Frederic J Grant IV, Systems Development, Georgia World Congress Institute, 580 North Omni International, Atlanta, Georgia 30303

Rafael Samper, Mary Jo Keenan, and Catherine B Gleason, LACDR, Room 2246 NS, Washington, DC 20523

John Marshall, AIDSERDMPSE, Rm 706C SA-12, Washington, DC 20523

Susanne Bacon and Mary K Friday, ISPC/Data Processing, US Census Bureau, Washington, DC 20233

Leo Dougherty, ISPC/World Census Staff, US Bureau of the Census, Washington, DC 20233

CONCOR WORKSHOP

January 7-18, 1980, Washington, DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM

CONCOR WORKSHOP January 16 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1. What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries?

2. If you are familiar with previous versions of CONCOR, how does this current version compare to them?

3. How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data with which you are familiar?

4. How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software?

5. How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys?

6. If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?

7. In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8. Other comments

APPENDIX F

NEW/OLD COMMAND COMPARISONS

OLD CONCOR VERSION 1, DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA (contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA

NEW CONCOR VERSION 2, DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

    DICTIONARY-NAME

FILE-SECTION

    INPUT-FILE
    OUTPUT-FILE
    WRITE-FILE
    ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

    AREA-CONTROL
    QUESTIONNAIRE-CONTROL
    RECORD-CONTROL

INPUT-RECORD-SECTION

    DEFINE-RECORD

WORKING-DATA-SECTION

    NEW-DATA
    ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.
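
The two card conventions described in these comments (a leading period marks a comment card, and a command may begin anywhere in columns 1-72) can be sketched as a small classifier. This is an illustrative Python sketch only, not part of CONCOR. It assumes that the period is the first non-blank character of a comment card and that anything beyond column 72 is ignored; the function name and the sample cards are hypothetical.

```python
def classify_card(card):
    """Classify one card image as 'blank', 'comment', or 'command'.

    Assumptions for this sketch: only columns 1-72 carry the statement,
    and a comment card is one whose first non-blank character is a period.
    """
    text = card[:72].strip()
    if not text:
        return "blank"
    if text.startswith("."):
        return "comment"
    return "command"


if __name__ == "__main__":
    # Hypothetical card images, written only to exercise the classifier.
    cards = [
        ".DICTIONARY-DIVISION",
        "      INPUT-FILE",
        "END-DIVISION",
        "",
    ]
    for card in cards:
        print(repr(card), "->", classify_card(card))
```

Under these assumptions a division name written with a leading period is skipped as a comment, which is consistent with the remark that the division identifier has not been implemented while END-DIVISION must still appear as a command.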

OLD CONCOR VERSION 1, DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL)
ASSERT (AST)
END
ERROR
FILTER TYPE ( ) WITH
IF
LET
RANGE (RNG)
RECODE (REC)
STOP --keyword
UPDATE (UPD)
WRITE (WRT)
XRECODE (XREC)

NEW CONCOR VERSION 2, DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

    COUNT-IMPUTES
    GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

    PROLOG-ROUTINE

    FILTER-ROUTINE (name-of-record-type)

    EPILOG-ROUTINE

ALLOCATE (ALLOC)
ASSERT (ASRT)
DRECODE (DRCD)
END-DIVISION
EXIT
GRECODE (GRCD)
IF
LET
LIST
OUTPUT
RANGE (RNG)
STOP
UNTIL
UPDATE (UPD)
WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR.

END-DIVISION signifies the termination of the CONCOR division organization and must always be present, even though division identifiers have not been implemented.

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations, and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

(Version 1: December 1978. Version 2: December 1979.)

Version 1: ALLOCATE (ALL)
Version 2: ALLOCATE (ALLOC)
Comments: Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

Version 1: ASSERT (AST)
Version 2: ASSERT (ASRT)
Comments: The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and a MESSAGE option. Additionally, the DUMPR and DUMPO keywords give the user the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

Version 1: (see RECODE)
Version 2: DRECODE (DRCD)
Comments: Provides a method for recoding data items whose values start at zero and continue in ascending sequence. Replaces the previous RECODE command.

Version 1: (none)
Version 2: EXIT
Comments: Provides the means to terminate execution of (escape from) command statements within the UNTIL command.

Version 1: END
Version 2: END-IF, END-THEN, END-ELSE
Comments: A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

Version 1: ERROR
Version 2: not implemented
Comments: No longer supported in the language.

Version 1: FILTER TYPE ( ) WITH
Version 2: not implemented
Comments: Option statements are no longer supported in the language. This command once specified essentially a GOTO, depending on construction.

Version 1: (see XRECODE)
Version 2: GRECODE (GRCD)
Comments: This new command provides a method for the group recoding of input values.

Version 1: IF
Version 2: IF
Comments: The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. Similar to pseudocode, a CONCOR programmer must utilize the structured IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version.

Version 1: LET
Version 2: LET
Comments: Virtually unchanged.

Version 1: (none)
Version 2: LIST
Comments: New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire, as well as specific individual identifiers.

Version 1: (none)
Version 2: OUTPUT
Comments: New command which provides the means by which records will be written to the output file.

Version 1: RANGE (RNG)
Version 2: RANGE (RNG)
Comments: As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

Version 1: RECODE (REC)
Version 2: name changed
Comments: Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

Version 1: STOP keyword
Version 2: STOP keyword
Comments: Unchanged.

Version 1: (none)
Version 2: UNTIL DO END-DO
Comments: One of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to that of most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

Version 1: UPDATE (UPD)
Version 2: UPDATE (UPD)
Comments: Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

Version 1: WRITE (WRT)
Version 2: WRITE (WRT)
Comments: The WRITE language statement, while once utilized as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of DICTIONARY-DIVISION.)

Version 1: XRECODE (XREC)
Version 2: name changed
Comments: Old command which has become the GRECODE command in the December 1979 version.
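
The difference between the two recode commands can be made concrete with a small sketch. The report does not show the CONCOR syntax of DRECODE or GRECODE, so the following is a Python illustration of the two ideas as the comments describe them: a direct recode of values that start at zero and ascend, and a group recode that maps ranges of input values to a code. All names and sample tables are hypothetical.

```python
def direct_recode(value, table):
    """Recode a value 0, 1, 2, ... by using it as an index into table.

    Returns None when the value is not covered by the table.
    """
    if 0 <= value < len(table):
        return table[value]
    return None


def group_recode(value, groups):
    """Recode a value by the first (low, high, code) group that contains it.

    Returns None when no group contains the value.
    """
    for low, high, code in groups:
        if low <= value <= high:
            return code
    return None


if __name__ == "__main__":
    # Hypothetical tables, written only for this illustration.
    status_table = [9, 1, 2, 2, 3]                 # new code for inputs 0..4
    age_groups = [(0, 14, 1), (15, 64, 2), (65, 99, 3)]

    print(direct_recode(2, status_table))   # input 2 is replaced by table entry 2
    print(group_recode(37, age_groups))     # age 37 falls in the 15-64 group
```

The direct form needs no searching because the input value is itself the position in the table, which is presumably why the command is restricted to values beginning at zero in ascending sequence.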

DECEMBER 1978, VERSION 1

Not implemented

DECEMBER 1979, VERSION 2

REPORT-DIVISION

DISPLAY-CONTROL-SECTION

    DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

    ERROR-RATE-CHECK
    REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging uncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
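
The presorting requirement can be checked before a run. The sketch below is not part of CONCOR; it is a Python illustration of why uncontiguous records matter when reports are organized by control break. It assumes each record carries its area code in a fixed leading field, and the helper names are hypothetical.

```python
from itertools import groupby


def area_breaks(records, area_of):
    """Return the area ID at each control break, in file order."""
    return [area for area, _ in groupby(records, key=area_of)]


def is_contiguous(records, area_of):
    """True when all records of each area sit together in the file."""
    breaks = area_breaks(records, area_of)
    return len(breaks) == len(set(breaks))


def first_two_columns(record):
    """Assumed layout for this sketch: the area code is the first two characters."""
    return record[:2]


if __name__ == "__main__":
    unsorted_file = ["01A", "02B", "01C"]
    # Area 01 appears in two separate runs, so it would be reported twice.
    print(area_breaks(unsorted_file, first_two_columns))
    print(is_contiguous(unsorted_file, first_two_columns))

    presorted = sorted(unsorted_file, key=first_two_columns)
    print(is_contiguous(presorted, first_two_columns))
```

A processor that only detects changes in the area field, as this sketch does, produces one report per run of records rather than one per area, which is the practical consequence of skipping the external sort.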

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST

Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics, such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.

(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18-digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.

(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation, or provide a means by which the user may specify a title to appear in the listings.
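
Item (13) proposes that valid ranges declared in the dictionary drive an automatic range check before the user's edit logic sees the data. A minimal Python sketch of that idea follows. It is not CONCOR code: the dictionary layout, the identifier names, and the decision to treat a missing item as out of range are all assumptions made for illustration.

```python
# Hypothetical dictionary: user-identifier -> (lowest valid, highest valid).
DICTIONARY = {
    "AGE": (0, 99),
    "SEX": (1, 2),
}


def out_of_range_items(record, dictionary):
    """Return the identifiers whose values fall outside their declared range.

    A missing item is treated as out of range in this sketch.
    """
    errors = []
    for name, (low, high) in dictionary.items():
        value = record.get(name)
        if value is None or not (low <= value <= high):
            errors.append(name)
    return errors


if __name__ == "__main__":
    record = {"AGE": 130, "SEX": 1}
    print(out_of_range_items(record, DICTIONARY))
```

Because the ranges live in one table rather than in comment statements, the same table could also be read by tabulation software, which is the secondary benefit the item mentions.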

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features, such as division by zero and out of bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is the question of what action is to be taken in the calculation of tolerance in terms of this new record's inclusion into the universe of observations and number of changes.

(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
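
Item (1) of this list refers to system protection features such as division-by-zero checking and to maximum values for the EDITOR error limits. The sketch below shows one way such protection could behave. It is a Python illustration only, not the generated COBOL: the class and exception names are hypothetical, integer division is used on the assumption that no floating point is available, and returning zero after a trapped division is an assumption rather than documented CONCOR behaviour.

```python
class RunAborted(Exception):
    """Raised when the number of trapped errors reaches the error limit."""


class ProtectedArithmetic:
    """Trap division by zero and abort the run after error_limit occurrences."""

    def __init__(self, error_limit):
        self.error_limit = error_limit
        self.errors = 0

    def divide(self, numerator, denominator, line_number):
        if denominator == 0:
            self.errors += 1
            print(f"DIVISION BY ZERO  LINE NUMBER {line_number}")
            if self.errors >= self.error_limit:
                raise RunAborted("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
            return 0  # assumption: a zero result lets the edit continue
        return numerator // denominator


if __name__ == "__main__":
    arithmetic = ProtectedArithmetic(error_limit=3)
    try:
        for denominator in (4, 0, 0, 0, 5):
            arithmetic.divide(100, denominator, line_number=118)
    except RunAborted as abort:
        print(abort)
```

Removing the check, as the Run Control Section would permit, saves one comparison per division at the cost of an uncontrolled failure on the first zero denominator.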

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.

1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program, to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available the information needed most frequently. Also, to examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean to imply that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual

(2) The development of a CONCOR Users Guide

2 Survey or other Census Processing

21 Software

(1) Make optional the specification of the record type and questionnaire identification information as some files are not

hierarchical in nature

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected This new input file would contain the master list of those oberservations selected to be

examined

(3) Allow floating point calculations

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required: one as CONCOR is now implemented, and a second that would accept flat data files where the subscript indicates a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has not been finished or tested.
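The check-in procedure proposed in item (2) can be pictured with a short sketch. It is written in Python rather than in CONCOR or COBOL, and the function name, the identifier format, and the three result categories are illustrative assumptions; the report does not specify how ISPC would implement the feature.

```python
def check_in(master_ids, received_ids):
    """Compare questionnaire IDs found on the input file with the master sample list.

    Both arguments are iterables of questionnaire identifiers (hypothetical format).
    """
    master = set(master_ids)
    seen = set()
    duplicates = set()
    for qid in received_ids:
        if qid in seen:
            duplicates.add(qid)  # same case keyed or submitted twice
        seen.add(qid)
    return {
        "missing": sorted(master - seen),     # selected but never received
        "unexpected": sorted(seen - master),  # received but not in the sample
        "duplicates": sorted(duplicates),
    }


report = check_in(["A001", "A002", "A003"], ["A001", "A003", "A003", "B999"])
print(report)
```

A listing of this kind would let survey staff chase missing cases before editing begins, which is the purpose the wish-list item describes.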


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES


Old version (December 1978)

1 EOF-FLAG
2 ERROR-IN-QUESTIONNAIRE
3 CURRENT-POINTER-VALUE
4 TYPE-COUNT ( )
5 RECORD-COUNT
6 QUESTIONNAIRE-COUNT
7 RECORDS-IN-STORE
8 INVALID-RECORD-TYPE-COUNT
9 INCOMPLETE-FLAG
10 CONTINUATION-FLAG
11 INVALID-RECORD-FLAG

New version (December 1979)

EOF-FLAG
ERRORS-IN-QUESTIONNAIRE
TYPE-COUNT ( )
IN-RECORD-COUNT
IN-QUESTIONNAIRE-COUNT
RECORDS-IN-STORE
INVALID-RECORD-TYPE-COUNT
INCOMPLETE-FLAG
CONTINUATION-FLAG
OUT-OF-RANGE
NOT-NUMBER
BLANK
(new internal counters) OUT-RECORD-COUNT, OUTPUT-QUEST-COUNT, WRITE-RECORD-COUNT

Annotations

Change in spelling of variable

Not mentioned -- possible omission from manual

Not implemented

Note: Current documentation equivocates on when some variables are reset -- most are initialized with each control area break.

Though implemented as reserved identifiers, due to poor documentation of (old) CONCOR their exact values and utility were unknown.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN-, OUT-).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Printout: CONCOR-EDITOR EXECUTION STATISTICS page, largely illegible in this reproduction. Legible diagnostics: DIVISION BY ZERO, LINE NUMBER 118 (repeated), followed by RUN ABORTED DUE TO DIVISION BY ZERO ERRORS. Superimposed program text, lines 111-125: R001 and R003 are set to zero on line 115, and the LET statement on line 118 divides R002 by R001.]

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason -- ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.

[Printout: CONCOR-EDITOR EXECUTION STATISTICS page for the array coordinate test; the diagnostics are illegible in this reproduction. Superimposed program text, lines 175-187: two nested loops vary R001 and R002 from 1 to 10; line 180 is an ALLOCATE from TA1 (R001 R002), line 181 is an UPDATE of TA2 (R001 R002), and lines 182-185 LIST the resulting values.]

Comments: In this example R001, a 2 dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 X 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.

[Printout: CONCOR-EDITOR EXECUTION STATISTICS page for the user-termination test, largely illegible in this reproduction. Legible messages: JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215; EDITOR -- NORMAL END OF JOB. Superimposed program text, lines 214-218: an IF test on IN-RECORD-COUNT on line 214, followed on line 215 by THEN STOP-RUN END-THEN.]

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string


PREFACE

Since June 1979, a major design of the COBOL CONCOR edit and imputation system has been undertaken by the International Statistical Programs Center (ISPC) of the U.S. Department of Commerce, Bureau of the Census. A one-day program held October 10, 1979, previewed enhancements which were planned to be implemented to the system. Based upon the information furnished at that workshop, I undertook an interim review of the state of completion of the COBOL CONCOR package. The result of that review was a working document entitled "Report on the Developing COBOL CONCOR Edit and Imputation System." At the time of that writing, the system was not in a sufficient degree of completion to definitively gauge its adequacy for exportation to developing countries. This current publication, "COBOL CONCOR 1980: Its Adequacy and State of Completion," while substantial in its own regard, can best be understood in light of that previous report.

On January 7-19, 1980, I attended a workshop designed to provide participants with an in-depth explanation of the full range of capabilities the new COBOL CONCOR supports. During this time I was able to learn the new CONCOR language and conduct tests bearing on the adequacy and completeness of the system. Results from these test programs comprise parts of many of the Appendices.

On January 18, 1980, I was debriefed at the Office of Population, Agency for International Development, Rosslyn, Virginia, over the specific areas which compose the body of this report. In this instance, any comments of a critical nature about CONCOR must be preceded by a statement attesting to the competence and dedication of the ISPC staff, who have done an extraordinary job in redesigning and rewriting many of the programs comprising the system since October 10, 1979.

Though my experience with systems analysis and design utilizing the COBOL programming language encompasses three years, local circumstances and specialization are important considerations. The discussions of this report are based on my overall experience in the data processing field and how I think it applies to the development of CONCOR. At the time of the writing of this report, I am the Senior Systems Analyst and the Director of Data Base Administration for the Georgia World Congress Institute, a state-operated nonprofit research organization located in Atlanta, Georgia.


CONTENTS

PREFACE

EXECUTIVE SUMMARY

I BACKGROUND

II THE ADEQUACY OF CONCOR

III PROPOSED CHANGES TO THE LANGUAGE STRUCTURE

IV THE ADEQUACY OF COBOL CONCOR DOCUMENTATION

V CONCLUSION

APPENDICES

Appendix A - Bucen Enforcement Proposal
Appendix B - Evaluative Criteria
Appendix C - Workshop Itinerary
Appendix D - Participants
Appendix E - CONCOR Evaluation Form
Appendix F - New/Old Command Comparisons
Appendix G - ISPC Future Enhancement List
Appendix H - CONCOR System Internal Variables
Appendix I - CONCOR-EDITOR Execution Statistics
Appendix J - Diagnostic Message Guide Example

EXECUTIVE SUMMARY

The December 1979 COBOL CONCOR (Version 2) is a much improved software package. All commands appear to be functional; however, the system should be exhaustively tested by an independent agency prior to its general release. This agency should also precisely determine the system's relative speed and core processor requirements. While the system (exclusive of documentation) could immediately be utilized in a situation of extreme need, some CONCOR language coding inconsistencies detract from the learnability and exportability of the package and should be corrected. Additionally, there are other modifications or adjustments which would enhance the overall utility and productivity of the language in census and survey applications.

System documentation continues to be a problem. There is no User's Guide. The Systems Manual, though well-constituted informationally, should be thoroughly reorganized in accordance with the guidelines set forth in this report.

The staff of ISPC exhibited competence and professionalism in the conduct of the two-week workshop, January 7-18, 1980. ISPC generally is aware of both the potential and the shortcomings of the CONCOR project. The current CONCOR version represents a significant redesign of the overall system. As a package, it is in a state where its completion is within reach.

I BACKGROUND

CONCOR (an acronym of Consistency and Correction) is best characterized as a software tool designed to expedite the processing of data files during the edit and imputation phase of population censuses and surveys. As a metacompiler written in the COBOL language, the system reads and verifies CONCOR language statements to produce an executable EDITOR program. The objective of this process is the creation of an error-free file which can be used at a later time for tabulation purposes.

Since its release as Version 1 on December 30, 1978, numerous elements of the COBOL CONCOR system have undergone continual change and redefinition. In fact, the system has not been permitted to stand still for any period of time, nor has it been exhaustively tested. In June of 1979, ISPC suspended the further distribution of the COBOL CONCOR system. This decision was based principally upon reports of the package's unsatisfactory performance at workshops held in Panama and Thailand. ISPC, upon their own initiative, developed a proposal to overhaul CONCOR and its accompanying documentation. This proposal, contained in Appendix A, represents an ambitious undertaking. While not all of the desired changes and capabilities could be implemented, Version 2 of December 1979 represents a significant managerial effort. The questions are now whether COBOL CONCOR Version 2 will be a demonstrably adequate software package -- a package capable of exportation to developing countries -- a package requiring no further modification. The purpose of this report is to address these critical issues. In connection with this, Appendix B sets forth the specific criteria around which such a discussion must evolve. As this is not intended to be a compendium, some of these broader issues will be treated immediately; following chapters and appendices will qualify the exact nature of system alterations already undertaken, as well as further adjustments believed to be essential in realizing the goals of the system's philosophy.

Workshop

During the period of January 7-18, 1980, a workshop was held under the sponsorship of the ISPC to demonstrate the capabilities of the latest revision of the COBOL CONCOR software package. A schedule of events of this workshop is contained in Appendix C. This workshop was intended to provide participants with the opportunity to program in the CONCOR language and to thereby test aspects of the system as individually appropriate. A listing of the participants and the international organizations they represented is contained in Appendix D. During the concluding days of the workshop, each participant was asked by ISPC to provide a written evaluation of the now-called December 1979 version of CONCOR. This evaluation form, Appendix E, also includes space for comments concerning the competence of the system documentation as well as any additional comments, including those regarding the organization and clarity of workshop presentations. It is assumed that in the near future summaries of these comments will be available to interested agencies.

EDITOR is the new name of the EXECUTOR module of previous language versions.

A complete history of the development of COBOL CONCOR can be found in both the December 1978 Version 1 and December 1979 Version 2 systems manuals, as well as in previous consulting reports.

While virtually all instructional aspects of this two-week workshop were conducted in a highly professional manner -- a manner which revealed a high degree of coordination among staff members in their efforts -- there are several areas which future workshops may improve upon:

1. All publications should be assembled in their entirety and proofread prior to distribution.

2. A complete CONCOR language program example and accompanying I/O documents should be provided at the onset of the workshop for reference.

3. Numerous short application programming problems involving all CONCOR language divisions should be utilized in place of a single lengthy problem.

It is noted that this workshop was not intended to teach the CONCOR language; had it been, the organization and presentation of materials probably would have been different. It is believed that the two-week time period was sufficient to provide participants a familiarity with the use of the new CONCOR features, especially in light of the fact that workshop participants were permitted to work weekends and beyond normal working hours at their discretion. Though funding was not generally available, it is known that several workshop members chose to extend their stay in Washington to continue testing the CONCOR package or to work on projects which they could attempt to immediately install on their home computers. At the conclusion of the workshop, participants were permitted to take with them an installation tape of CONCOR as well as all the other materials they had acquired during the course of the project.


II THE ADEQUACY OF CONCOR

CONCOR has been described by its designers as an adequate package. Adequacy as an evaluative criterion is often relative to need and should not be confused with readiness as an issue. The CONCOR system, exclusive of documentation, is sufficiently complete that in a situation of extreme need it could be used as a data-cleaning tool in the editing and imputation phase of census processing. Less extreme circumstances would impose reticence on such an endorsement. Though non-exhaustive tests indicate that CONCOR appears to be capable of performing all of the commands as implemented, because of the rapidness with which the system was rewritten it is thought that there has not been enough time to fully test all aspects of the project. Therefore, prior to its general dissemination, it is recommended that an independent agency conduct exhaustive tests to certify the integrity of the system programs. The importance of this certification cannot be overstated in light of previous workshop experiences. Concurrent with this testing process, the same agency should determine the relative speed and size of the system under actual production circumstances and further determine CONCOR's ease of installation. Later sections of this discussion set forth additional testing recommendations.

It is generally recognized that of all the data-cleaning tools available for exportation, CONCOR is potentially the most powerful, especially with the addition of its new commands as outlined in Appendix F. While its utility is not in doubt, one must ask how much more useful CONCOR could be if modified, and whether this additional utility would be worth the costs involved. The modifications (excluding documentation) to COBOL CONCOR appropriate for consideration at this time are threefold:

1. Adjustments to the elements of the system which are internally inconsistent or awkward, to facilitate its learnability and usability among developing country programmers.

a. Implementation of the division headings DICTIONARY-DIVISION, EXECUTION-DIVISION, and REPORT-DIVISION. De-emphasis of section headings.

b. Improvement of the consistency among data identifiers: allow alphanumeric variables to be coded without mandatory comparison strings throughout the DATA-DIVISION and to be of the same length as numeric variables. Permit numeric identifiers to be of an equal length to NEW-DATA identifiers. Permit the coding of single-dimension row and column vectors in the same manner as multi-dimensional arrays.

2. Implementation of selective commands and internal variables to facilitate the production environment use of CONCOR in census applications. These include:

a. LOAD/UNLOAD arrays: commands which would automatically save and replace hot-decked values from batch to batch.

b. TOTAL-QUESTIONNAIRE-COUNT/-RECORD-COUNT internal variables, independent of AREA-CONTROL.

3. Other modifications.

a. Default values for the max-storage parameter set in a realistic range.

b. Allowance of more variables for survey applications.

Some of these modifications are part of what ISPC calls its "wish list" for the future development of CONCOR. This document has been included in this report as Appendix G. It is arguable that these features are essential to the completion of the CONCOR package. While it is beyond the scope of this report to draw a conclusion in this area, the enhancements as outlined above are ones that would make the language more internally consistent and thereby easier to learn and apply to a census data production environment. These modifications are not arbitrary or cosmetic but are a direct result of hands-on programming experience in the language as well as observations and discussions with other workshop participants. While it is probably impossible to ever be satisfied with the overall structure of any programming language, the resolution of this issue of completeness must be made relative to the objectives for developing the COBOL CONCOR system in the first place. An explicit statement of these objectives has been absent in all systems documentation to date.


III PROPOSED CHANGES TO THE LANGUAGE STRUCTURE

Based upon the assumption that it is the intent of sponsoring agencies to optimize the COBOL CONCOR package -- a goal which is believed currently obtainable -- an understanding of the nature of these changes and how they would impact users is essential. Appendix F sets forth in a comparative manner differences between the old December 1978 and the new December 1979 editions of CONCOR. Studying this appendix makes obvious the fact that while the new version of the language is clearly superior to the old in nearly every aspect, the basic and overall structure of the language is essentially unchanged. Compartmentalization of aspects of the language into divisions represents a significant ideological enhancement to the language. Indeed, development of programs by divisions proved to be an extremely useful way of understanding the nature of editing work to be performed. However, note that while the END-DIVISION command is essential to the language, the division headings DICTIONARY-DIVISION, EXECUTION-DIVISION, and REPORT-DIVISION were not implemented and are therefore preceded by a period to be treated as comment lines in the program listing. It is inconsistent to implement END-DIVISION commands while not implementing the division headings. It is believed that this division structuring is important enough to the overall organizational structure of a CONCOR source language program that it should be implemented prior to general distribution.

The section headers shown on the figures in Appendix F, however, are another matter. They are cumbersome and were generally not coded by workshop participants, and they could be deleted from this version of the language altogether with little loss in organizational understanding. The CONCOR language is sufficiently powerful to stand on its own as a distinct product and is not meant to be a COBOL imitation; its present degree of development and specialization do not warrant the structural drag of additional section identifiers. The probable intent of the original CONCOR project was to develop a package which was uncomplicated and not unwieldy to use. The question of division and section name implementation, while seemingly cosmetic, can have real impact on the language's perceived ease of learning and use.

Figure 1 on the following page illustrates common mistakes programmers make coding numeric and alphanumeric variables in the DATA-DIVISION. These mistakes are the result of the inconsistent variable formats. For instance, in the numeric data definition statement it is permissible to specify N9-23, where N signifies numeric, 9 signifies the length of the item, and 23 specifies the starting position in the record. In NEW-DATA, however, it is possible to code an item with a maximum length of 18. While on the surface this inconsistency would seem harmless, some user variables defined in NEW-DATA as N18 could be moved inadvertently to output record fields defined by a data definition statement N9-23; such an action would result in a data error. Under certain circumstances it would be highly desirable to output these larger length values. A similar circumstance exists between the numeric and alphanumeric data coding conventions. While the maximum length of the numeric is permitted to be 9 in the data definition statement (18 in NEW-DATA), the maximum alphanumeric variable is permitted to be only 4 characters in length. In the current systems manual it is recommended that

FIGURE 1

DICTIONARY-DIVISION

DICTIONARY-NAME DATA-CODING-EXAMPLE

INPUT-FILE

OUTPUT-FILE

AREA-CONTROL N2-2 N2-4 N3-6 N2-9 N9-23

QUESTIONNAIRE-CONTROL A4-2 A3-6 A2-9 A3-11 A3-14

RECORD-CONTROL A1-1

DEFINE-RECORD

H01-TYPE-OF-HOUSING-UNIT N1-17

H02-MATERIAL-OF-ROOF N1-19 10 9

H03-TOTAL-PERSONS-IN-UNIT N8-40 NOT-NUMERIC BLANK

H04-STATE-OF-UIIII-CODE A4-50 0 U 1 D

DEFINE-RECORD

P01-SEX 1-13 W F

NEW-DATA

N01-SAVE-TYPE-OF-HOUSING-UNIT

N02-SAVE-TYPE-OF-ROOF 1

N03-COUNT-TOTAL-IN-UNITS 10 0

N04-AGGREGATE-INCOME 18 0

END-DIVISION

Explanations

N2-4: This is an example of an external numeric input data item (N) with a length of 2 bytes starting in column 4 of the input record. The maximum length of this type of variable outside of NEW-DATA is 9. When coded in NEW-DATA, 18 is permitted.

A4-2: This is an example of an external alphanumeric input data item (A) with a length of 4 bytes starting in column 2 of the input record. This construction for an alphanumeric variable is valid only in the control statements. Additionally, it can never be over 4 bytes in length. When alphanumeric data fields are defined within record types, the EDITOR program requires that the comparison strings always be specified. A maximum of 3 is permitted. The purpose of these strings is to force a recode of the data to a numeric value. If no match is found, EDITOR automatically assigns a unique negative value to the field.

alphanumeric coding be utilized in the QUESTIONNAIRE-CONTROL and RECORD-CONTROL statements, where each input data item must be of the same data type, as shown in the example. When alphanumeric data variables are used in these control statements, their construction is identical to that of numeric items. However, when used elsewhere in the DATA-DIVISION, alphanumeric variables are required to specify one of three possible comparison values, as shown. There are a number of production instances when it never would be necessary or even desirable to recode alphanumeric data. However, as CONCOR attempts to force data into a totally numeric format upon output, there is currently no way to preserve these values if desired.
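The length rules described above lend themselves to a small illustration. The following Python sketch parses field specifications of the form N9-23 and checks them against the limits quoted in this report (9 for numeric and 4 for alphanumeric items in data definition statements). The function names and error messages are assumptions made for illustration; this is not how the CONCOR parser itself is written.

```python
import re

# Maximum lengths quoted in the report for data definition statements.
MAX_LENGTH = {"N": 9, "A": 4}

# A spec such as "N9-23": type letter, length, starting column.
SPEC_PATTERN = re.compile(r"^([NA])(\d+)-(\d+)$")


def parse_spec(spec):
    """Split a field spec into (type, length, start column) and check its length."""
    match = SPEC_PATTERN.match(spec)
    if match is None:
        raise ValueError(f"malformed field spec: {spec!r}")
    kind = match.group(1)
    length = int(match.group(2))
    start = int(match.group(3))
    if not 1 <= length <= MAX_LENGTH[kind]:
        raise ValueError(f"{spec!r}: length must be 1 to {MAX_LENGTH[kind]}")
    return kind, length, start


def fits(new_data_length, target_spec):
    """True if a NEW-DATA numeric of the given length fits the output field."""
    kind, length, _start = parse_spec(target_spec)
    return kind == "N" and new_data_length <= length


print(parse_spec("N9-23"))  # ('N', 9, 23)
print(fits(18, "N9-23"))    # False: the N18-to-N9 move the report warns about
```

A check of this kind at dictionary-parsing time would flag the inadvertent move of an 18-digit NEW-DATA item into a 9-digit output field before any data were processed.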

An unwieldy alternative to this situation, which may be acceptable under some circumstances, would be the expansion of the number of comparison strings from three to a more realistic number. The limitation of this compromise is that a full twenty-six comparison identifiers would be required in order to accommodate data which utilized the entire alphabet. A better solution, however, would be to make the general format of the alphanumeric variables identical to that of numeric identifiers, i.e., A9-23, and to permit alphanumeric values so defined to pass unaltered through the CONCOR system.
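The information loss caused by forced recoding can be shown with a minimal Python sketch. The report states only that matched values are recoded to a numeric value and that unmatched values receive a negative value; the specific codes used here (1, 2, 3 by position, and -1 for no match) are assumptions for illustration.

```python
MAX_COMPARISON_STRINGS = 3  # limit stated in the report


def recode(value, comparison_strings, no_match=-1):
    """Recode an alphanumeric value to a number, as EDITOR is described as doing.

    The codes (position in the list, starting at 1) and the -1 sentinel are assumed.
    """
    if len(comparison_strings) > MAX_COMPARISON_STRINGS:
        raise ValueError("at most 3 comparison strings are permitted")
    for code, candidate in enumerate(comparison_strings, start=1):
        if value == candidate:
            return code
    return no_match


def pass_through(value):
    """The alternative the report recommends: leave the value unaltered."""
    return value


inputs = ["O", "D", "X", "Q"]
print([recode(v, ["O", "U", "D"]) for v in inputs])  # [1, 3, -1, -1]
print([pass_through(v) for v in inputs])             # ['O', 'D', 'X', 'Q']
```

Under the recoding scheme the distinct inputs X and Q become indistinguishable on output, which is the loss the pass-through proposal avoids.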

Another data-naming convention which caused several errors, and which could be corrected, concerns the array data definitional statements. While arrays of two and more dimensions are handled in a superior manner by the CONCOR program, single-dimension arrays pose a problem in coding, as shown in Figure 2. It is suggested that the command imperatives be changed to permit the coding of both rows and columns in single-dimension arrays, i.e., allow a single row vector as well as a single column vector, to maintain the consistency of the array data definitional statements.
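One way to picture the consistent treatment suggested above is a declaration routine that handles vectors and multi-dimensional arrays by the same rule. The Python sketch below is illustrative only: the function name is hypothetical, the five-dimension limit is taken from the comments in Figure 2, and "magnitude" is assumed to mean the maximum number of digits per element.

```python
from math import prod


def declare_array(name, shape, magnitude, initial):
    """Validate an array declaration; vectors and matrices follow the same rule.

    shape     -- tuple of 1 to 5 positive extents, e.g. (4,), (1, 4) or (4, 4)
    magnitude -- assumed here to be the maximum number of digits per element
    initial   -- flat list of start-up (cold deck) values
    """
    if not 1 <= len(shape) <= 5:
        raise ValueError("arrays may have 1 to 5 dimensions")
    if any(extent < 1 for extent in shape):
        raise ValueError("every extent must be at least 1")
    if len(initial) != prod(shape):
        raise ValueError(f"expected {prod(shape)} initial values, got {len(initial)}")
    if any(abs(v) >= 10 ** magnitude for v in initial):
        raise ValueError("initial value exceeds declared magnitude")
    return {"name": name, "shape": tuple(shape), "values": list(initial)}


ages = [16, 18, 21, 23]
as_vector = declare_array("A06", (4,), 2, ages)    # one dimension, four elements
as_row = declare_array("A06", (1, 4), 2, ages)     # row vector
as_column = declare_array("A06", (4, 1), 2, ages)  # column vector
print(as_vector["shape"], as_row["shape"], as_column["shape"])
```

With a single rule of this kind, the three codings of the same four values are all accepted, rather than only one of them.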

A major requirement of COBOL CONCOR file processing concerns the fact that all related data records must be physically contiguous on the input file. The implication of this requirement is that files may require preprocessing prior to actual data editing. (This preprocessing is usually a sort routine upon a selected CONTROL-AREA key.) While this type of processing merely introduces a new step in file processing, a major limitation becomes apparent when a large number of DISCRETE DATA files of the same census or survey questionnaire are to be processed. This limitation is the introduction of manual steps to save the most recent imputed values, i.e., to prevent the program from starting with cold values on each batch run. If a command such as LOAD/UNLOAD ARRAYS were incorporated into the language (an enhancement not believed to be difficult to implement), manual processing would be reduced to a minimum between batches and the maximum benefits of the hot deck methodology would be realized. It is envisioned that such a command would automatically ensure the transfer of the appropriately designated hot values. Automatic processing of this nature, if done correctly, can greatly reduce the time required to clean multi-volume files, for once CONCOR language statements have been compiled, linked,

While it is possible at this time to save the arrays that are used in the imputation processes on a separate write-file, it is not now possible to automatically load those values back to an object program and to immediately resume processing on another volume. It is believed that such an automatic feature of the language would cut down the manual processing time significantly enough that it warrants inclusion in the package prior to its general distribution.

FIGURE 2

A05-DIFF-BETWEEN-AGE-OF-FEMALE-BY-RELATION 2 4 4

[Cold deck value table for A05: columns AGE OF HUSBAND 12-17, 18-24, 25-35, 36+; rows RELATION HEAD, CHILD, OTHER, NONRELATIVE. Cell values are not legible in this reproduction.]

Comments: The ARRAY-DATA command statement provides the means to declare array identifiers with up to five dimensions. Current documentation is not as explicit about the rules of this command as is desirable. The parameters of the command should function as follows: user-identifier, number of dimensions (D), number of rows (R), number of columns (C), magnitude of element (M), initial start-up values. In the example, A05 is a two-dimensional array with 4 rows, 4 columns, a default magnitude of 9, and cold deck values as labeled.

A06-DIFF-BETWEEN-AGE-OF-PERSON-AND-MOTHER 1 1 4

16 18 21 23

(This coding generates the error messages below.)

WARNING (DD-207) COMMAND TERMINATOR () NOT FOUND, () ASSUMED PRESENT

ERROR (DD-911) DIMENSION OF USER-SPECIFIED ARRAY IS LESS THAN THE MINIMUM VALUE PERMITTED (2)

PREVIOUS DIAGNOSTIC AT LINE 563

As shown by the array variable A06, CONCOR's treatment of vectors is not consistent with the above multidimensional array scheme; i.e., A06 must be coded as follows. (Example of how vectors must currently be coded to be correct.)

A06-DIFF-BETWEEN-AGE-OF-PERSON-AND-MOTHER 1 4 2

16 18 21 23

The parameters are: user-identifier, 1 dimension, number of elements in vector, magnitude of element, initial start-up values.

A simple modification to this command would permit the coding of both row and column vectors and make this command less error prone.

and stored as an object module on the system, no other compilations should be required for processing questionnaire files of the same type. Theoretically, a single well-written CONCOR program is all that would be required to process an entire census run.
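The batch-to-batch behavior envisioned for a LOAD/UNLOAD ARRAYS command can be sketched as follows. This is a Python illustration under stated assumptions, not a description of CONCOR internals: the function names, the JSON file used to hold the arrays, and the sample values are all hypothetical.

```python
import copy
import json
import os
import tempfile


def unload_arrays(arrays, path):
    """Write the current hot-deck arrays to a file at the end of a batch."""
    with open(path, "w") as handle:
        json.dump(arrays, handle)


def load_arrays(path, cold_values):
    """Return saved hot values if a previous batch left any, else the cold deck."""
    if os.path.exists(path):
        with open(path) as handle:
            return json.load(handle)
    return copy.deepcopy(cold_values)


def impute(deck, row, col, observed):
    """Use a valid observed value and remember it; otherwise donate the stored one."""
    if observed is None:
        return deck[row][col]
    deck[row][col] = observed
    return observed


cold = {"A05": [[2, 2], [2, 2]]}  # hypothetical cold-deck start-up values

with tempfile.TemporaryDirectory() as workdir:
    saved = os.path.join(workdir, "hotdeck.json")

    # Batch 1 starts cold, observes a valid value, and unloads its arrays.
    arrays = load_arrays(saved, cold)
    impute(arrays["A05"], 0, 1, 7)
    unload_arrays(arrays, saved)

    # Batch 2 reloads the hot values, so a missing item receives 7, not the cold 2.
    arrays = load_arrays(saved, cold)
    batch2_value = impute(arrays["A05"], 0, 1, None)

print(batch2_value)
```

The point of the sketch is the second batch: without the reload step it would impute from the cold start-up values, which is the manual-intervention problem described above.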

Appendix H contrasts the internal identifiers of the old and new language versions. Without such identifiers, a user would have little information about the status of input as it is processed by EDITOR. As noted in the appendix, most internal pointers are reset upon each break in the CONTROL-AREA, provided a CONTROL-AREA has been defined. The limitation here is that there are obvious instances when terminating processing on the basis of run counts would be advantageous even though a CONTROL-AREA has been specified, e.g., debugging CONCOR programs or comparing input files. Therefore, another set of pointers should be implemented for this purpose and made available for programmer reference.
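The difference between a counter that resets at each control-area break and the run-level counter recommended above can be illustrated with a short Python sketch. The class and attribute names are hypothetical; `total_record_count` stands in for the proposed run-level variable, which CONCOR does not currently provide.

```python
class Counters:
    """Tracks records read, per control area and for the whole run."""

    def __init__(self):
        self.current_area = None
        self.in_record_count = 0      # reset at each control-area break
        self.total_record_count = 0   # hypothetical run-level counter, never reset

    def read(self, area):
        if area != self.current_area:
            self.current_area = area
            self.in_record_count = 0
        self.in_record_count += 1
        self.total_record_count += 1


def process(areas, stop_after):
    """Read records until the run-level count reaches stop_after."""
    counters = Counters()
    for area in areas:
        counters.read(area)
        if counters.total_record_count >= stop_after:
            break  # analogous to a STOP-RUN based on a run total
    return counters


# Two control areas of three records each; stop after five records in total.
result = process(["01", "01", "01", "02", "02", "02"], stop_after=5)
print(result.total_record_count, result.in_record_count)  # 5 2
```

In this example the per-area counter never exceeds 3, so a test against it alone could not stop the run at the fifth record; only the run-level counter makes that possible.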

One clearly disturbing development which needs to be pursued during inshydepth testing of the system concerns the MAX-STORAGE parameters of the DEFINE-RECORD statement As shown in the figure on the following page when MAX-STORAGE was set equal to the maximum value a COBOL program was generated whichrequired 1O00K of core to run The MAX-STORAGE value of 999 is clearly notrealistic under most processing circumstances This example drives home severalimportant points about CONCOR The core requiremenis of CONCOR generated proshygrams can be influenced significantly by the amount or nature of programmerspecified I0 operations In fact it is possible to generate a program of a size most foreign country machines could not process It is recommended that tests determine a realistic max-value restriction for implementation to prevent problems in this area

The final area of recommended modification concerns the newly implemented REPORT-DIVISION. The purpose of the REPORT-DIVISION is to enable a user to specify the CONCOR language statements which will generate statistical reports. These reports contain statistics generated by EDITOR as specified by the GENERATE-EDIT-STATISTICS command of the EXECUTION-DIVISION. All of the reports produced are organized according to the data fields defined by the AREA-CONTROL command of the DATA-DICTIONARY. If the AREA-CONTROL command is not defined in the DATA-DICTIONARY, then all the statistics are summarized at the total run level. If a control area field is defined, then all statistics will be summarized for each unique CONTROL AREA as encountered by the EDITOR program on the input file; statistics at the total run level will not be available. This in part relates back to previous discussions citing the need for new internal identifiers. Report listings may contain the values of entire records or entire questionnaires, depending upon the keyword used in the report generation commands. The problem centers upon the homogeneity of CONCOR printouts during a production run.

It is virtually impossible to distinguish reports on the basis of the volumes they were run against. Some means should be provided to allow users to uniquely and purposefully label the reports generated in this division. Indeed, the very name REPORT-DIVISION suggests that such a command is implicit and appropriate. Such a LABEL-REPORT or REPORT-FILE command, along with file information from the system, should not be difficult to implement.
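The summarization behavior discussed above, and the two recommendations (run-level counters that survive control-area breaks, and a user-supplied report label), can be illustrated with a minimal sketch. It is written in Python rather than CONCOR, and the record layout, the function name, and the report_label argument are assumptions made purely for illustration; none of them are CONCOR commands or identifiers.

```python
from collections import defaultdict


def summarize_edit_errors(records, area_field=None, report_label="UNLABELED RUN"):
    """Accumulate edit-error counts per control area AND for the total run.

    records      -- iterable of dicts; each has an integer 'errors' entry
                    (hypothetical layout, not a CONCOR record format)
    area_field   -- name of the control-area field, or None if no area
                    control is defined
    report_label -- free text identifying the volume the run was made against
    """
    run_total = {"questionnaires": 0, "errors": 0}
    by_area = defaultdict(lambda: {"questionnaires": 0, "errors": 0})

    for rec in records:
        # Run-level counters are never reset by a control-area break.
        run_total["questionnaires"] += 1
        run_total["errors"] += rec["errors"]

        # Area-level counters are kept only when a control area is defined.
        if area_field is not None:
            area = by_area[rec[area_field]]
            area["questionnaires"] += 1
            area["errors"] += rec["errors"]

    return {"label": report_label, "run": run_total, "areas": dict(by_area)}


if __name__ == "__main__":
    sample = [
        {"area": "01", "errors": 2},
        {"area": "01", "errors": 0},
        {"area": "02", "errors": 1},
    ]
    print(summarize_edit_errors(sample, area_field="area", report_label="VOLUME A"))
```

Keeping both sets of counters costs very little storage, which is why the run-level totals need not be given up when a control area is defined.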

FIGURE 3

CONCOR (Consistency and Correction) Edit and Imputation System
User Dictionary Division - Source Listing (excerpt)

LINE NUMBER

72    DEFINE-RECORD RECORD-NAME= HOUSING-RECORD
73    MAX-STORAGE= 999
74    RECORD-TYPE= (value illegible in source)

268   DEFINE-RECORD RECORD-NAME= POPULATION-RECORD
269   MAX-STORAGE= 999
270   RECORD-TYPE= (value illegible in source)

Job step accounting (excerpt): STEPNAME - LOAD; VIRT 1000K; REGION (REQUESTED - 1000K, USED - 1000K); I/O COUNT 211; STEP COMPLETION CODE - 0.

Concluding Remarks of System Modifications

Version 1 of the COBOL CONCOR language was notorious for abending without the generation of diagnostic messages. Appendix I sets forth a test of some of these historical problems. In each instance the system provided messages and source program line references which enabled immediate identification of the processing difficulties. Further, tests of the various commands revealed the generally high performance of the language in its newest version. Under these circumstances the recommended modifications should be considered as ones which will uniformly allow CONCOR to realize the potential of its design, enhance its utility, and, moreover, endure as a software product.

IV THE ADEQUACY OF COBOL CONCOR DOCUMENTATION

The current documentation of CONCOR consists of a newly developed, relatively voluminous systems manual, a Diagnostic Message Guide, and some loose-leaf instructions concerning the installation of the system on IBM computers. Of these three documents, the Diagnostic Message Guide is clearly complete and adequate and needs no changes except the addition of message number tabs to enable a programmer to locate the text of a message number more quickly. An example of the superior format of this guide appears as Appendix J.

Conspicuously absent from the list of systems documents is a Users Guide, as originally specified in the ISPC enhancement proposal.

A thorough assessment of the Systems Manual has been undertaken and indicates that, with some reorganization and editing, it could be made a more useful and usable document, as it is well constituted informationally. In its current form the Systems Manual can best be described as a working document: a document which was not intended to teach the CONCOR language but was aimed at an audience already familiar with previous CONCOR releases. Further, it is not a systems manual per se, as some of its components and appendixes could more appropriately compose a users guide. For example, excerpts from the POPSTAN publications which review the basic editing concepts and contain the POPSTAN example problem should be deleted from the systems manual and would be more contextually effective in the Users Guide. The EXECUTION-DIVISION chapter of the Systems Manual would especially benefit from general editing and reorganization. (Specifically, the contents of pages 90-92 should be moved to page 63; these pages explain the placement and purpose of the three permissible routines of the CONCOR language. Had the author been granted more time, he could have outlined additional specific modifications to bring existing documentation to a level of adequacy.) However, some guidelines stand out:

1. Installation instructions and considerations should compose a chapter of the Systems Manual rather than a separate document. (Hand-out materials covering installation were not adequate.) Further, a documentation scheme setting forth how CONCOR was installed on each computer, and any modifications required during installation, could be serially added to and become a permanent part of the installation's systems manual. If CONCOR is going to be used, and if agencies are going to actively support it as a package, installation limitations must be known and well documented. The prescribed format for this documentation should be set forth, and any modifications to CONCOR should be continuously updated on these sheets so that a history of changes to the language will be readily accessible to supporting agencies. If CONCOR is to survive as a standardized system, editing results must be capable of being replicated among installations. Such a form could contain other information as well, concerning the type of computer, amount of core, and nature of utilities supporting users. Upon installation, a copy of this form could be sent to the US agency which will ultimately be responsible for supporting the CONCOR package.

2. A complete COBOL CONCOR program should appear in an appendix for reference.

3. The development of the Users Guide should include an intensive review of the editing concepts involved in processing census data files, beyond the POPSTAN materials.

4. An explanation of the CONCOR benchmark program should appear in the Users Guide and the Systems Manual. The running of a supplied benchmark program should be a standard installation protocol used to test all operational aspects of a new installation.

5. The development of a CONCOR pocket card. This tool, which has proven useful in teaching and assisting programmers in utilizing programming languages, lays out all commands and options on a single small card. An example of such a pocket card is the IBM Assembler Card. Such a small document is useful in clarifying the language and setting forth structural rules without continual reference to full-size manuals.

V CONCLUSION

In a situation of extreme need CONCOR could probably be utilized, as it seems capable of performing its editing functions. Its usefulness as a data cleaning tool is generally undisputed. The importance of having CONCOR exhaustively tested by an independent agency cannot be overemphasized in light of prior failures of the system to perform adequately. Additionally, core requirements and processing speed should be precisely determined. One must be encouraged by the amount of progress which has been made in a relatively short amount of time towards the completion of the CONCOR package. While it is clearly beyond the scope of this report to estimate the cost and the amount of time required to implement the highly desirable changes to the CONCOR language and documentation outlined herein, and then exhaustively test the system, it is believed that this process could be completed within a single quarter. (The changes are of such a nature that no structural redefinitions of a systematic nature would be required.) These enhancements would make the COBOL CONCOR package more complete, consistent, and reliable in the field. ISPC has competently moved the system to the point that its completion is within reach.

Given the probable revision of the current Systems Reference Manual and the development of a suitable Users Guide document, it is likely that this version of CONCOR could be taught to experienced programmers (unfamiliar with previous versions of the language) in as little as four weeks. And while one is impressed by the potential power and overall simplicity of the COBOL CONCOR language, it is difficult to envision individuals with no real computer programming experience utilizing its full capabilities. Hence the importance of including a review of basic editing concepts in the documentation is underlined.

Further, as documentation is often critical to the perceived ease of use of computer-based systems, it will be essential to have these various manuals examined independently prior to their general use.

As previously mentioned, Appendix G presents features which the ISPC feels could have a positive impact on CONCOR. Depending upon the outcome of the testing recommended in previous chapters and the availability of funding, supporting agencies may desire to look again at an assembler (ALC) version of the language.

Overall, COBOL CONCOR remains an intermediate product yet to realize its potential, which is thought considerable. Software development of this nature is characteristically of a long-term, evolutionary design. In light of this reality, agencies should be prepared to evaluate such projects' success or failure in terms other than that of immediate utility. It is the philosophy underlying the system, not the system itself, which will ultimately be exported.

APPENDIX A


BUCEN IN-HOUSE REVIEW ENHANCEMENT PROPOSAL

1 Easy to use interrecord referencing

2 Improved output file capabilities

A provide overflow protection on WRITE command

B provide OUTPUT command to create corrected questionnaire file that matches IP file data dictionary

3 Improved edit statistics reported (LISTERR)

A provide automatic (user-specified) area break

B provide options for compilation and displaying edit statistics at various levels

C provide automatic (user-specified) tolerance checking of error rates by area

D automatically capture IDs of areas failing tolerance check

4 Clean up known bugs in code

5 Comprehensive testing

6 Clean up and enhance documentation

A reference manual more examples error message guide

B installation guide

C systems manual
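Items 3C and 3D of the proposal above (tolerance checking of error rates by area, and capturing the IDs of areas that fail) can be sketched as follows. This is a minimal illustration in Python, not CONCOR syntax; the function name, the input layout, and the choice of a strict "greater than" comparison are all assumptions made for the example.

```python
def areas_failing_tolerance(area_counts, tolerance):
    """Return the IDs of areas whose error rate exceeds the tolerance.

    area_counts -- mapping of area ID to (errors, items_checked);
                   a hypothetical layout chosen for illustration
    tolerance   -- maximum acceptable error rate, as a fraction (0.10 = 10%)
    """
    failing = []
    for area_id, (errors, items_checked) in area_counts.items():
        # An area with nothing checked is treated as having a zero rate.
        rate = errors / items_checked if items_checked else 0.0
        if rate > tolerance:
            failing.append(area_id)
    return failing


if __name__ == "__main__":
    counts = {"A": (5, 100), "B": (12, 100), "C": (0, 0)}
    print(areas_failing_tolerance(counts, tolerance=0.10))
```

The list returned is the kind of information item 3D asks the system to capture automatically, so that rejected areas can be followed up without rereading the full statistics report.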

APPENDIX B

EVALUATIVE CRITERIA

UNITED STATES GOVERNMENT

Memorandum

TO: DSPOPFPSD, Robert H. Haladay

DATE: December 3, 1979

FROM: DSPOPDEIO, Liliane Floge

SUBJECT: Criteria for Evaluation of BuCen COBOL CONCOR, September 30th Version, as presented at January 1980 Workshop

The following points should be considered when evaluating CONCOR, the COBOL editing system, during participation at the January 1980 workshop:

1. Are the Installation Manual, the System Manual, and the Users Manual ready for use? Are these manuals adequate for their purposes? Are the manuals clear and consistent? Can they be understood by subject-matter personnel as well as programmers?

2. Is the editing system capable of implementing all the features as described in the documentation?

3. Does the system appear to be [illegible] debugged and tested and ready for release to developing countries?

4. Does the system appear to be fast enough to edit a census in a reasonable amount of time?

5. What size core does the system require?

6. Comment in general on the completeness, usefulness, and efficiency of the system, and on its ease of use by developing country programmers and other census personnel.

cc: [illegible] Olds, APHA; [illegible]; Julio Ortuzar, Delta Systems

APPENDIX C

WORKSHOP ITINERARY

CONCOR Workshop Schedule, January 7-18, 1980

U.S. Bureau of the Census, International Statistical Programs Center

Scuderi Building, Room 209, 4235 28th Avenue, Marlow Heights, Maryland

Monday, January 7

9:30 am - 10:00  Welcoming Remarks; Overview of Workshop

10:00 - 10:30  Introduction to CONCOR
  - Purpose and function
  - History of development
  - General computer requirements

10:30 - 10:45  Break

10:45 - 12:00  Editing Concepts
  - Ways to interrogate data
  - Ways to correct data
  - Editing housing and population data
  - POPSTAN
  - Advantages of CONCOR

12:00 - 1:15 pm  Break

1:15 - 2:00  System Description
  - Constraints in design of CONCOR
  - Basic subsystems of CONCOR
  - User interactions with system
  - Examples of outputs produced

2:00 - 2:30  User Program Organization
  - Divisions
  - Sections
  - Routines
  - Commands

2:30 - 2:45  Break

2:45 - 3:25  Command Language Description
  - Types of statements
  - Format
  - Syntax

Tuesday, January 8

Dictionary Division Command Statements

9:30 am - 10:30
  - Punctuation
  - Input data referencing
  - Working data declarations
  - Internal data representation and storage

10:30 - 10:45  Break

10:45 - 12:00  Dictionary-Attributes-Section
  - Dictionary-Name
  File-Section
  - Input-File
  - Output-File
  - Write-File
  - Error-File

12:00 - 1:15 pm  Break

1:15 pm - 2:15  Input-Record-Section
  - Define-Record
  Working-Data-Section
  - New-Data
  - Array-Data

2:15 - 2:30  Break

2:30 - 3:25  Dictionary Examples
  - Minimum dictionary structure
  - Maximum dictionary structure
  - Hand out dictionary problem

Wednesday, January 9

9:30 am - 10:30  Discussion of Participants' Dictionary problems; Free work time

10:30 - 10:45  Break

10:45 - 12:00  Execution Division Command Statements
  - Punctuation
  - Subscripting
  - Internal Identifiers
  - Report-Control-Section
    - Generate-Edit-Statistics
    - Count-Imputes
    - Examples

12:00 - 1:15 pm  Break

1:15 pm - 2:15  Execution Division Command Statements (continued)
  - Routines of Edit-Specifications-Section
    - Prolog-Routine
    - Filter-Routine
    - Epilog-Routine
  - Types and functions of edit specification commands
    - Range
    - Assert

2:15 - 2:30  Break

2:30 - 3:25
  - Pass/Fail clauses
  - List

Thursday, January 10

9:30 am - 10:30  Discussion of Problems; Free work time

10:30 - 10:45  Break

10:45 - 12:00  Edit Specification Command Statements (continued)
  - Allocate
  - Update
  - Let

12:00 - 1:15 pm  Break

1:15 pm - 2:15
  - If
  - Until/Exit
  - Stop

2:15 - 2:30  Break

2:30 - 3:25
  - Drecode
  - Grecode

Friday, January 11

9:30 am - 10:30  Edit Specification Command Statements (continued)
  - Output
  - Write

10:30 - 10:45  Break

10:45 - 12:00  Report Division Command Statements
  - Display-Control-Section
    - Display-Edit-Statistics
  - Tolerance-Control-Section
    - Error-Rate-Check
    - Reject-File
  - Report Examples

12:00 - 1:15 pm  Break

1:15 pm - 3:25  Hand out Edit Specifications as problem; Free work time

Monday, January 14

9:30 am - 10:30  Discuss procedures for running problems on computer

10:30 - 10:45  Break

10:45 - 12:00  Component Programs of the CONCOR system

12:00 - 1:15 pm  Break

Tuesday, January 15

9:30 am - 3:25 pm  Free work time

Wednesday, January 16

9:30 am - 12:00  Free work time

12:00 - 1:15 pm  Break

1:15 pm - 2:15  How to Install CONCOR on IBM 360/370 OS

2:15 - 2:30  Break

2:30 - 3:25  Free work time

Thursday, January 17

9:30 am - 3:25  Free work time

1:15 pm - 2:15  Future CONCOR Design Considerations
  - for census processing
  - for survey processing
  - manual correction system

2:15 - 2:30  Break

2:30 - 2:45  Evaluation Guidelines
  - Hand out evaluation forms

2:45 - 3:25  Free work time

Friday, January 18

9:30 am - 10:30  Free work time

10:30 - 10:45  Break

10:45 - 12:00  Group Discussion on CONCOR
  - Submission of written evaluations
  - Distribution of IBM installation tapes
  - Presentation of Certificates to Participants

12:00 - 1:15 pm  Break

1:15 - 3:25  Free work time

APPENDIX D

CONCOR WORKSHOP, January 7-18, 1980, Marlow Heights, MD

PARTICIPANTS

KENYA

James Midianga Government Computer Centre Central Bureau of Statistics PO Box 30266 Nairobi Kenya

PANAMA

Omar Rivera Rios Data Processing Specialist Contraloria General de la Republica de Panama Apartado Postal 5213 Panama 5 Panama

PHILIPPINES

Guida T Capellan OIC Computer Programming Division National Census and Statistics Office Solicarel Bldg 1 Magsaysay Blvd Sta Mesa Manila Philippines

THAILAND

Angsumal Sunalai National Statistical Office Bangkok 1 Thailand

EGYPT

Farag Sedky Mourad Ghaleb CAPMAS (Central Agency for Public Mobilization and Statistics) NASR City Cairo Egypt

SAUDI ARABIA

Shargi R Al-Shargi National Computer Center PO Box 2534 Riyadh Saudi Arabia

Ahmed S Al-Ghamdi Dept Director of National Computer Center PO Box 2534 Riyadh Saudi Arabia

OTHER

Robert W ONeal USREPJECOR APO New York NY 09038

John N Adams Data Processing Advisor DACCA Department of State Washington DC 20520

OR co UNDP PO Box 224 Dacca Bangladesh

Joe Quasney US Census Bureau Washington DC 20233

Howard Brunsman 5715 N Ninth St Arlington VA 22205

Larry Shiller and Julio Ortuzar Delta Systems Consultants Inc 264 Alhambra Circle Coral Gables FL 33134

Nicholas Ourusoff Burpee Hill Road New London NH 03257 (United Nations)

Frederic J Grant IV Systems Development Georgia World Congress Institute 580 North Omni International Atlanta Georgia 30303

Rafael Samper Mary Jo Keenan Catherine B Gleason LACDR Room 2246 NS Washington DC 20523

John Marshall AIDSERDMPSE Rm 706C SA-12 Washington DC 20523

Susanne Bacon and Mary K Friday ISPCData Processing US Census Bureau Washington DC 20233

Leo Dougherty ISPCWorld Census Staff US Bureau of the Census Washington DC 20233

CONCOR WORKSHOP

January 7-18 1980 Washington DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM


CONCOR WORKSHOP January 16 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1 What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries

2 If you are familiar with previous versions of CONCOR how does this current version compare to them


3 How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data of which you are familiar

4 How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software


5 How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys

6 If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?

7 In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8 Other comments

APPENDIX F

NEW/OLD COMMAND COMPARISONS

OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA


NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin anywhere in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR

END-DIVISION signifies the termination of the CONCOR division organization and must always be present even though division identifiers have not been implemented

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations, and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

Version 1 (December 1978): ALLOCATE (ALL)
Version 2 (December 1979): ALLOCATE (ALLOC)
Comments: Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

Version 1 (December 1978): ASSERT (AST)
Version 2 (December 1979): ASSERT (ASRT)
Comments: The ASSERT command is the basic consistency editing command of CONCOR. The command now provides for a NOERROR and MESSAGE option. Additionally, by utilizing the DUMPR and DUMPO keywords, it provides the user with the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

Version 1 (December 1978): (see RECODE)
Version 2 (December 1979): DRECODE (DRCD)
Comments: Provides a method for recoding data items having a starting value of zero and continuing in ascending sequence. Replaces the previous RECODE command.

Version 1 (December 1978): (none)
Version 2 (December 1979): EXIT
Comments: Provides the means to terminate execution (escape) of command statements within the UNTIL command.

Version 1 (December 1978): END
Version 2 (December 1979): END-IF, END-THEN, END-ELSE
Comments: A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

Version 1 (December 1978): ERROR
Version 2 (December 1979): not implemented
Comments: No longer supported in the language.

Version 1 (December 1978): FILTER TYPE ( ) WITH
Version 2 (December 1979): not implemented
Comments: Option statements are no longer supported in the language. This command once specified essentially a GOTO, depending on construction.

Version 1 (December 1978): (see XRECODE)
Version 2 (December 1979): GRECODE (GRCD)
Comments: This new command provides a method for the group recoding of input values.

Version 1 (December 1978): IF
Version 2 (December 1979): IF
Comments: The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. Similar to pseudocode, a CONCOR programmer must utilize the structured IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version.

Version 1 (December 1978): LET
Version 2 (December 1979): LET
Comments: Virtually unchanged.

Version 1 (December 1978): (none)
Version 2 (December 1979): LIST
Comments: New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire, as well as specific individual identifiers.

Version 1 (December 1978): (none)
Version 2 (December 1979): OUTPUT
Comments: New command which provides the means by which records will be written to the output-file.

Version 1 (December 1978): RANGE (RNG)
Version 2 (December 1979): RANGE (RNG)
Comments: As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

Version 1 (December 1978): RECODE (REC)
Version 2 (December 1979): name changed
Comments: Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

Version 1 (December 1978): STOP keyword
Version 2 (December 1979): STOP keyword
Comments: Unchanged.

Version 1 (December 1978): (none)
Version 2 (December 1979): UNTIL DO END-DO
Comments: DO END-DO is one of the most significant and powerful enhancements to the CONCOR language. This command provides a looping capability similar to most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

Version 1 (December 1978): UPDATE (UPD)
Version 2 (December 1979): UPDATE (UPD)
Comments: Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

Version 1 (December 1978): WRITE (WRT)
Version 2 (December 1979): WRITE (WRT)
Comments: The WRITE language statement, while once utilized as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of DICTIONARY-DIVISION.)

Version 1 (December 1978): XRECODE (XREC)
Version 2 (December 1979): name changed
Comments: Old command which has become the GRECODE command in the December 1979 version.
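The distinction drawn above between DRECODE (recoding items whose values start at zero and ascend in sequence) and GRECODE (group recoding of input values) can be illustrated with a minimal sketch. It is written in Python, not CONCOR; the function names, argument layouts, and sample codes are assumptions for illustration only, since the actual CONCOR syntax for these commands is not reproduced in this report.

```python
def drecode(value, table):
    """Direct recode: the input value (0, 1, 2, ...) indexes the table.

    table -- sequence of new codes; table[0] is the recode for value 0.
    """
    if not 0 <= value < len(table):
        raise ValueError(f"value {value} is outside the recode table")
    return table[value]


def grecode(value, groups, default=None):
    """Group recode: the first (low, high, new_code) range containing the
    value supplies the new code; ranges are inclusive at both ends."""
    for low, high, new_code in groups:
        if low <= value <= high:
            return new_code
    return default


if __name__ == "__main__":
    # Hypothetical example: collapse a 0-3 code into fewer categories.
    print(drecode(2, [9, 1, 1, 2]))

    # Hypothetical example: single years of age into three broad groups.
    age_groups = [(0, 14, 1), (15, 64, 2), (65, 99, 3)]
    print(grecode(37, age_groups))
```

The direct form needs no comparisons at all, which is presumably why it is restricted to values that start at zero and ascend without gaps; the group form trades speed for the ability to handle arbitrary ranges.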


DECEMBER 1978, VERSION 1

Not implemented

DECEMBER 1979, VERSION 2

REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST


Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1 Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods if applicable

(2) Addition of an optional input andor output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.

(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18-digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.

(12) Increase the number of comparison strings permitted for alphanumeric data items This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters The user would still be required to code two commas to receive the system default value for any parameter

(15) Provide a repetition factor in the initialization of arrays

(16) Print the data dictionary name in the titles of the dictionary documentation or provide a means by which the user may specify a title to appear in the listings
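The automatic range check proposed in item (13) can be pictured with a short sketch. It is written in Python rather than in the CONCOR language, and the `ItemDeclaration` structure, the `range_check` function and the ranges shown are all hypothetical; nothing here reflects how CONCOR would actually store or test declared ranges.

```python
from dataclasses import dataclass


@dataclass
class ItemDeclaration:
    """Hypothetical dictionary entry: an item name plus its declared valid range."""
    name: str
    low: int
    high: int


def range_check(declarations, record):
    """Return the names of items that are missing or fall outside their declared range."""
    failures = []
    for decl in declarations:
        value = record.get(decl.name)
        if value is None or not (decl.low <= value <= decl.high):
            failures.append(decl.name)
    return failures


# Illustrative declarations only; the ranges are assumptions, not CONCOR defaults.
declarations = [
    ItemDeclaration("H01-TYPE-OF-HOUSING-UNIT", 1, 9),
    ItemDeclaration("H03-TOTAL-PERSONS-IN-UNIT", 0, 99),
]
record = {"H01-TYPE-OF-HOUSING-UNIT": 12, "H03-TOTAL-PERSONS-IN-UNIT": 4}
print(range_check(declarations, record))
```

Because the ranges live with the declarations, the same information would be available to tabulation software reading the dictionary, which is the second benefit item (13) mentions.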

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program by eliminating some of the system protection features, such as division by zero and out-of-bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and to modify the contents of the store. A further consequence of this command is the question of what action is to be taken in the calculation of tolerance, in terms of the new record's inclusion in the universe of observations and in the number of changes.

(4) The code generated in the EDITOR program for the RECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implement a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command lies in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run-total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.

1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification of the system data dictionary to better arrange and make available the information needed most frequently. Also, examination of the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not imply that the editing process itself should be made interactive; only the development of the user code should be put online.
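The difference between the linear search and the lookup table named in item (2) can be sketched as follows. The report does not say what GENSRC searches, so the keyword list below (built from command names that appear elsewhere in this report) and both function names are assumptions made purely for illustration, and the sketch is in Python rather than COBOL.

```python
# Hypothetical list of keywords; what GENSRC actually searches is not stated.
KEYWORDS = ["DEFINE-RECORD", "NEW-DATA", "ARRAY-DATA", "END-DIVISION"]


def find_linear(word):
    """Linear search: compare the word against each keyword in turn."""
    for position, keyword in enumerate(KEYWORDS):
        if keyword == word:
            return position
    return -1


# Lookup table built once; each later search is a single hashed access.
KEYWORD_TABLE = {keyword: position for position, keyword in enumerate(KEYWORDS)}


def find_lookup(word):
    """Table lookup: return the keyword position, or -1 when the word is unknown."""
    return KEYWORD_TABLE.get(word, -1)


for word in ("ARRAY-DATA", "NOT-A-KEYWORD"):
    print(word, find_linear(word), find_lookup(word))
```

Both functions return the same answers; the table version simply avoids scanning the whole list for every word, at the cost of building and holding the table.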

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR Users Guide.

2. Survey or other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of the sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has been neither finished nor tested.
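As a rough illustration of the weighting scheme in item (5), the sketch below multiplies each sample record by a weight before totalling. It is written in Python, and the field names `weight` and `persons`, the function name and the figures are hypothetical; the report does not describe how weights would be supplied to CONCOR.

```python
def weighted_total(records, item):
    """Sum an item over sample records, multiplying each value by the record's weight."""
    return sum(record["weight"] * record[item] for record in records)


# Two sample households; each weight is the number of households the case stands for.
records = [
    {"weight": 10.0, "persons": 4},
    {"weight": 25.0, "persons": 2},
]
print(weighted_total(records, "persons"))
```

An unweighted total of the same two records would be 6 persons, which says little about the population the sample was drawn from.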

APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES

1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters)

OUT-RECORD-COUNT

OUTPUT-QUEST-COUNT

WRITE-RECORD-COUNT

Change in spelling of variable.

Not mentioned--possible omission from manual.

Not implemented.

Note: Current documentation equivocates on when some variables are reset--most are initialized with each control area break.

Though these were implemented as reserved identifiers, the poor documentation of (old) CONCOR meant that their exact values and utility were unknown.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g. IN-, OUT-).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Printout: CONCOR-EDITOR EXECUTION STATISTICS page. The legible portion shows the message DIVISION BY ZERO, LINE NUMBER 118, repeated several times, followed by RUN ABORTED DUE TO DIVISION BY ZERO ERRORS. Superimposed program text, source lines 111-125: LIST and LET statements that set R001 and R003 to zero and then, at line 118, compute R003 from R002 and R001.]

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason--ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors and user-specified terminations. A program was written to deliberately test CONCOR's ability to provide diagnostics. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.

[Printout: CONCOR-EDITOR EXECUTION STATISTICS page for the array coordinate test. The diagnostic messages are not legible in this copy. Superimposed program text, source lines 173-187: two nested DO loops (UNTIL ... VARY ... FROM 1 BY 1) over R001 and R002, containing an ALLOCATE from TA1 (R001 R002), an UPDATE of TA2 (R001 R002) and several LIST statements.]

Comments: In this example a 2-dimensional array with 10 rows and 10 columns (TA1 in the program text) is used to update the smaller (5 x 5) TA2. Non-existent element references caused the error. The program text has been superimposed on this page for illustration purposes.

[Printout: CONCOR-EDITOR EXECUTION STATISTICS page for the user termination test. The legible portion shows the messages JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215 and EDITOR -- NORMAL END OF JOB. Superimposed program text, source lines 214-218: an IF test on IN-RECORD-COUNT followed by THEN STOP-RUN END-THEN, and a second IF test on IN-RECORD-COUNT followed by a LIST statement.]

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character () and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character () and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character () and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.

CONTENTS

PREFACE

EXECUTIVE SUMMARY

I BACKGROUND

II THE ADEQUACY OF CONCOR

III PROPOSED CHANGES TO THE LANGUAGE STRUCTURE

IV THE ADEQUACY OF COBOL CONCOR DOCUMENTATION

V CONCLUSION

APPENDICES

Appendix A - Bucen Enforcement Proposal
Appendix B - Evaluative Criteria
Appendix C - Workshop Itinerary
Appendix D - Participants
Appendix E - CONCOR Evaluation Form
Appendix F - New/Old Command Comparisons
Appendix G - ISPC Future Enhancement List
Appendix H - CONCOR System Internal Variables
Appendix I - CONCOR-EDITOR Execution Statistics
Appendix J - Diagnostic Message Guide Example

EXECUTIVE SUMMARY

The December 1979 COBOL CONCOR (Version 2) is a much improved software package. All commands appear to be functional; however, the system should be exhaustively tested by an independent agency prior to its general release. This agency should also precisely determine the system's relative speed and core processor requirements. While the system (exclusive of documentation) could immediately be utilized in a situation of extreme need, some CONCOR language coding inconsistencies detract from the learnability and exportability of the package and should be corrected. Additionally, there are other modifications or adjustments which would enhance the overall utility and productivity of the language in census and survey applications.

System documentation continues to be a problem. There is no Users Guide. The Systems Manual, though well-constituted informationally, should be thoroughly reorganized in accordance with the guidelines set forth in this report.

The staff of ISPC exhibited competence and professionalism in the conduct of the two-week workshop, January 7-18, 1980. ISPC generally is aware of both the potential and the shortcomings of the CONCOR project. The current CONCOR version marks a significant redesign of the overall system. As a package, it is in a state where its completion is within reach.

I BACKGROUND

CONCOR (an acronym of Consistency and Correction) is best characterized as a software tool designed to expedite the processing of data files during the edit and imputation phase of population censuses and surveys. As a metacompiler written in the COBOL language, the system reads and verifies CONCOR language statements to produce an executable EDITOR program. The objective of this process is the creation of an error-free file which can be used at a later time for tabulation purposes.

Since its release as Version 1 on December 30, 1978, numerous elements of the COBOL CONCOR system have undergone continual change and redefinition. In fact, the system has not been permitted to stand still for any period of time, nor has it been exhaustively tested. In June of 1979 ISPC suspended the further distribution of the COBOL CONCOR system. This decision was based principally upon reports of the package's unsatisfactory performance at workshops held in Panama and Thailand. ISPC, upon its own initiative, developed a proposal to overhaul CONCOR and its accompanying documentation. This proposal, contained in Appendix A, represents an ambitious undertaking. While not all of the desired changes and capabilities could be implemented, Version 2 of December 1979 represents a significant managerial effort. The questions now are whether COBOL CONCOR Version 2 will be a demonstrably adequate software package -- a package capable of exportation to developing countries -- a package requiring no further modification. The purpose of this report is to address these critical issues. In connection with this, Appendix B sets forth the specific criteria around which such a discussion must revolve. As this is not intended to be a compendium, some of these broader issues will be treated immediately; the following chapters and appendices will qualify the exact nature of the system alterations already undertaken, as well as further adjustments believed to be essential in realizing the goals of the system's philosophy.

Workshop

During the period of January 7-18, 1980 a workshop was held under the sponsorship of the ISPC to demonstrate the capabilities of the latest revision of the COBOL CONCOR software package. A schedule of events of this workshop is contained in Appendix C. This workshop was intended to provide participants with the opportunity to program in the CONCOR language and thereby to test aspects of the system as individually appropriate. A listing of the participants and the international organizations they represented is contained in Appendix D. During the concluding days of the workshop each participant was asked by ISPC to provide a written evaluation of what is now called the December 1979 version of CONCOR. This evaluation form, Appendix E, also includes space for comments concerning the competence of the system documentation, as well as any additional comments, including those regarding the organization and clarity of workshop presentations. It is assumed that in the near future summaries of these comments will be available to interested agencies.

EDITOR is the new name of the EXECUTOR module of previous language versions.

A complete history of the development of COBOL CONCOR can be found in both the December 1978 Version 1 and December 1979 Version 2 systems manuals, as well as in previous consulting reports.

While virtually all instructional aspects of this two-week workshop were conducted in a highly professional manner -- a manner which revealed a high degree of coordination among staff members in their efforts -- there are several areas which future workshops may improve upon:

1. All publications should be assembled in their entirety and proof-read prior to distribution.

2. A complete CONCOR language program example and accompanying I/O documents should be provided at the onset of the workshop for reference.

3. Numerous short application programming problems involving all CONCOR language divisions should be utilized in place of a single lengthy problem.

It is noted that this workshop was not intended to teach the CONCOR language; had it been, the organization and presentation of materials probably would have been different. It is believed that the two-week period was sufficient to give participants a familiarity with the use of the new CONCOR features, especially in light of the fact that workshop participants were permitted to work weekends and beyond normal working hours at their discretion. Though funding was not generally available, it is known that several workshop members chose to extend their stay in Washington to continue testing the CONCOR package or to work on projects which they could attempt to install immediately on their home computers. At the conclusion of the workshop, participants were permitted to take with them an installation tape of CONCOR as well as all the other materials they had acquired during the course of the project.

II THE ADEQUACY OF CONCOR

CONCOR has been described by its designers as an adequate package. Adequacy as an evaluative criterion is often relative to need and should not be confused with readiness as an issue. The CONCOR system, exclusive of documentation, is sufficiently complete that in a situation of extreme need it could be used as a data-cleaning tool in the editing and imputation phase of census processing. Less extreme circumstances would impose reticence on such an endorsement. Though non-exhaustive tests indicate that CONCOR appears to be capable of performing all of the commands as implemented, because of the rapidity with which the system was rewritten it is thought that there has not been enough time to fully test all aspects of the project. Therefore, prior to its general dissemination, it is recommended that an independent agency conduct exhaustive tests to certify the integrity of the system programs. The importance of this certification cannot be overstated in light of previous workshop experiences. Concurrent with this testing process, the same agency should determine the relative speed and size of the system under actual production circumstances and further determine CONCOR's ease of installation. Later sections of this discussion set forth additional testing recommendations.

It is generally recognized that, of all the data-cleaning tools available for exportation, CONCOR is potentially the most powerful, especially with the addition of its new commands as outlined in Appendix F. While its utility is not in doubt, one must ask how much more useful CONCOR could be if modified, and whether this additional utility would be worth the costs involved. The modifications (excluding documentation) to COBOL CONCOR appropriate for consideration at this time are threefold:

1. Adjustments to the elements of the system which are internally inconsistent or awkward, to facilitate its learnability and usability among developing country programmers.

a. Implementation of the division headings DICTIONARY-DIVISION, EXECUTION-DIVISION and REPORT-DIVISION. De-emphasis of section headings.

b. Improvement of the consistency among data identifiers: allow alphanumeric variables to be coded without mandatory comparison strings throughout the DATA-DIVISION and to be of the same length as numeric variables. Permit numeric identifiers to be of a length equal to that of NEW-DATA identifiers. Permit the coding of single-dimension row and column vectors in the same manner as multi-dimensional arrays.

2. Implementation of selective commands and internal variables to facilitate the production environment use of CONCOR in census applications. These include:

a. LOAD/UNLOAD arrays. Commands which would automatically save and replace hot-decked values from batch to batch.

b. TOTAL-QUESTIONNAIRE-COUNT-RECORD-COUNT internal variables independent of AREA CONTROL.

3. Other modifications.

a. Default values for the max-storage parameter set in a realistic range.

b. Allowance of more variables for survey applications.

Some of these modifications are part of what ISPC calls its wish list for the future development of CONCOR. This document has been included in this report as Appendix G. It is arguable that these features are essential to the completion of the CONCOR package. While it is beyond the scope of this report to draw a conclusion in this area, the enhancements outlined above are ones that would make the language more internally consistent and thereby easier to learn and apply in a census data production environment. These modifications are not arbitrary or cosmetic but are a direct result of hands-on programming experience in the language, as well as observations of and discussions with other workshop participants. While it is probably impossible ever to be satisfied with the overall structure of any programming language, the resolution of this issue of completeness must be made relative to the objectives for developing the COBOL CONCOR system in the first place. An explicit statement of these objectives has been absent from all systems documentation to date.

III PROPOSED CHANGES TO THE LANGUAGE STRUCTURE

Based upon the assumption that it is the intent of sponsoring agencies to optimize the COBOL CONCOR package -- a goal which is believed currently obtainable -- an understanding of the nature of these changes and how they would impact users is essential. Appendix F sets forth in a comparative manner the differences between the old December 1978 and the new December 1979 editions of CONCOR. Studying this appendix makes it obvious that, while the new version of the language is clearly superior to the old in nearly every aspect, the basic and overall structure of the language is essentially unchanged. Compartmentalization of aspects of the language into divisions represents a significant ideological enhancement to the language. Indeed, development of programs by divisions proved to be an extremely useful way of understanding the nature of the editing work to be performed. However, note that while the END-DIVISION command is essential to the language, the division headings DICTIONARY-DIVISION, EXECUTION-DIVISION and REPORT-DIVISION were not implemented and must therefore be preceded by a period so that they are treated as comment lines in the program listing. It is inconsistent to implement END-DIVISION commands while not implementing the division headings. It is believed that this division structuring is important enough to the overall organization of a CONCOR source language program that it should be implemented prior to general distribution. The section headers shown on the figures in Appendix F, however, are another matter. They are cumbersome and were generally not coded by workshop participants, and they could be deleted from this version of the language altogether with little loss in organizational understanding. The CONCOR language is sufficiently powerful to stand on its own as a distinct product and is not meant to be a COBOL imitation; its present degree of development and specialization does not warrant the structural drag of additional section identifiers. The probable intent of the original CONCOR project was to develop a package which was uncomplicated and not unwieldy to use. The question of implementing division and section names, while seemingly cosmetic, can have a real impact on the perceived ease of learning and using the language.

Figure 1 illustrates common mistakes programmers make when coding numeric and alphanumeric variables in the DATA-DIVISION. These mistakes are the result of inconsistent variable formats. For instance, in the numeric data definition statement it is permissible to specify N9-23, where N signifies numeric, 9 signifies the length of the item and 23 specifies the starting position in the record. In NEW-DATA, however, it is possible to code an item with a maximum length of 18. While on the surface this inconsistency would seem harmless, user variables defined in NEW-DATA as N18 could inadvertently be moved to output record fields defined by a data definition statement N9-23; such an action would result in a data error. Under certain circumstances it would be highly desirable to output these longer values. A similar circumstance exists between the numeric and alphanumeric data coding conventions. While the maximum length of a numeric item is permitted to be 9 in the data definition statement (18 in NEW-DATA), the maximum alphanumeric variable is permitted to be only 4 characters in length. In the current systems manual it is recommended that

FIGURE 1

DICTIONARY-DIVISION

DICTIONARY-NAME DATA-CODING-EXAMPLE

INPUT-FILE

OUTPUT-FILE

AREA-CONTROL N2-2 N2-4 N3-6 N2-9 N9-23

QUESTIONNAIRE-CONTROL A4-2 A3-6 A2-9 A3-11 A3-14

RECORD-CONTROL A1-1

DEFINE-RECORD

H01-TYPE-OF-HOUSING-UNIT N1-17

H02-MATERIAL-OF-ROOF N1-19 10 9

H03-TOTAL-PERSONS-IN-UNIT N8-40 NOT-NUMERIC BLANK

H04-STATE-OF-UNIT-CODE A4-50 0 U 1 D

DEFINE-RECORD

P01-SEX 1-13 W F

NEW-DATA

N01-SAVE-TYPE-OF-HOUSING-UNIT

N02-SAVE-TYPE-OF-ROOF 1

N03-COUNT-TOTAL-IN-UNITS 10 0

N04-AGGREGATE-INCOME 18 0

END-DIVISION

Explanations

N2-4: This is an example of an external numeric input data item (N) with a length of 2 bytes starting in column 4 of the input record. The maximum length of this type of variable outside of NEW-DATA is 9. When coded in NEW-DATA, 18 is permitted.

A4-2: This is an example of an external alphanumeric input data item (A) with a length of 4 bytes starting in column 2 of the input record. This construction for an alphanumeric variable is valid only in the control statements. Additionally, it can never be over 4 bytes in length. When alphanumeric data fields are defined within record types, the EDITOR program requires that the comparison strings always be specified. A maximum of 3 is permitted. The purpose of these strings is to force a recode of the data to a numeric value. If no match is found, EDITOR automatically assigns a unique negative value to the field.

alphanumeric coding be utilized in the QUESTIONNAIRE-CONTROL and RECORD-CONTROL statements, where each input data item must be of the same data type, as shown in the example. When alphanumeric data variables are used in these control statements, their construction is identical to that of numeric items. However, when used elsewhere in the DATA-DIVISION, alphanumeric variables are required to specify one of three possible comparison values, as shown. There are a number of production instances in which it would never be necessary or even desirable to recode alphanumeric data. However, as CONCOR attempts to force data into a totally numeric format upon output, there is currently no way to preserve these values if desired.

An unwieldy alternative, which may be acceptable under some circumstances, would be the expansion of the number of comparison strings from three to a more realistic number. The limitation of this compromise is that a full twenty-six comparison identifiers would be required in order to accommodate data which utilized the entire alphabet. A better solution, however, would be to make the general format of the alphanumeric variables identical to that of numeric identifiers, i.e. A9-23, and to permit alphanumeric values so defined to pass unaltered through the CONCOR system.
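The type, length and starting-column convention discussed above (N9-23, A4-2) can be made concrete with a small sketch. It is written in Python, not in the CONCOR language, and the `FieldSpec`, `parse_spec`, `exceeds_current_limit` and `extract` names are hypothetical; only the length limits of 9 for numeric and 4 for alphanumeric items outside NEW-DATA are taken from this report.

```python
import re
from typing import NamedTuple


class FieldSpec(NamedTuple):
    kind: str    # "N" for numeric, "A" for alphanumeric
    length: int  # number of columns occupied
    start: int   # starting column, counted from 1


# Maximum lengths outside NEW-DATA, as described in this report.
MAX_LENGTH = {"N": 9, "A": 4}

_SPEC = re.compile(r"^([NA])(\d+)-(\d+)$")


def parse_spec(text):
    """Split a specification such as 'N9-23' into type, length and starting column."""
    match = _SPEC.match(text)
    if match is None:
        raise ValueError(f"not a type/length-start specification: {text!r}")
    return FieldSpec(match.group(1), int(match.group(2)), int(match.group(3)))


def exceeds_current_limit(spec):
    """True when the declared length is longer than the current limit for its type."""
    return spec.length > MAX_LENGTH[spec.kind]


def extract(record, spec):
    """Return the characters a specification refers to; columns are 1-based."""
    return record[spec.start - 1 : spec.start - 1 + spec.length]


print(parse_spec("N2-4"), extract("X0042ABCD", parse_spec("N2-4")))
print(exceeds_current_limit(parse_spec("N9-23")), exceeds_current_limit(parse_spec("A9-23")))
```

Under the uniform format recommended above, the two entries in `MAX_LENGTH` would simply carry the same value, and an item such as A9-23 would no longer be rejected.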

Another data-naming convention which caused several errors, and which could be corrected, concerns the array data definitional statements. While arrays of two and more dimensions are handled in a superior manner by the CONCOR program, single-dimension arrays pose a problem in coding, as shown in Figure 2. It is suggested that the command imperatives be changed to permit the coding of both rows and columns in single-dimension arrays, i.e. allow a single row vector as well as a single column vector, to maintain the consistency of the array data definitional statements.

A major requirement of COBOL CONCOR file processing concerns the fact that all related data records must be physically contiguous on the input file The implication of this requirement is that files may require preprocessing prior to actual data editing (This preprocessing is usually a sort routine upon a selected CONTROL-AREA key) While this type of processing merely introduces a new step in file processing a major limitation becomes apparent when a largenumber of DISCRETE DATA files of the same census or survey questionnaire are to be processed This limitation is the introduction of manual steps to save the most recent inputed values ie preventing the program from startingwith cold values each batch run If a command such as LOADUNLOAD ARRAYS was incorporated into the language (an enhancement not believed to be difficult to implement) manual processing would be reduced to a minimum between batches and the maximum benefits of the hot deck methodology would be realized It is envisioned that such a command would automatically insure the transfer of the appropriately designated hot values Automatic processing of this nature if done correctly can greatly reduce the time required to clean multishyvolume files for once CONCOR language statements have been compiled linked

While it is possible at this time to save the arrays that are used in the imputation processes on a separate write-file, it is not yet possible to automatically load those values back into an object program and immediately resume processing on another volume. It is believed that such an automatic feature of the language would cut down manual processing time significantly enough to warrant its inclusion in the package prior to general distribution.
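The save-and-reload cycle described above can be illustrated outside CONCOR. The following is only a minimal Python sketch of the idea, not CONCOR syntax: the function names, the JSON file format, and the single `last_age` array are assumptions made for illustration.

```python
import json
import os
import tempfile


def save_hot_deck(arrays, path):
    """Write the current imputation arrays to disk at the end of a batch."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(arrays, handle)


def load_hot_deck(path, cold_deck):
    """Return saved hot values if a file exists, otherwise a copy of the cold deck."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    return {name: list(values) for name, values in cold_deck.items()}


def run_batch(records, deck):
    """Toy edit pass: impute a missing age from the last valid age seen."""
    for record in records:
        if record["age"] is None:
            record["age"] = deck["last_age"][0]   # impute from the hot value
        else:
            deck["last_age"][0] = record["age"]   # refresh the hot value
    return records


def demo():
    """Process two volumes in sequence; the second starts with hot values."""
    cold_deck = {"last_age": [30]}
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "hot_deck.json")
        deck = load_hot_deck(path, cold_deck)      # volume 1 starts cold
        run_batch([{"age": 41}], deck)
        save_hot_deck(deck, path)
        deck = load_hot_deck(path, cold_deck)      # volume 2 starts hot
        return run_batch([{"age": None}], deck)[0]["age"]
```

In this sketch the missing age on the second volume is imputed from the value carried over from the first volume rather than from the cold-deck default, which is the behavior the proposed command is meant to automate.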

FIGURE 2

Declaration of a two-dimensional array:

A05-DIFF-BETWEEN-AGE-OF-FEMALE-BY-RELATION 2 4 4

          AGE OF HUSBAND
12-17   18-24   25-35   36+     RELATION
  2       1       3      4      HEAD
  2      -1       3  [illegible] CHILD
  1      31      -2     -4      OTHER
  1       2       2      2      NONRELATIVE

Comments: The ARRAY-DATA command statement provides the means to declare array identifiers with up to five dimensions. Current documentation is not as explicit about the rules of this command as is desirable. The parameters of the command should function as follows:

user-identifier
number of dimensions (D)
number of rows (R)
number of columns (C)
magnitude of element (M)
initial start-up values

In the example, A05 is a two-dimensional array with 4 rows, 4 columns, a default magnitude of 9, and cold deck values as labeled.

Declaration of a vector coded by the same scheme (this coding generates the error message below):

A06-DIFF-BETWEEN-AGE-OF-PERSON-AND-MOTHER 1 1 4
16 18 21 23

WARNING (DD-207) COMMAND TERMINATOR [illegible] NOT FOUND, [illegible] ASSUMED PRESENT
ERROR (DD-911) DIMENSION OF USER-SPECIFIED ARRAY IS LESS THAN THE MINIMUM VALUE PERMITTED (2)
PREVIOUS DIAGNOSTIC AT LINE 563

As shown by the array variable A06, CONCOR's treatment of vectors is not consistent with the above multidimensional array scheme; i.e., A06 must be coded as follows (example of how vectors must currently be coded to be correct):

A06-DIFF-BETWEEN-AGE-OF-PERSON-AND-MOTHER 1 4 2
16 18 21 23

user-identifier
1 dimension
number of elements in vector
magnitude of element
initial start-up values

A simple modification to this command would permit the coding of both row and column vectors and make this command less error prone.
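The uniform scheme suggested in Figure 2 can be sketched as a small validation routine. This is a hypothetical Python illustration, not part of CONCOR: `declare_array` and its arguments are invented names, and the magnitude is simply stored, since the report does not define how CONCOR interprets it.

```python
from math import prod


def declare_array(name, shape, values, magnitude=9):
    """Validate an array declaration under a uniform rows/columns scheme.

    `shape` gives the size of every dimension, so a row vector is (1, n)
    and a column vector is (n, 1), coded exactly like a larger array.
    """
    if not 1 <= len(shape) <= 5:
        raise ValueError(f"{name}: expected 1 to 5 dimensions, got {len(shape)}")
    if any(size < 1 for size in shape):
        raise ValueError(f"{name}: every dimension must have at least 1 element")
    if len(values) != prod(shape):
        raise ValueError(
            f"{name}: {len(values)} start-up values supplied, {prod(shape)} expected"
        )
    return {
        "name": name,
        "shape": tuple(shape),
        "magnitude": magnitude,   # stored only; its meaning is not assumed here
        "values": list(values),
    }
```

Under this sketch a declaration such as `declare_array("A06", (1, 4), [16, 18, 21, 23])` is accepted, whereas the coding shown in the figure is rejected by the current system because a dimension of 1 is below its minimum of 2.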

and stored as an object module on the system, no other compilations should be required for processing questionnaire files of the same type. Theoretically, a single well-written CONCOR program is all that would be required to process an entire census run.

Appendix H contrasts the internal identifiers of the old and new language versions. Without such identifiers, a user would have little information about the status of input as it is processed by EDITOR. As noted in the appendix, most internal pointers are reset upon each break in the CONTROL-AREA, provided a CONTROL-AREA has been defined. The limitation here is that there are obvious instances when terminating processing on the basis of run counts would be advantageous even though a CONTROL-AREA has been specified, e.g., when debugging CONCOR programs or comparing input files. Therefore, another set of pointers should be implemented for this purpose and made available for programmer reference.

One clearly disturbing development, which needs to be pursued during in-depth testing of the system, concerns the MAX-STORAGE parameter of the DEFINE-RECORD statement. As shown in Figure 3, when MAX-STORAGE was set equal to the maximum value, a COBOL program was generated which required 1000K of core to run. The MAX-STORAGE value of 999 is clearly not realistic under most processing circumstances. This example drives home several important points about CONCOR. The core requirements of CONCOR-generated programs can be influenced significantly by the amount or nature of programmer-specified I/O operations. In fact, it is possible to generate a program of a size that most foreign country machines could not process. It is recommended that tests determine a realistic maximum-value restriction for implementation to prevent problems in this area.

The final area of recommended modification concerns the newly implemented REPORT-DIVISION. The purpose of the REPORT-DIVISION is to enable a user to specify certain CONCOR language statements which will generate statistical reports. These reports contain statistics generated by EDITOR as specified by the GENERATE-EDIT-STATISTICS command of the EXECUTION-DIVISION. All of the reports produced are organized according to the data fields defined by the AREA-CONTROL command of the DATA-DICTIONARY. If the AREA-CONTROL command is not defined in the DATA-DICTIONARY, then all the statistics are summarized at the total run level. If a control area field is defined, then all statistics will be summarized for each unique CONTROL AREA as encountered by the EDITOR program on the input file; statistics at the total run level will not be available. This in part relates back to previous discussions citing the need for new internal identifiers. Report listings may contain the values of entire records or entire questionnaires, depending upon the keyword used in the report generation commands. The problem centers upon the homogeneity of CONCOR printouts during a production run.

It is virtually impossible to distinguish reports on the basis of the volumes they were run against. Some means should be provided to allow users to uniquely and purposefully label the reports generated in this division. Indeed, the very name REPORT-DIVISION suggests that such a command is implicit and appropriate. Such a LABEL-REPORT or REPORT-FILE command, along with file information from the system, should not be difficult to implement.
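The two reporting gaps noted here (no total-run summary once a control area is defined, and no way to label a report by volume) can be pictured with a short Python sketch. It is an illustration only: `build_report`, the volume label, and the message codes are hypothetical and do not correspond to any CONCOR command.

```python
from collections import Counter


def build_report(volume_label, errors):
    """Summarize (control_area, message_code) pairs for one input volume.

    Counts are kept both per control area and for the total run, and the
    report carries a user-supplied label identifying the volume.
    """
    by_area = {}
    total = Counter()
    for area, code in errors:
        by_area.setdefault(area, Counter())[code] += 1
        total[code] += 1
    return {"label": volume_label, "by_area": by_area, "total": total}
```

Keeping both sets of counters in one pass means the total-run figures remain available even when statistics are also broken out by control area.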

FIGURE 3

CONCOR (CONSISTENCY AND CORRECTION) EDIT AND IMPUTATION SYSTEM
RUN DATE [illegible]

USER DICTIONARY DIVISION - SOURCE LISTING

LINE NUMBER

 72   DEFINE-RECORD RECORD-NAME=HOUSING-RECORD
 73   MAX-STORAGE=999
 74   RECORD-TYPE=[value illegible]

268   DEFINE-RECORD RECORD-NAME=POPULATION-RECORD
269   MAX-STORAGE=999
270   RECORD-TYPE=[value illegible]

Job log for the LOAD step:

IEF285I [dataset names illegible] PASSED
IEF285I [dataset names illegible] KEPT
IEF373I STEP /LOAD/ START 80018.1001
IEF374I STEP /LOAD/ STOP 80018.1002 CPU 0MIN 01.65SEC SRB 0MIN 00.25SEC VIRT 1000K SYS [illegible]

STEP START TIME - 10:01:55   STEP END TIME - 10:02:57   STEP CPU TIME - 1.65 SECONDS   STEP COMPLETION CODE - C 0
TAPES - 0   DISKS - 0   REGION (REQUESTED - 1000K, USED - 1000K)   I/O COUNT - 211
CARDS READ - 0   CARDS PUNCHED - 0   LINES PRINTED - 22
STEPNAME - LOAD   CHARGE (MACHINE UNITS - 1198, DOLLARS - [illegible])


Concluding Remarks of System Modifications

Version 1 of the COBOL CONCOR language was notorious for abending without the generation of diagnostic messages. Appendix I sets forth a test of some of these historical problems. In each instance the system provided messages and source program line references which enabled immediate identification of the processing difficulties. In addition, tests of the various commands revealed the generally high performance of the language in its newest version. Under these circumstances, the recommended modifications should be considered as ones which will uniformly allow CONCOR to realize the potential of its design, enhance its utility, and, moreover, endure as a software product.


IV THE ADEQUACY OF COBOL CONCOR DOCUMENTATION

The current documentation of CONCOR consists of a newly developed, relatively voluminous systems manual, a Diagnostic Message Guide, and some loose-leaf instructions concerning the installation of the system on IBM computers. Of these three documents, the Diagnostic Message Guide is clearly complete and adequate and needs no changes except the addition of message number tabs to enable a programmer to locate the text of a message number more quickly. An example of the superior format of this guide appears as Appendix J.

Conspicuously absent from the list of systems documents is a User's Guide, as originally specified in the ISPC enhancement proposal.

A thorough assessment of the Systems Manual has been undertaken and indicates that, with some reorganization and editing, it could be made a more useful and usable document, as it is well constituted informationally. In its current form the Systems Manual can best be described as a working document: a document which was not intended to teach the CONCOR language but is aimed at an audience already familiar with previous CONCOR releases. Further, it is not a systems manual per se, as some of its components and appendixes could more appropriately compose a user's guide. For example, excerpts from the POPSTAN publications which review the basic editing concepts and contain the POPSTAN example problem should be deleted from the Systems Manual and would be more contextually effective in the User's Guide. The EXECUTION-DIVISION chapter of the Systems Manual would especially benefit from general editing and reorganization. (Specifically, the contents of pages 90-92 should be moved to page 63; these pages explain the placement and purpose of the three permissible routines of the CONCOR language. Had the author been granted more time, he could have outlined additional specific modifications to bring existing documentation to a level of adequacy.) However, some guidelines stand out:

1. Installation instructions and considerations should compose a chapter of the Systems Manual rather than a separate document. (Hand-out materials covering installation were not adequate.) Further, a documentation scheme setting forth how CONCOR was installed on each computer, and any modifications required during installation, could be serially added to and become a permanent part of the installation's systems manual. If CONCOR is going to be used, and if agencies are going to actively support it as a package, installation limitations must be known and well documented. The prescribed format for this documentation should be set forth, and any modifications to CONCOR should be continuously updated on these sheets so that a history of changes to the language will be readily accessible to supporting agencies. If CONCOR is to survive as a standardized system, editing results must be capable of being replicated among installations. Such a form could contain other information as well, concerning the type of computer, amount of core, and nature of utilities supporting users. Upon installation, a copy of this form could be sent to the U.S. agency which will ultimately be responsible for supporting the CONCOR package.

2. A complete COBOL CONCOR program should appear in an appendix for reference.

3. The development of the User's Guide should include an intensive review of the editing concepts involved in processing census data files, beyond the POPSTAN materials.

4. An explanation of the CONCOR benchmark program should appear in the User's Guide and the Systems Manual. The running of a supplied benchmark program should be a standard installation protocol used to test all operational aspects of a new installation.

5. The development of a CONCOR pocket card. This tool, which has been proven useful in teaching and assisting programmers in utilizing programming languages, lays out all command options on a single small card. An example of such a pocket card is the IBM Assembler Card. Such a small document is useful in clarifying the language and setting forth structural rules without continual reference to full-size manuals.


V CONCLUSION

In a situation of extreme need, CONCOR could probably be utilized, as it seems capable of performing its editing functions. Its usefulness as a data cleaning tool is generally undisputed. The importance of having CONCOR exhaustively tested by an independent agency cannot be over-emphasized in light of prior failures of the system to perform adequately. Additionally, core requirements and processing speed should be precisely determined. One must be encouraged by the amount of progress which has been made in a relatively short time towards the completion of the CONCOR package. While it is clearly beyond the scope of this report to estimate the cost and the amount of time required to implement the highly desirable changes to the CONCOR language and documentation outlined herein, and then exhaustively test the system, it is believed that this process could be completed within a single quarter. (The changes are of such a nature that no structural redefinitions of a systematic nature would be required.) These enhancements would make the COBOL CONCOR package more complete, consistent, and reliable in the field. ISPC has competently moved the system to the point that its completion is within reach.

Given the probable revision of the current Systems Reference Manual and the development of a suitable User's Guide, it is likely that this version of CONCOR could be taught to experienced programmers (unfamiliar with previous versions of the language) in as little as four weeks. And while one is impressed by the potential power and overall simplicity of the COBOL CONCOR language, it is difficult to envision individuals with no real computer programming experience utilizing its full capabilities. Hence the importance of including a review of basic editing concepts in the documentation is underlined.

Further, as documentation is often critical to the perceived ease of use of computer-based systems, it will be essential to have these various manuals examined independently prior to their general use.

As previously mentioned, Appendix G presents features which the ISPC feels could have a positive impact on CONCOR. Depending upon the outcome of the testing recommended in previous chapters and the availability of funding, supporting agencies may desire to look again at an assembler (ALC) version of the language.

Overall, COBOL CONCOR remains an intermediate product yet to realize its potential, which is thought to be considerable. Software development of this nature is characteristically of a long-term, evolutionary design. In light of this reality, agencies should be prepared to evaluate such projects' success or failure in terms other than that of immediate utility. It is the philosophy underlying the system, not the system itself, which will ultimately be exported.

APPENDIX A

BuCen Enhancement Proposal


BUCEN IN-HOUSE REVIEW ENHANCEMENT PROPOSAL

1 Easy to use interrecord referencing

2 Improved output file capabilities

A provide overflow protection on WRITE command

B provide OUTPUT command to create corrected questionnaire file that matches IP file data dictionary

3 Improved edit statistics reported (LISTERR)

A provide automatic (user-specified) area break

B provide options for compilation and displaying edit statistics at various levels

C provide automatic (user-specified) tolerance checking of error rates by area

D automatically capture IDs of areas failing tolerance check

4 Clean up known bugs in code

5 Comprehensive testing

6 Clean up and enhance documentation

A reference manual more examples error message guide

B installation guide

C systems manual
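Items 3C and 3D of the proposal (tolerance checking of error rates by area, and capture of the IDs of failing areas) amount to a simple rate comparison. The following Python sketch shows the idea under stated assumptions: the function name, the input layout, and the 10 percent tolerance are illustrative and are not taken from CONCOR.

```python
def areas_failing_tolerance(counts, tolerance):
    """Return the IDs of areas whose error rate exceeds the tolerance.

    `counts` maps an area ID to a pair (records_in_error, records_read).
    """
    failing = []
    for area_id, (errors, records) in sorted(counts.items()):
        rate = errors / records if records else 0.0
        if rate > tolerance:
            failing.append(area_id)
    return failing
```

For example, with a tolerance of 0.10, an area with 5 errors in 100 records passes while an area with 20 errors in 100 records is captured for rejection.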

APPENDIX B

EVALUATIVE CRITERIA

UNITED STATES GOVERNMENT

Memorandum

TO: DS/POP/FPSD, Robert H. Haladay

DATE: December 3, 1979

FROM: DS/POP/DEIO, Liliane Floge

SUBJECT: Criteria for Evaluation of BuCen COBOL CONCOR, September 30th Version, as presented at January 1980 Workshop

The following points should be considered when evaluating CONCOR, the COBOL editing system, during participation at the January 1980 workshop:

1. Are the Installation Manual, the System Manual, and the User's Manual ready for use? Are these manuals adequate for their purposes? Are the manuals clear and consistent? Can they be understood by subject-matter personnel as well as programmers?

2. Is the editing system capable of implementing all the features as described in the documentation?

3. Does the system appear to be [word illegible] debugged and tested and ready for release to developing countries?

4. Does the system appear to be fast enough to edit a census in a reasonable amount of time?

5. What size core does the system require?

6. Comment in general on completeness, usefulness, and efficiency, and on the ease of use by developing country programmers and other census personnel.

cc: [illegible] Olds, APHA; [illegible]; Julio Ortuzar, Delta Systems

APPENDIX C

WORKSHOP ITINERARY

CONCOR Workshop Schedule, January 7-18, 1980

U.S. Bureau of the Census, International Statistical Programs Center

Scuderi Building, Room 209, 4235 28th Avenue, Marlow Heights, Maryland

Monday January 7

9:30 am - 10:00 Welcoming Remarks
  - Overview of Workshop

10:00 - 10:30 Introduction to CONCOR
  - Purpose and function
  - History of development
  - General computer requirements

10:30 - 10:45 Break

10:45 - 12:00 Editing Concepts
  - Ways to interrogate data
  - Ways to correct data
  - Editing housing and population data (POPSTAN)
  - Advantages of CONCOR

12:00 - 1:15 pm Break

1:15 - 2:00 System Description
  - Constraints in design of CONCOR
  - Basic subsystems of CONCOR
  - User interactions with system
  - Examples of outputs produced

2:00 - 2:30 User Program Organization
  - Divisions
  - Sections
  - Routines
  - Commands

2:30 - 2:45 Break

2:45 - 3:25 Command Language Description
  - Types of statements
  - Format
  - Syntax

Tuesday January 8

Dictionary Division Command Statements

9:30 am - 10:30
  - Punctuation
  - Input data referencing
  - Working data declarations
  - Internal data representation and storage

10:30 - 10:45 Break

10:45 - 12:00
  - Dictionary-Attributes-Section
    - Dictionary-Name
  - File-Section
    - Input-File
    - Output-File
    - Write-File
    - Error-File

12:00 - 1:15 pm Break

1:15 pm - 2:15
  - Input-Record-Section
    - Define-Record
  - Working-Data-Section
    - New-Data
    - Array-Data

2:15 - 2:30 Break

2:30 - 3:25 Dictionary Examples
  - Minimum dictionary structure
  - Maximum dictionary structure
  - Hand out dictionary problem

Wednesday January 9

9:30 am - 10:30 Discussion of Participants' Dictionary problems
  - Free work time

10:30 - 10:45 Break

10:45 - 12:00 Execution Division Command Statements
  - Punctuation
  - Subscripting
  - Internal Identifiers
  - Report-Control-Section
    - Generate-Edit-Statistics
    - Count-Imputes
    - Examples

12:00 - 1:15 pm Break

1:15 pm - 2:15 Execution Division Command Statements (continued)
  - Routines of Edit-Specifications-Section
    - Prolog-Routine
    - Filter-Routine
    - Epilog-Routine
  - Types and functions of edit specification commands
  - Range
  - Assert

2:15 - 2:30 Break

2:30 - 3:25
  - Pass/Fail clauses
  - List

Thursday January 10

9:30 am - 10:30 Discussion of Problems
  - Free work time

10:30 - 10:45 Break

10:45 - 12:00 Edit Specification Command Statements (continued)
  - Allocate
  - Update
  - Let

12:00 - 1:15 pm Break

1:15 pm - 2:15
  - If
  - Until/Exit
  - Stop

2:15 - 2:30 Break

2:30 - 3:25
  - Drecode
  - Grecode

Friday January 11

9:30 am - 10:30 Edit Specification Command Statements (continued)
  - Output
  - Write

10:30 - 10:45 Break

10:45 - 12:00 Report Division Command Statements
  - Display-Control-Section
    - Display-Edit-Statistics
  - Tolerance-Control-Section
    - Error-Rate-Check
    - Reject-File
  - Report Examples

12:00 - 1:15 pm Break

1:15 pm - 3:25 Hand out Edit Specifications as problem
  - Free work time

Monday January 14

9:30 am - 10:30 Discuss procedures for running problems on computer

10:30 - 10:45 Break

10:45 - 12:00 Component Programs of the CONCOR system

12:00 - 1:15 pm Break

Tuesday January 15

9:30 am - 3:25 pm Free work time

Wednesday January 16

9:30 am - 12:00 Free work time

12:00 - 1:15 pm Break

1:15 pm - 2:15 How to Install CONCOR on IBM 360/370 OS

2:15 - 2:30 Break

2:30 - 3:25 Free work time

Thursday January 17

9:30 am - 3:25 Free work time

1:15 pm - 2:15 Future CONCOR Design Considerations
  - for census processing
  - for survey processing
  - manual correction system

2:15 - 2:30 Break

2:30 - 2:45 Evaluation Guidelines
  - Hand out evaluation forms

2:45 - 3:25 Free work time

Friday January 18

9:30 am - 10:30 Free work time

10:30 - 10:45 Break

10:45 - 12:00 Group Discussion on CONCOR
  - Submission of written evaluations
  - Distribution of IBM installation tapes
  - Presentation of Certificates to Participants

12:00 - 1:15 pm Break

1:15 - 3:25 Free work time

APPENDIX D

PARTICIPANTS


CONCOR WORKSHOP January 7 - 18 1980 Marlow Heights MD

PARTICIPANTS

KENYA

James Midianga Government Computer Centre Central Bureau of Statistics PO Box 30266 Nairobi Kenya

PANAMA

Omar Rivera Rios Data Processing Specialist Contraloria General de la Republica de Panama Apartado Postal 5213 Panama 5 Panama

PHILIPPINES

Guida T Capellan OIC Computer Programming Division National Census and Statistics Office Solicarel Bldg 1 Magsaysay Blvd Sta Mesa Manila Philippines

THAILAND

Angsumal Sunalai National Statistical Office Bangkok 1 Thailand

EGYPT

Farag Sedky Mourad Ghaleb CAPMAS (Central Agency for Public Mobilization and Statistics) Nasr City Cairo Egypt


SAUDI ARABIA

Shargi R Al-Shargi National Computer Center PO Box 2534 Riyadh Saudi Arabia

Ahmed S Al-Ghamdi Dept Director of National Computer Center PO Box 2534 Riyadh Saudi Arabia

OTHER

Robert W ONeal USREPJECOR APO New York NY 09038

John N Adams Data Processing Advisor DACCA Department of State Washington DC 20520

or c/o UNDP PO Box 224 Dacca Bangladesh

Joe Quasney US Census Bureau Washington DC 20233

Howard Brunsman 5715 N Ninth St Arlington VA 22205

Larry Shiller and Julio Ortuzar Delta Systems Consultants Inc 264 Alhambra Circle Coral Gables FL 33134

Nicholas Ourusoff Burpee Hill Road New London NH 03257 (United Nations)

Frederic J Grant IV Systems Development Georgia World Congress Institute 580 North Omni International Atlanta Georgia 30303


Rafael Samper Mary Jo Keenan Catherine B Gleason LACDR Room 2246 NS Washington DC 20523

John Marshall AIDSERDMPSE Rm 706C SA-12 Washington DC 20523

Susanne Bacon and Mary K Friday ISPCData Processing US Census Bureau Washington DC 20233

Leo Dougherty ISPCWorld Census Staff US Bureau of the Census Washington DC 20233


CONCOR WORKSHOP

January 7-18 1980 Washington DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM


CONCOR WORKSHOP January 16 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1 What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries

2 If you are familiar with previous versions of CONCOR how does this current version compare to them


3 How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data of which you are familiar

4 How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software


5 How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys

6 If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?


7 In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8 Other comments


APPENDIX F

NEW/OLD COMMAND COMPARISONS

OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA


NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1, all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin anywhere in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR

END-DIVISION signifies the termination of the CONCOR division organization and must always be present even though division identifiers have not been implemented

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

ALLOCATE (ALL) ALLOCATE (ALLOC) Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

ASSERT (AST) ASSERT (ASRT) The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and a MESSAGE option. Additionally, by utilizing the DUMPR and DUMPO keywords, the command gives the user the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

(see RECODE) DRECODE (DRCD) Provides a method for recoding data items which have a starting value of zero and continue in ascending sequence. Replaces the previous RECODE command.

EXIT Provides the means to terminate EXECUTION (ESCAPE) of command statements within the UNTIL command

END END-IF END-THEN END-ELSE A solitary END command is no longer permitted in conditional construction. Note the similarity to COBOL pseudocode.

ERROR not implemented No longer supported in language

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

FILTER TYPE ( ) WITH not implemented No longer are option statements supported in language This command once specified essentially a GOTO depending on construction

(see XRECODE) GRECODE (GRCD) This new command provides a method for the group recoding of input values

IF IF The IF statement has properties found virtually in all programming languages ie provides a means of evaluating a condition and controlling the execution of alternative logical statements Similar to pseudocode a CONCOR programmer must utilize the structured command IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version

LET LET Virtually unchanged

LIST New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire as well as specific individual identifiers

OUTPUT New command which provides the means by which records will be written to the output-file

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

RANGE (RNG) RANGE (RNG) As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through the DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

RECODE (REC) name changed Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

STOP keyword STOP keyword Unchanged.

UNTIL DO END-DO One of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

UPDATE (UPD) UPDATE (UPD) Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

WRITE (WRT) WRITE (WRT) The WRITE language statement while once utilized as the primary output command is now used to create an auxiliary or derivative file Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION (See WRITE-FILE of DICTIONARY-DIVISION)

XRECODE (XREC) name changed Old command which has become the GRECODE command in the December 1979 version


DECEMBER 1978 VERSION 1

DECEMBER 1979 VERSION 2

Not implemented REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
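Because the presort is done outside CONCOR, a quick check that the file really is contiguous on the control field can save a wasted run. The following Python sketch is one way such a check might look; the function name and the use of plain key values are assumptions, and reading the key from an actual record layout is left out.

```python
def find_noncontiguous_keys(keys):
    """Return control-area keys that reappear after their group has ended.

    `keys` is the sequence of control-area values in input-file order.
    An empty result means every area's records are physically contiguous.
    """
    closed = set()        # areas whose group of records has already ended
    offenders = []
    previous = object()   # sentinel that never equals a real key
    first = True
    for key in keys:
        if key != previous:
            if key in closed and key not in offenders:
                offenders.append(key)
            if not first:
                closed.add(previous)
            previous = key
            first = False
    return offenders
```

A file in the order A, A, B, A would report area A, indicating that a sort on the AREA-CONTROL field is still needed before editing.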

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST


Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1 Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods if applicable

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.

(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18 digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.

(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation, or provide a means by which the user may specify a title to appear in the listings.
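The automatic range check proposed in item (13) might work along the following lines. This is a Python sketch under stated assumptions, not a description of CONCOR internals: the identifier names are borrowed from Figure 1 of this report, while the valid ranges, the `VALID_RANGES` table and the function name are hypothetical.

```python
# Hypothetical dictionary entries: identifier -> (lowest valid value, highest valid value).
VALID_RANGES = {
    "H01-TYPE-OF-HOUSING-UNIT": (1, 9),
    "H02-MATERIAL-OF-ROOF": (1, 5),
}

def out_of_range_items(questionnaire, valid_ranges=VALID_RANGES):
    """Return the identifiers whose values fall outside their declared range."""
    failures = []
    for name, value in questionnaire.items():
        bounds = valid_ranges.get(name)
        if bounds is None:
            continue  # no range declared for this item, so nothing to check
        low, high = bounds
        if not low <= value <= high:
            failures.append(name)
    return failures

flagged = out_of_range_items(
    {"H01-TYPE-OF-HOUSING-UNIT": 3, "H02-MATERIAL-OF-ROOF": 7}
)
```

Keeping the ranges in the dictionary rather than in comment statements is what would let both the edit program and tabulation software read them.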

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features such as division by zero and out of bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is the question of what action is to be taken in the calculation of tolerance, in terms of this new record's inclusion into the universe of observations and number of changes.

(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
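The lookup table approach suggested in item (4) can be contrasted with a linear search in a short Python sketch. The recode pairs and function names are hypothetical, and passing unmatched values through unchanged is an assumption made for the illustration, not documented CONCOR behavior.

```python
# Hypothetical recode rules: (old value, new value) pairs.
RECODE_PAIRS = [(1, 10), (2, 10), (3, 20), (4, 30)]

def recode_linear(value, pairs=RECODE_PAIRS):
    """Scan the pairs in order; the cost grows with the number of pairs."""
    for old, new in pairs:
        if value == old:
            return new
    return value  # assumption: unmatched values pass through unchanged

# The table is built once, before any data records are processed.
RECODE_TABLE = dict(RECODE_PAIRS)

def recode_lookup(value, table=RECODE_TABLE):
    """One table probe per value, regardless of how many pairs exist."""
    return table.get(value, value)
```

Both functions return the same results; the difference lies only in how much work is done per data value, which matters when the command is executed once for every record of a census file.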

1.1.3 Report Division

(1) Provide a means by which the user may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.

1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program, to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean to imply that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.

2. Survey or other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has not been finished nor tested.
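The weighting scheme in item (5) amounts to multiplying each sampled value by the number of population units the sampled case represents. The Python sketch below is only an illustration of that arithmetic; the field names, the weights and the function `weighted_total` are hypothetical and do not correspond to any existing CONCOR command.

```python
def weighted_total(records, value_field, weight_field="weight"):
    """Sum value * weight over the sample records."""
    return sum(r[value_field] * r[weight_field] for r in records)

# Hypothetical sample: each household carries the weight assigned by the sampling frame.
sample = [
    {"persons": 4, "weight": 10.0},
    {"persons": 2, "weight": 25.0},
]

estimated_persons = weighted_total(sample, "persons")
estimated_households = sum(r["weight"] for r in sample)
```

Without the weights, the summary statistics would describe only the sampled questionnaires rather than the population they were drawn from.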

APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES

1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT OUTPUT-QUEST-COUNT WRITE-RECORD-COUNT

Change in spelling of variable

Not mentioned--possible omission from manual

Not implemented

Note: Current documentation equivocates on when some variables are reset--most are initialized with each control area break.

Though implemented as reserved identifiers, due to poor documentation of (old) CONCOR, exact values and utility were unknown.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention, e.g. IN, OUT.

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Execution Statistics page: CONCOR-EDITOR EXECUTION STATISTICS for control area 0. The listing shows the message DIVISION BY ZERO, LINE NUMBER 118 printed several times, followed by RUN ABORTED DUE TO DIVISION BY ZERO ERRORS. The superimposed program text (lines 111-125) consists of LIST and LET statements on registers R001, R002 and R003, including the LET statement on line 118 that assigns R003 from R002 and R001.]

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason--ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors and user specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.
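The kind of diagnostic behavior described in the comments, reporting the source line of each division by zero and aborting after repeated errors, can be sketched in Python. Everything here is an assumption made for illustration: the error limit of six, the statement representation, and the reading of line 118 as a division of R002 by R001 (the operator is not legible in the listing). It does not reproduce how EDITOR is actually implemented.

```python
def run_statements(statements, max_errors=6):
    """Execute (line_number, action) pairs, logging each division by zero by line."""
    messages = []
    errors = 0
    for line_number, action in statements:
        try:
            action()
        except ZeroDivisionError:
            errors += 1
            messages.append(f"DIVISION BY ZERO, LINE NUMBER {line_number}")
            if errors >= max_errors:  # hypothetical error limit
                messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                break
    return messages

registers = {"R001": 0, "R002": 5, "R003": 0}

def line_118():
    # Assumed reading of line 118: R003 = R002 / R001, with R001 set to zero.
    registers["R003"] = registers["R002"] / registers["R001"]

log = run_statements([(118, line_118)] * 6)
```

The point of the design is that the run stops with an explanation and a line number instead of ending abnormally with no message.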

[Execution Statistics page: CONCOR-EDITOR EXECUTION STATISTICS. The listing shows a series of array reference error messages, each giving a questionnaire, coordinate, value and line number, followed by a run-aborted message. The superimposed program text (lines 173-187) consists of two nested DO loops, each with an UNTIL limit of 10 and a VARY ... FROM 1 BY 1 clause on R001 and R002, containing an ALLOCATE from TA1 (R001 R002), an UPDATE of TA2 (R001 R002) with R003, and several LIST statements, closed by two END-DO statements.]

Comments: In this example R001, a 2 dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 X 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.

[Execution Statistics page: CONCOR-EDITOR EXECUTION STATISTICS. The listing shows the messages JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215 and EDITOR -- NORMAL END OF JOB. The superimposed program text (lines 214-218) consists of an IF on IN-RECORD-COUNT with THEN STOP-RUN END-THEN, followed by a second IF on IN-RECORD-COUNT with a THEN LIST statement.]

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character () and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character () and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character () and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.

EXECUTIVE SUMMARY

The December 1979 COBOL CONCOR (Version 2) is a much improved software package. All commands appear to be functional; however, the system should be exhaustively tested by an independent agency prior to its general release. This agency should also precisely determine the system's relative speed and core processor requirements. While the system (exclusive of documentation) could immediately be utilized in a situation of extreme need, some CONCOR language coding inconsistencies detract from the learnability and exportability of the package and should be corrected. Additionally, there are other modifications or adjustments which would enhance the overall utility and productivity of the language in census and survey applications.

System documentation continues to be a problem. There is no User's Guide. The Systems Manual, though well-constituted informationally, should be thoroughly reorganized in accordance with the guidelines set forth in this report.

The staff of ISPC exhibited competence and professionalism in the conduct of the two-week workshop of January 7-18, 1980. ISPC generally is aware of both the potential and the shortcomings of the CONCOR project. The current CONCOR version represents a significant redesign of the overall system. As a package, it is in a state where its completion is within reach.

I. BACKGROUND

CONCOR (an acronym of Consistency and Correction) is best characterized as a software tool designed to expedite the processing of data files during the edit and imputation phase of population censuses and surveys. As a metacompiler written in the COBOL language, the system reads and verifies CONCOR language statements to produce an executable EDITOR program. The objective of this process is the creation of an error-free file which can be used at a later time for tabulation purposes.

Since its release as Version 1 on December 30, 1978, numerous elements of the COBOL CONCOR system have undergone continual change and redefinition. In fact, the system has not been permitted to stand still for any period of time, nor has it been exhaustively tested. In June of 1979, ISPC suspended the further distribution of the COBOL CONCOR system. This decision was based principally upon reports of the package's unsatisfactory performance at workshops held in Panama and Thailand. ISPC, upon its own initiative, developed a proposal to overhaul CONCOR and its accompanying documentation. This proposal, contained in Appendix A, represents an ambitious undertaking. While not all of the desired changes and capabilities could be implemented, Version 2 of December 1979 represents a significant managerial effort. The questions now are whether COBOL CONCOR Version 2 will be a demonstrably adequate software package -- a package capable of exportation to developing countries -- a package requiring no further modification. The purpose of this report is to address these critical issues. In connection with this, Appendix B sets forth the specific criteria around which such a discussion must evolve. As this is not intended to be a compendium, some of these broader issues will be treated immediately; the following chapters and appendices will qualify the exact nature of system alterations already undertaken, as well as further adjustments believed to be essential in realizing the goals of the system's philosophy.

Workshop

During the period of January 7-18, 1980, a workshop was held under the sponsorship of the ISPC to demonstrate the capabilities of the latest revision of the COBOL CONCOR software package. A schedule of events of this workshop is contained in Appendix C. This workshop was intended to provide participants with the opportunity to program in the CONCOR language and thereby to test aspects of the system as individually appropriate. A listing of the participants and the international organizations they represented is contained in Appendix D. During the concluding days of the workshop, each participant was asked by ISPC to provide a written evaluation of the now-called December 1979 version of CONCOR. This evaluation form, Appendix E, also includes space for comments concerning the competence of the system documentation, as well as any additional comments, including those regarding the organization and clarity of workshop presentations. It is assumed that in the near future summaries of these comments will be available to interested agencies.

EDITOR is the new name of the EXECUTOR module of previous language versions.

A complete history of the development of COBOL CONCOR can be found in both the December 1978 Version 1 and December 1979 Version 2 systems manuals, as well as in previous consulting reports.

While virtually all instructional aspects of this two-week workshop were conducted in a highly professional manner -- a manner which revealed a high degree of coordination among staff members in their efforts -- there are several areas which future workshops may improve upon:

1. All publications should be assembled in their entirety and proofread prior to distribution.

2. A complete CONCOR language program example and accompanying I/O documents should be provided at the onset of the workshop for reference.

3. Numerous short application programming problems involving all CONCOR language divisions should be utilized in place of a single lengthy problem.

It is noted that this workshop was not intended to teach the CONCOR language, as the organization and presentation of materials probably would have been different. It is believed that the two-week period was sufficient to give participants a familiarity with the use of the new CONCOR features, especially in light of the fact that workshop participants were permitted to work weekends and beyond normal working hours at their discretion. Though funding was not generally available, it is known that several workshop members chose to extend their stay in Washington to continue testing the CONCOR package or to work on projects which they could attempt to install immediately on their home computers. At the conclusion of the workshop, participants were permitted to take with them an installation tape of CONCOR as well as all the other materials they had acquired during the course of the project.

II. THE ADEQUACY OF CONCOR

CONCOR has been described by its designers as an adequate package. Adequacy as an evaluative criterion is often relative to need and should not be confused with readiness as an issue. The CONCOR system, exclusive of documentation, is sufficiently complete that in a situation of extreme need it could be used as a data-cleaning tool in the editing and imputation phase of census processing. Less extreme circumstances would impose reticence on such an endorsement. Though non-exhaustive tests indicate that CONCOR appears to be capable of performing all of the commands as implemented, because of the rapidity with which the system was rewritten it is thought that there has not been enough time to fully test all aspects of the project. Therefore, prior to its general dissemination, it is recommended that an independent agency conduct exhaustive tests to certify the integrity of the system programs. The importance of this certification cannot be overstated in light of previous workshop experiences. Concurrent with this testing process, the same agency should determine the relative speed and size of the system under actual production circumstances and further determine CONCOR's ease of installation. Later sections of this discussion set forth additional testing recommendations.

It is generally recognized that, of all the data-cleaning tools available for exportation, CONCOR is potentially the most powerful, especially with the addition of its new commands as outlined in Appendix F. While its utility is not in doubt, one must ask how much more useful CONCOR could be if modified, and whether this additional utility would be worth the costs involved. The modifications (excluding documentation) to COBOL CONCOR appropriate for consideration at this time are threefold:

1. Adjustments to the elements of the system which are internally inconsistent or awkward, to facilitate its learnability and usability among developing country programmers.

a. Implementation of the division headings DICTIONARY-DIVISION, EXECUTION-DIVISION and REPORT-DIVISION. De-emphasis of section headings.

b. Improvement of the consistency among data identifiers: allow alphanumeric variables to be coded without mandatory comparison strings throughout the DATA-DIVISION and to be of the same length as numeric variables. Permit numeric identifiers to be of an equal length to NEW-DATA identifiers. Permit the coding of single dimension row and column vectors in the same manner as multi-dimensional arrays.

2. Implementation of selective commands and internal variables to facilitate the production environment use of CONCOR in census applications. These include:

a. LOAD/UNLOAD arrays: commands which would automatically save and replace hot-decked values from batch to batch.

b. TOTAL-QUESTIONNAIRE-COUNT-RECORD-COUNT internal variables independent of AREA-CONTROL.

3. Other modifications:

a. Default values for the max-storage parameter set in a realistic range.

b. Allowance of more variables for survey applications.

Some of these modifications are part of what ISPC calls its wish list for the future development of CONCOR. This document has been included in this report as Appendix G. It is arguable that these features are essential to the completion of the CONCOR package. While it is beyond the scope of this report to draw a conclusion in this area, the enhancements outlined above are ones that would make the language more internally consistent and thereby easier to learn and apply in a census data production environment. These modifications are not arbitrary or cosmetic, but are a direct result of hands-on programming experience in the language as well as observations of and discussions with other workshop participants. While it is probably impossible ever to be satisfied with the overall structure of any programming language, the resolution of this issue of completeness must be made relative to the objectives for developing the COBOL CONCOR system in the first place. An explicit statement of these objectives has been absent in all systems documentation to date.

III. PROPOSED CHANGES TO THE LANGUAGE STRUCTURE

Based upon the assumption that it is the intent of sponsoring agencies to optimize the COBOL CONCOR package -- a goal which is believed currently obtainable -- an understanding of the nature of these changes and how they would impact users is essential. Appendix F sets forth in a comparative manner the differences between the old December 1978 and the new December 1979 editions of CONCOR. Studying this appendix makes obvious the fact that, while the new version of the language is clearly superior to the old in nearly every aspect, the basic and overall structure of the language is essentially unchanged. Compartmentalization of aspects of the language into divisions represents a significant ideological enhancement to the language. Indeed, development of programs by divisions proved to be an extremely useful way of understanding the nature of editing work to be performed. However, note that while the END-DIVISION command is essential to the language, the division headings DICTIONARY-DIVISION, EXECUTION-DIVISION and REPORT-DIVISION were not implemented and are therefore preceded by a period so as to be treated as comment lines in the program listing. It is inconsistent to implement END-DIVISION commands while not implementing the division headings. It is believed that this division structuring is important enough to the overall organizational structure of a CONCOR source language program that it should be implemented prior to general distribution. The section headers shown on the figures in Appendix F, however, are another matter. They are cumbersome and were generally not coded by workshop participants, and they could be deleted from this version of the language altogether with little loss in organizational understanding. The CONCOR language is sufficiently powerful to stand on its own as a distinct product and is not meant to be a COBOL imitation; its present degree of development and specialization does not warrant the structural drag of additional section identifiers. The probable intent of the original CONCOR project was to develop a package which was uncomplicated and not unwieldy to use. The question of division and section name implementation, while seemingly cosmetic, can have real impact on the language's perceived ease of learning and use.

Figure 1 on the following page illustrates common mistakes programmers make when coding numeric and alphanumeric variables in the DATA-DIVISION. These mistakes are the result of the inconsistent variable formats. For instance, in the numeric data definition statement it is permissible to specify N9-23, where N signifies numeric, 9 signifies the length of the item and 23 specifies the starting position in the record. In NEW-DATA, however, it is possible to code an item with a maximum length of 18. While on the surface this inconsistency would seem harmless, some user variables defined in NEW-DATA as N18 could be moved inadvertently to output record fields defined by a data definition statement N9-23; such an action would result in a data error. Under certain circumstances it would be highly desirable to output these larger length values. A similar circumstance exists between the numeric and alphanumeric data coding conventions. While the maximum length of the numeric is permitted to be 9 in the data definition statement (18 in NEW-DATA), the maximum alphanumeric variable is permitted to be only 4 characters in length. In the current systems manual it is recommended that

FIGURE 1

DICTIONARY-DIVISION

DICTIONARY-NAME DATA-CODING-EXAMPLE

INPUT-FILE

OUTPUT-FILE

AREA-CONTROL N2-2 N2-4 N3-6 N2-9 N9-23

QUESTIONNAIRE-CONTROL A4-2 A3-6 A2-9 A3-11 A3-14

RECORD-CONTROL A1-1

DEFINE-RECORD

H01-TYPE-OF-HOUSING-UNIT N1-17

H02-MATERIAL-OF-ROOF N1-19 10 9

H03-TOTAL-PERSONS-IN-UNIT N8-40 NOT-NUMERIC BLANK

H04-STATE-OF-UNIT-CODE A4-50 0 U 1 D

DEFINE-RECORD

P01-SEX 1-13 W F

NEW-DATA

N01-SAVE-TYPE-OF-HOUSING-UNIT

N02-SAVE-TYPE-OF-ROOF 1

N03-COUNT-TOTAL-IN-UNITS 10 0

N04-AGGREGATE-INCOME 18 0

END-DIVISION

Explanations

N2-4: This is an example of an external numeric input data item (N) with a length of 2 bytes starting in column 4 of the input record. The maximum length of this type of variable outside of NEW-DATA is 9. When coded in NEW-DATA, 18 is permitted.
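The field notation explained above (type letter, length, starting column) and the length mismatch between NEW-DATA items and input record fields can be illustrated with a small Python sketch. The function names and the sample record are hypothetical, and the parser handles only the plain `N2-4` / `A4-2` form shown in Figure 1; it is not a reimplementation of the CONCOR dictionary parser.

```python
import re

def parse_field_spec(spec):
    """Split a spec such as 'N2-4' into (type, length, start column)."""
    match = re.fullmatch(r"([NA])(\d+)-(\d+)", spec)
    if match is None:
        raise ValueError(f"unrecognized field spec: {spec!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))

def extract_field(record, spec):
    """Return the raw characters the spec points at (columns are 1-based)."""
    _, length, start = parse_field_spec(spec)
    return record[start - 1:start - 1 + length]

def fits_field(value, spec):
    """True if a non-negative integer fits within the length given by the spec."""
    _, length, _ = parse_field_spec(spec)
    return len(str(value)) <= length

record = "A0142007"          # hypothetical input record
area = extract_field(record, "N2-4")

# An 18-digit NEW-DATA value cannot be moved into a 9-digit record field.
too_long = fits_field(10**17, "N9-23")
```

A check of this kind, applied when a NEW-DATA item is moved to an output field, is one way the data error described in the text could be caught before it occurs.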

A4-2: This is an example of an external alphanumeric input data item (A) with a length of 4 bytes starting in column 2 of the input record. This construction for an alphanumeric variable is valid only in the control statements. Additionally, it can never be over 4 bytes in length. When alphanumeric data fields are defined within record types, the EDITOR program requires that the comparison strings always be specified. A maximum of 3 is permitted. The purpose of these strings is to force a recode of the data to a numeric value. If no match is found, EDITOR automatically assigns a unique negative value to the field.


alphanumeric coding be utilized in the QUESTIONNAIRE-CONTROL and RECORD-CONTROL statements, where each input data item must be of the same data type, as shown in the example. When alphanumeric data variables are used in these control statements, their construction is identical to that of numeric items. However, when used elsewhere in the DATA-DIVISION, alphanumeric variables are required to specify one of three possible comparison values, as shown. There are a number of production instances when it would never be necessary or even desirable to recode alphanumeric data. However, as CONCOR attempts to force data into a totally numeric format upon output, there is currently no way to preserve these values if desired.

An unwieldy alternative, which may be acceptable under some circumstances, would be the expansion of the number of comparison strings from three to a more realistic number. The limitation of this compromise is that a full twenty-six comparison identifiers would be required in order to accommodate data which utilized the entire alphabet. A better solution, however, would be to make the general format of the alphanumeric variables identical to that of numeric identifiers, i.e., A9-23, and to permit alphanumeric values so defined to pass unaltered through the CONCOR system.
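The forced recode described above can be sketched in Python for illustration. This is not CONCOR code: the function names, the numbering of matches from 0, and the use of -1 as the "unique negative value" are all assumptions made for the sketch, since the report does not give EDITOR's actual codes.

```python
# Illustrative sketch only; names and code values are assumptions, not CONCOR's.
MAX_COMPARISON_STRINGS = 3   # limit reported for alphanumeric fields in record types
NO_MATCH_CODE = -1           # assumed stand-in for EDITOR's "unique negative value"


def recode_alpha(value, comparison_strings):
    """Force an alphanumeric field to a numeric code, as EDITOR is described as doing.

    Returns the position of the first matching comparison string (numbered
    from 0 here by assumption) or NO_MATCH_CODE when nothing matches.
    """
    if len(comparison_strings) > MAX_COMPARISON_STRINGS:
        raise ValueError("at most 3 comparison strings are permitted")
    for code, candidate in enumerate(comparison_strings):
        if value.strip() == candidate:
            return code
    return NO_MATCH_CODE


def pass_through_alpha(value):
    """The alternative recommended in the text: leave the value unaltered."""
    return value
```

Under these assumptions a field containing "U" tested against the strings O, U, D would be recoded to 1, while a field containing "X" would receive the negative no-match code and its original character would be lost on output, which is the limitation the text describes.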

Another data-naming convention which caused several errors, and which could be corrected, concerns the array data definitional statements. While arrays of two or more dimensions are handled in a superior manner by the CONCOR program, single-dimension arrays pose a problem in coding, as shown in Figure 2. It is suggested that the command imperatives be changed to permit the coding of both rows and columns in single-dimension arrays, i.e., to allow a single row vector as well as a single column vector, in order to maintain the consistency of the array data definitional statements.
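The suggested uniform treatment of vectors can be sketched as follows. This is a hypothetical Python illustration, not the ARRAY-DATA command: the function name is invented, and "magnitude" is assumed here to mean the maximum number of digits an element may hold.

```python
# Hypothetical sketch of a uniform rows/columns array declaration.
def declare_array(rows, cols, magnitude, initial_values):
    """Build a rows x cols array of cold deck values.

    A row vector is simply rows=1 and a column vector is cols=1, so
    vectors need no special-case syntax. `magnitude` is assumed to be
    the maximum number of digits per element.
    """
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must each be at least 1")
    if len(initial_values) != rows * cols:
        raise ValueError("number of initial values must equal rows * cols")
    limit = 10 ** magnitude
    for value in initial_values:
        if abs(value) >= limit:
            raise ValueError("value %r exceeds magnitude %d" % (value, magnitude))
    # Slice the flat list of start-up values into rows.
    return [list(initial_values[r * cols:(r + 1) * cols]) for r in range(rows)]
```

With such a scheme the four start-up values 16, 18, 21, 23 could be declared either as one row of four elements or as four rows of one element, using the same parameter order as a two-dimensional array.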

A major requirement of COBOL CONCOR file processing is that all related data records must be physically contiguous on the input file. The implication of this requirement is that files may require preprocessing prior to actual data editing. (This preprocessing is usually a sort routine upon a selected CONTROL-AREA key.) While this type of processing merely introduces a new step in file processing, a major limitation becomes apparent when a large number of discrete data files of the same census or survey questionnaire are to be processed. This limitation is the introduction of manual steps to save the most recently imputed values, i.e., to prevent the program from starting with cold values on each batch run. If a command such as LOAD/UNLOAD ARRAYS were incorporated into the language (an enhancement not believed to be difficult to implement), manual processing between batches would be reduced to a minimum and the maximum benefits of the hot deck methodology would be realized. It is envisioned that such a command would automatically ensure the transfer of the appropriately designated hot values. Automatic processing of this nature, if done correctly, can greatly reduce the time required to clean multivolume files, for once CONCOR language statements have been compiled, linked

While it is possible at this time to save the arrays that are used in the imputation processes on a separate write-file, it is not now possible to load those values back automatically into an object program and to resume processing immediately on another volume. It is believed that such an automatic feature of the language would cut down the manual processing time significantly enough to warrant its inclusion in the package prior to general distribution.
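The effect of the proposed LOAD/UNLOAD ARRAYS capability can be sketched in Python. Nothing here reflects an existing CONCOR feature: the function names, the JSON file format, and the array name used in the comments are assumptions chosen only to show hot values being carried from one batch run to the next.

```python
# Hypothetical sketch of carrying hot deck arrays between batch runs.
import copy
import json
import os


def unload_arrays(arrays, path):
    """Save the current (hot) imputation arrays at the end of a batch run."""
    with open(path, "w") as handle:
        json.dump(arrays, handle)


def load_arrays(path, cold_deck):
    """Return the arrays saved by the previous run, or a copy of the cold
    deck values when no saved file exists (the first volume of a job)."""
    if not os.path.exists(path):
        return copy.deepcopy(cold_deck)
    with open(path) as handle:
        return json.load(handle)
```

A run over the second volume would begin with `load_arrays` and end with `unload_arrays`, so that only the very first volume starts from cold values and no manual transfer is needed between batches.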

FIGURE 2

A05-DIFF-BETWEEN-AGE-OF-FEMALE-BY-RELATION 2 4 4

[Cold deck values for A05: a 4 x 4 table with columns AGE OF HUSBAND (12-17, 18-24, 25-35, 36+) and rows RELATION (HEAD, CHILD, OTHER, NONRELATIVE)]

Comments: The ARRAY-DATA command statement provides the means to declare array identifiers with up to five dimensions. Current documentation is not as explicit about the rules of this command as is desirable. The parameters of the command should function as follows:

user-identifier, number of dimensions (D), number of rows (R), number of columns (C), magnitude of element (M), initial start-up values

In the example, A05 is a two-dimensional array with 4 rows, 4 columns, a default magnitude of 9, and cold deck values as labeled.

A06-DIFF-BETWEEN-AGE-OF-PERSON-AND-MOTHER 1 1 4

16 18 21 23

(This coding generates the error message below.)

WARNING (DD-207) COMMAND TERMINATOR NOT FOUND, ASSUMED PRESENT

ERROR (DD-911) DIMENSION OF USER-SPECIFIED ARRAY IS LESS THAN THE MINIMUM VALUE PERMITTED (2)

PREVIOUS DIAGNOSTIC AT LINE 563

As shown by the array variable A06, CONCOR's treatment of vectors is not consistent with the above multidimensional array scheme, i.e., A06 must be coded as follows (example of how vectors must currently be coded to be correct):

A06-DIFF-BETWEEN-AGE-OF-PERSON-AND-MOTHER 1 4 2

16 18 21 23

user-identifier, 1 dimension, number of elements in vector, magnitude of element, initial start-up values

A simple modification to this command would permit the coding of both row and column vectors and make this command less error prone.


and stored as an object module on the system, no other compilations should be required for processing questionnaire files of the same type. Theoretically, a single well-written CONCOR program is all that would be required to process an entire census run.

Appendix H contrasts the internal identifiers of the old and new language versions. Without such identifiers a user would have little information about the status of input as it is processed by EDITOR. As noted in the appendix, most internal pointers are reset upon each break in the CONTROL-AREA, provided a CONTROL-AREA has been defined. The limitation here is that there are obvious instances when terminating processing on the basis of run counts would be advantageous even though a CONTROL-AREA has been specified, e.g., debugging CONCOR programs or comparing input files. Therefore another set of pointers should be implemented for this purpose and made available for programmer reference.

One clearly disturbing development which needs to be pursued during in-depth testing of the system concerns the MAX-STORAGE parameters of the DEFINE-RECORD statement. As shown in the figure on the following page, when MAX-STORAGE was set equal to the maximum value, a COBOL program was generated which required 1000K of core to run. The MAX-STORAGE value of 999 is clearly not realistic under most processing circumstances. This example drives home several important points about CONCOR. The core requirements of CONCOR-generated programs can be influenced significantly by the amount or nature of programmer-specified I/O operations. In fact, it is possible to generate a program of a size most foreign country machines could not process. It is recommended that tests determine a realistic maximum value restriction for implementation to prevent problems in this area.

The final area of recommended modification concerns the newly implemented REPORT-DIVISION. The purpose of the REPORT-DIVISION is to enable a user to specify certain CONCOR language statements which will generate statistical reports. These reports contain statistics generated by EDITOR as specified by the GENERATE-EDIT-STATISTICS command of the EXECUTION-DIVISION. All of the reports produced are organized according to the data fields defined by the AREA-CONTROL command of the DATA-DICTIONARY. If the AREA-CONTROL command is not defined in the DATA-DICTIONARY, then all the statistics are summarized at the total run level. If a control area field is defined, then all statistics will be summarized for each unique CONTROL AREA as encountered by the EDITOR program on the input file. Statistics at the total run level will not be available. This in part relates back to previous discussions citing the need for new internal identifiers. Report listings may contain the values of entire records or entire questionnaires, depending upon the keyword used in the report generation commands. The problem centers upon the homogeneity of CONCOR printouts during a production run.
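The summarization behavior described above, and the reason input must be presorted on the control area key, can be illustrated with a short Python sketch. It is not EDITOR's implementation; the record layout (an area key paired with an error flag) and the function names are assumptions.

```python
# Illustrative sketch of area-break statistics; not EDITOR's actual logic.
from itertools import groupby


def error_counts_by_area(records):
    """Summarize (area_key, has_error) records per contiguous run of an area.

    Statistics are closed out each time the area key changes, so an area
    whose records are not contiguous is reported more than once.
    """
    summary = []
    for area, run in groupby(records, key=lambda record: record[0]):
        run = list(run)
        errors = sum(1 for _, has_error in run if has_error)
        summary.append((area, len(run), errors))
    return summary


def run_totals(summary):
    """The total-run-level figures that the text notes are not available
    once a control area is defined."""
    return (sum(n for _, n, _ in summary), sum(e for _, _, e in summary))
```

Feeding the function an unsorted file yields one line per break rather than one per area, which is why a sort on the control area key is needed before editing; the second function shows that run-level totals could be derived from the same area summaries.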

It is virtually impossible to distinguish reports on the basis of the volumes they were run against. Some means should be provided to allow users to label the reports generated in this division uniquely and purposefully. Indeed, the very name REPORT-DIVISION suggests that such a command is implicit and appropriate. Such a LABEL-REPORT or REPORT-FILE command, along with file information from the system, should not be difficult to implement.

FIGURE 3

CONCOR (CONSISTENCY AND CORRECTION) EDIT AND IMPUTATION SYSTEM

USER DICTIONARY DIVISION - SOURCE LISTING

LINE NUMBER

72 DEFINE-RECORD RECORD-NAME= HOUSING-RECORD

73 MAX-STORAGE= 999

74 RECORD-TYPE= [value illegible] NOTE AN LITERAL

268 DEFINE-RECORD RECORD-NAME= POPULATION-RECORD

269 MAX-STORAGE= 999

270 RECORD-TYPE= [value illegible]

[Job step statistics for the resulting program]

STEPNAME - LOAD

STEP START TIME - 10:01:55 STEP END TIME - 10:02:57 STEP CPU TIME - 1.65 SECONDS STEP COMPLETION CODE - C 0

TAPES - 0 DISKS - 0 REGION (REQUESTED - 1000K USED - 1000K) I/O COUNT 211

CARDS READ - 0 CARDS PUNCHED - 0 LINES PRINTED - 22


Concluding Remarks of System Modifications

Version 1 of the COBOL CONCOR language was notorious for abending without the generation of diagnostic messages. Appendix I sets forth a test of some of these historical problems. In each instance the system provided messages and source program line references which enabled immediate identification of the processing difficulties. Moreover, tests of the various commands revealed the generally high performance of the language in its newest version. Under these circumstances the recommended modifications should be considered as ones which will allow CONCOR to realize the potential of its design, enhance its utility, and endure as a software product.


IV THE ADEQUACY OF COBOL CONCOR DOCUMENTATION

The current documentation of CONCOR consists of a newly developed, relatively voluminous systems manual, a Diagnostic Message Guide, and some loose-leaf instructions concerning the installation of the system on IBM computers. Of these three documents, the Diagnostic Message Guide is clearly complete and adequate and needs no changes other than the addition of message number tabs to enable a programmer to locate the text of a message more quickly. An example of the superior format of this guide appears as Appendix J.

Conspicuously absent from the list of systems documents is a Users Guide, as originally specified in the ISPC enhancement proposal.

A thorough assessment of the Systems Manual has been undertaken and indicates that, with some reorganization and editing, it could be made a more useful and usable document, as it is well constituted informationally. In its current form the Systems Manual can best be described as a working document, one which was not intended to teach the CONCOR language but was aimed at an audience already familiar with previous CONCOR releases. Further, it is not a systems manual per se, as some of its components and appendixes could more appropriately compose a users guide. For example, excerpts from the POPSTAN publications which review the basic editing concepts and contain the POPSTAN example problem should be deleted from the systems manual and would be more contextually effective in the Users Guide. The EXECUTION-DIVISION chapter of the Systems Manual would especially benefit from general editing and reorganization. (Specifically, the contents of pages 90-92 should be moved to page 63; these pages explain the placement and purpose of the three permissible routines of the CONCOR language. Had the author been granted more time, he could have outlined additional specific modifications to bring existing documentation to a level of adequacy.) However, some guidelines stand out:

1. Installation instructions and considerations should compose a chapter of the Systems Manual rather than a separate document. (Hand-out materials covering installation were not adequate.) Further, a documentation scheme setting forth how CONCOR was installed on each computer, and any modifications required during installation, could be serially added to and become a permanent part of the installation's systems manual. If CONCOR is going to be used, and if agencies are going to actively support it as a package, installation limitations must be known and well documented. The prescribed format for this documentation should be set forth, and any modifications to CONCOR should be continuously updated on these sheets so that a history of changes to the language will be readily accessible to supporting agencies. If CONCOR is to survive as a standardized system, editing results must be capable of being replicated among installations. Such a form could contain other information as well, concerning the type of computer, amount of core, and nature of utilities supporting users. Upon installation, a copy of this form could be sent to the US agency which will ultimately be responsible for supporting the CONCOR package.

2. A complete COBOL CONCOR program should appear in an appendix for reference.

3. The development of the Users Guide should include an intensive review of the editing concepts involved in processing census data files beyond the POPSTAN materials.

4. An explanation of the CONCOR benchmark program should appear in the Users Guide and the Systems Manual. The running of a supplied benchmark program should be a standard installation protocol used to test all operational aspects of a new installation.

5. The development of a CONCOR pocket card. This tool, which has been proven useful in teaching and assisting programmers in utilizing programming languages, lays out all command options on a single small card. An example of such a pocket card is the IBM Assembler Card. Such a small document is useful in clarifying the language and setting forth structural rules without continual reference to full-size manuals.


V CONCLUSION

In a situation of extreme need CONCOR could probably be utilized, as it seems capable of performing its editing functions. Its usefulness as a data cleaning tool is generally undisputed. The importance of having CONCOR exhaustively tested by an independent agency cannot be overemphasized in light of prior failures of the system to perform adequately. Additionally, core requirements and processing speed should be precisely determined. One must be encouraged by the amount of progress which has been made in a relatively short amount of time towards the completion of the CONCOR package. While it is clearly beyond the scope of this report to estimate the cost and the amount of time required to implement the highly desirable changes to the CONCOR language and documentation outlined herein and then exhaustively test the system, it is believed that this process could be completed within a single quarter. (The changes are of such a nature that no structural redefinitions would be required.) These enhancements would make the COBOL CONCOR package more complete, consistent, and reliable in the field. ISPC has competently moved the system to the point that its completion is within reach.

Given the probable revision of the current Systems Reference Manual and the development of a suitable Users Guide, it is likely that this version of CONCOR could be taught to experienced programmers (unfamiliar with previous versions of the language) in as little as four weeks. And while one is impressed by the potential power and overall simplicity of the COBOL CONCOR language, it is difficult to envision individuals with no real computer programming experience utilizing its full capabilities. Hence the importance of including a review of basic editing concepts in the documentation is underlined.

Further, as documentation is often critical to the perceived ease of use of computer-based systems, it will be essential to have these various manuals examined independently prior to their general use.

As previously mentioned, Appendix G presents features which the ISPC feels could have a positive impact on CONCOR. Depending upon the outcome of the testing recommended in previous chapters and the availability of funding, supporting agencies may desire to look again at an assembler (ALC) version of the language.

Overall, COBOL CONCOR remains an intermediate product yet to realize its potential, which is thought considerable. Software development of this nature is characteristically of a long-term, evolutionary design. In light of this reality, agencies should be prepared to evaluate such projects' success or failure in terms other than that of immediate utility. It is the philosophy underlying the system, not the system itself, which will ultimately be exported.

APPENDIX A

BuCen Enhancement Proposal


BUCEN IN-HOUSE REVIEW ENHANCEMENT PROPOSAL

1 Easy to use interrecord referencing

2 Improved output file capabilities

A provide overflow protection on WRITE command

B provide OUTPUT command to create corrected questionnaire file that matches IP file data dictionary

3 Improved edit statistics reported (LISTERR)

A provide automatic (user-specified) area break

B provide options for compilation and displaying edit statistics at various levels

C provide automatic (user-specified) tolerance checking of error rates by area

D automatically capture IDs of areas failing tolerance check

4 Clean up known bugs in code

5 Comprehensive testing

6 Clean up and enhance documentation

A reference manual more examples error message guide

B installation guide

C systems manual
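The tolerance checking proposed in items 3C and 3D above can be sketched in Python as follows. This is only an illustration of the idea: the function name, the data layout, and the definition of the error rate as records in error divided by records edited are assumptions, not the proposal's specification.

```python
# Hypothetical sketch of tolerance checking of error rates by area.
def areas_failing_tolerance(area_stats, tolerance):
    """Return the IDs of areas whose error rate exceeds the tolerance.

    area_stats maps an area ID to (records_edited, records_in_error);
    tolerance is a user-specified fraction such as 0.10.
    """
    failing = []
    for area_id, (edited, in_error) in area_stats.items():
        rate = in_error / edited if edited else 0.0
        if rate > tolerance:
            failing.append(area_id)
    return failing
```

The list returned corresponds to the automatically captured IDs of item 3D, which could then be used to route the failing areas to a reject file for review.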

APPENDIX B

EVALUATIVE CRITERIA


UNITED STATES GOVERNMENT

Memorandum

TO: DSPOPFPSD, Robert H Haladay

DATE: December 3, 1979

FROM: DSPOPDEIO, Liliane Floge

SUBJECT: Criteria for Evaluation of BuCen COBOL CONCOR, September 30th Version, as presented at January 1980 Workshop

The following points should be considered when evaluating CONCOR, the COBOL editing system, during participation at the January 1980 workshop:

1. Are the Installation Manual, the System Manual, and the Users Manual ready for use? Are these manuals adequate for their purposes? Are the manuals clear and consistent? Can they be understood by subject-matter personnel as well as programmers?

2. Is the editing system capable of implementing all the features as described in the documentation?

3. Does the system appear to be [illegible] debugged and tested and ready for release to developing countries?

4. Does the system appear to be fast enough to edit a census in a reasonable amount of time?

5. What size core does the system require?

6. Comment in general on completeness, usefulness, and efficiency of the system, and on its ease of use by developing country programmers and other census personnel.

cc: [first name illegible] Olds, APHA; Julio Ortuzar, Delta Systems

APPENDIX C

WORKSHOP ITINERARY


CONCOR Workshop Schedule January 7-18 1980

U S Bureau of the Census International Statistical Programs Center

Scuderi Building Room 209 4235 28th Avenue Marlow Heights Maryland

Monday January 7

9:30 am - 10:00 Welcoming Remarks; Overview of Workshop

10:00 - 10:30 Introduction to CONCOR
- Purpose and function
- History of development
- General computer requirements

10:30 - 10:45 Break

10:45 - 12:00 Editing Concepts
- Ways to interrogate data
- Ways to correct data
- Editing housing and population data
- POPSTAN
- Advantages of CONCOR

12:00 - 1:15 pm Break

1:15 - 2:00 System Description
- Constraints in design of CONCOR
- Basic subsystems of CONCOR
- User interactions with system
- Examples of outputs produced

2:00 - 2:30 User Program Organization
- Divisions
- Sections
- Routines
- Commands

2:30 - 2:45 Break

2:45 - 3:25 Command Language Description
- Types of statements
- Format
- Syntax

Tuesday January 8

9:30 am - 10:30 Dictionary Division Command Statements
- Punctuation
- Input data referencing
- Working data declarations
- Internal data representation and storage

10:30 - 10:45 Break

10:45 - 12:00 Dictionary-Attributes-Section
- Dictionary-Name
File-Section
- Input-File
- Output-File
- Write-File
- Error-File

12:00 - 1:15 pm Break

1:15 pm - 2:15 Input-Record-Section
- Define-Record
Working-Data-Section
- New-Data
- Array-Data

2:15 - 2:30 Break

2:30 - 3:25 Dictionary Examples
- Minimum dictionary structure
- Maximum dictionary structure
- Hand out dictionary problem

Wednesday January 9

9:30 am - 10:30 Discussion of Participants' Dictionary problems; Free work time

10:30 - 10:45 Break

10:45 - 12:00 Execution Division Command Statements
- Punctuation
- Subscripting
- Internal Identifiers
- Report-Control-Section
  - Generate-Edit-Statistics
  - Count-Imputes
  - Examples

12:00 - 1:15 pm Break

1:15 pm - 2:15 Execution Division Command Statements (continued)
- Routines of Edit-Specifications-Section
  - Prolog-Routine
  - Filter-Routine
  - Epilog-Routine
- Types and functions of edit specification commands
- Range
- Assert

2:15 - 2:30 Break

2:30 - 3:25
- Pass/Fail clauses
- List

Thursday January 10

9:30 am - 10:30 Discussion of Problems; Free work time

10:30 - 10:45 Break

10:45 - 12:00 Edit Specification Command Statements (continued)
- Allocate
- Update
- Let

12:00 - 1:15 pm Break

1:15 pm - 2:15
- If
- Until/Exit
- Stop

2:15 - 2:30 Break

2:30 - 3:25
- Drecode
- Grecode

Friday January 11

9:30 am - 10:30 Edit Specification Command Statements (continued)
- Output
- Write

10:30 - 10:45 Break

10:45 - 12:00 Report Division Command Statements
- Display-Control-Section
  - Display-Edit-Statistics
- Tolerance-Control-Section
  - Error-Rate-Check
  - Reject-File
- Report Examples

12:00 - 1:15 pm Break

1:15 pm - 3:25 Hand out Edit Specifications as problem; Free work time

Monday January 14

9:30 am - 10:30 Discuss procedures for running problems on computer

10:30 - 10:45 Break

10:45 - 12:00 Component Programs of the CONCOR system

12:00 - 1:15 pm Break

Tuesday January 15

9:30 am - 3:25 pm Free work time

Wednesday January 16

9:30 am - 12:00 Free work time

12:00 - 1:15 pm Break

1:15 pm - 2:15 How to Install CONCOR on IBM 360/370 OS

2:15 - 2:30 Break

2:30 - 3:25 Free work time

Thursday January 17

9:30 am - 3:25 Free work time

1:15 pm - 2:15 Future CONCOR Design Considerations
- for census processing
- for survey processing
- manual correction system

2:15 - 2:30 Break

2:30 - 2:45 Evaluation Guidelines
- Hand out evaluation forms

2:45 - 3:25 Free work time

Friday January 18

9:30 am - 10:30 Free work time

10:30 - 10:45 Break

10:45 - 12:00 Group Discussion on CONCOR
- Submission of written evaluations
- Distribution of IBM installation tapes
- Presentation of Certificates to Participants

12:00 - 1:15 pm Break

1:15 - 3:25 Free work time

APPENDIX D

PARTICIPANTS


CONCOR WORKSHOP January 7 - 18 1980 Marlow Heights MD

PARTICIPANTS

KENYA

James Midianga Government Computer Centre Central Bureau of Statistics PO Box 30266 Nairobi Kenya

PANAMA

Omar Rivera Rios Data Processing Specialist Contraloria General de la Republica de Panama Apartado Postal 5213 Panama 5 Panama

PHILIPPINES

Guida T Capellan OIC Computer Programming Division National Census and Statistics Office Solicarel Bldg 1 Magsaysay Blvd Sta Mesa Manila Philippines

THAILAND

Angsumal Sunalai National Statistical Office Bangkok 1 Thailand

EGYPT

Farag Sedky Mourad Ghaleb CAPMAS (Central Agency for Public Mobilization and Statistics) NASR City Cairo Egypt

SAUDI ARABIA

Shargi R Al-Shargi National Computer Center PO Box 2534 Riyadh Saudi Arabia

Ahmed S Al-Ghamdi Dept Director of National Computer Center PO Box 2534 Riyadh Saudi Arabia

OTHER

Robert W ONeal USREPJECOR APO New York NY 09038

John N Adams Data Processing Advisor DACCA Department of State Washington DC 20520 or c/o UNDP PO Box 224 Dacca Bangladesh

Joe Quasney US Census Bureau Washington DC 20233

Howard Brunsman 5715 N Ninth St Arlington VA 22205

Larry Shiller and Julio Ortuzar Delta Systems Consultants Inc 264 Alhambra Circle Coral Gables FL 33134

Nicholas Ourusoff Burpee Hill Road New London NH 03257 (United Nations)

Frederic J Grant IV Systems Development Georgia World Congress Institute 580 North Omni International Atlanta Georgia 30303

Rafael Samper Mary Jo Keenan Catherine B Gleason LACDR Room 2246 NS Washington DC 20523

John Marshall AIDSERDMPSE Rm 706C SA-12 Washington DC 20523

Susanne Bacon and Mary K Friday ISPCData Processing US Census Bureau Washington DC 20233

Leo Dougherty ISPCWorld Census Staff US Bureau of the Census Washington DC 20233

CONCOR WORKSHOP

January 7-18 1980 Washington DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM


CONCOR WORKSHOP January 16 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1 What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries

2 If you are familiar with previous versions of CONCOR how does this current version compare to them

3 How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data with which you are familiar

4 How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software


5 How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys

6 If your organization does statistical data processing by computer would you recommend to your superiors that this software be used to edit the data Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work

7 In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation

8 Other comments


APPENDIX F

NEWOLD COMMAND COMPARISONS


OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA


NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR

END-DIVISION signifies the termination of the CONCOR division organization and must always be present even though division identifiers have not been implemented

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

VERSION 1 (DECEMBER 1978) VERSION 2 (DECEMBER 1979) COMMENTS

ALLOCATE (ALL) ALLOCATE (ALLOC) Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

ASSERT (AST) ASSERT (ASRT) The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and MESSAGE option. Additionally, by utilizing the DUMPR and DUMPO keywords, it provides the user with the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

(see RECODE) DRECODE (DRCD) Provides a method for recoding data items which have a starting value of zero and continue in ascending sequence. Replaces the previous RECODE command.

EXIT Provides the means to terminate EXECUTION (ESCAPE) of command statements within the UNTIL command

END END-IF END-THEN END-ELSE A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

ERROR not implemented No longer supported in language


FILTER TYPE ( ) WITH not implemented No longer are option statements supported in language This command once specified essentially a GOTO depending on construction

(see XRECODE) GRECODE (GRCD) This new command provides a method for the group recoding of input values

IF IF The IF statement has properties found virtually in all programming languages ie provides a means of evaluating a condition and controlling the execution of alternative logical statements Similar to pseudocode a CONCOR programmer must utilize the structured command IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version

LET LET Virtually unchanged

LIST New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire as well as specific individual identifiers

OUTPUT New command which provides the means by which records will be written to the output-file

RANGE (RNG) RANGE (RNG) As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

RECODE (REC) name changed Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

STOP keyword STOP keyword Unchanged.

UNTIL DO END-DO One of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

UPDATE (UPD) UPDATE (UPD) Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.


WRITE (WRT) WRITE (WRT) The WRITE language statement, while once used as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of DICTIONARY-DIVISION.)

XRECODE (XREC) name changed Old command which has become the GRECODE command in the December 1979 version.
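The PASS END-PASS FAIL END-FAIL construction described for the RANGE command can be pictured as a range test that routes control to one of two command paths. The following is a minimal Python analogy, not CONCOR syntax; the function name `range_check`, the two handlers, and the bounds 0 to 110 are hypothetical and chosen only for illustration.

```python
def range_check(value, low, high, on_pass=None, on_fail=None):
    """Test low <= value <= high and run the matching path, if one was given."""
    ok = low <= value <= high
    handler = on_pass if ok else on_fail
    if handler is not None:
        handler(value)
    return ok


log = []  # stands in for the listing of out-of-range values


def passed(value):
    log.append(("pass", value))


def failed(value):
    log.append(("fail", value))


ages = [34, 127, 0, -3]
results = [range_check(a, 0, 110, on_pass=passed, on_fail=failed) for a in ages]
```

In this sketch the FAIL path is where an imputation or a listing of the offending value would be placed, while the PASS path can be left empty.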

DECEMBER 1978 VERSION 1

Not implemented

DECEMBER 1979 VERSION 2

REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note: division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
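The presorting requirement noted for AREA-CONTROL can be illustrated with a short Python sketch. It is only an analogy under stated assumptions: the record layout and the `area` key are hypothetical, and `itertools.groupby` is used because, like the behavior described above, it starts a new group at every change in the control value and does not merge noncontiguous records.

```python
from itertools import groupby
from operator import itemgetter


def area_breaks(records, key=itemgetter("area")):
    """Yield (area, record_count) for each run of contiguous records."""
    for area, run in groupby(records, key=key):
        yield area, sum(1 for _ in run)


# Hypothetical input file in which area "02" appears in two separate runs.
records = [
    {"area": "02", "id": 1},
    {"area": "01", "id": 2},
    {"area": "02", "id": 3},
]

# Without presorting, area "02" produces two separate control breaks.
unsorted_breaks = list(area_breaks(records))

# Presorting on the control field gives one break per area.
sorted_breaks = list(area_breaks(sorted(records, key=itemgetter("area"))))
```

The unsorted run reports area "02" twice, which is the kind of fragmented report the presort is meant to avoid.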

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST


Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics, such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.

(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18-digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.

(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation, or provide a means by which the user may specify a title to appear in the listings.
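The signed numeric format proposed in item (11), which would accept leading blanks and negative values, can be sketched as a small fixed-width field reader. This is an illustrative Python sketch and not part of CONCOR; the function name `parse_signed_field`, the 1-based column convention, and the choice to return `None` for an all-blank field are assumptions.

```python
import re

# Optional sign followed by digits, e.g. "-42", "+7", "17".
_SIGNED_INT = re.compile(r"[+-]?\d+")


def parse_signed_field(record, start, length):
    """Read a right-justified signed integer from a fixed-width record.

    start is a 1-based column number. Leading blanks are ignored, an
    all-blank field yields None, and anything else raises ValueError.
    """
    raw = record[start - 1:start - 1 + length]
    text = raw.lstrip(" ")
    if not text:
        return None
    if _SIGNED_INT.fullmatch(text) is None:
        raise ValueError(f"not a signed number: {raw!r}")
    return int(text)


record = "A   -42  17"
balance = parse_signed_field(record, 2, 6)   # columns 2-7 hold "   -42"
count = parse_signed_field(record, 8, 4)     # columns 8-11 hold "  17"
blank = parse_signed_field("     ", 1, 5)    # all-blank field
```

Keeping blank and not-numeric outcomes distinct mirrors the separate treatment of blank and non-numeric values elsewhere in the language.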

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features, such as division by zero and out-of-bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is what action is to be taken in the calculation of tolerance in terms of this new record's inclusion into the universe of observations and number of changes.

(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
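The lookup table approach suggested in item (4) for DRECODE can be contrasted with a sequential comparison in a few lines. The sketch below is Python rather than generated COBOL, and it does not reproduce the code the EDITOR actually emits; the recode pairs, the default value 99, and both function names are hypothetical.

```python
# Hypothetical recode pairs: (input value, recoded value).
RECODE_RULES = [(1, 10), (2, 10), (3, 20), (4, 30)]


def recode_linear(value, rules=RECODE_RULES, default=99):
    """Compare against each rule in turn; cost grows with the rule count."""
    for source, target in rules:
        if value == source:
            return target
    return default


# Built once, before any records are processed.
RECODE_TABLE = dict(RECODE_RULES)


def recode_lookup(value, table=RECODE_TABLE, default=99):
    """Single table access per value, independent of the rule count."""
    return table.get(value, default)


inputs = [0, 1, 2, 3, 4, 5]
linear_results = [recode_linear(v) for v in inputs]
lookup_results = [recode_lookup(v) for v in inputs]
```

Both functions give the same answers; the difference is that the table is built once, which corresponds to the extra communication between generation steps that the item mentions.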

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.

1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program, to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean to imply that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR Users Guide.

2. Survey or Other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has been neither finished nor tested.
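The check-in procedure proposed in item (2) amounts to comparing a master list of expected sample cases against the questionnaires actually received. A minimal Python sketch follows; the function name `check_in`, the identifier values, and the three result categories are assumptions made for illustration and do not describe an existing CONCOR facility.

```python
def check_in(expected_ids, received_ids):
    """Compare the master list of sample cases with the cases received."""
    expected = set(expected_ids)
    received = set(received_ids)
    return {
        "matched": sorted(expected & received),
        "missing": sorted(expected - received),      # selected, not received
        "unexpected": sorted(received - expected),   # received, not selected
    }


# Hypothetical questionnaire identifiers.
master_list = ["Q001", "Q002", "Q003", "Q004"]
input_file = ["Q002", "Q004", "Q009"]

report = check_in(master_list, input_file)
```

The missing and unexpected lists are the two outcomes a survey office would normally follow up before editing begins.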


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES

DECEMBER 1978 VERSION 1 / DECEMBER 1979 VERSION 2

1 EOF-FLAG / EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE / ERRORS-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE / (not listed)

4 TYPE-COUNT ( ) / TYPE-COUNT ( )

5 RECORD-COUNT / IN-RECORD-COUNT

6 QUESTIONNAIRE-COUNT / IN-QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE / RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT / INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG / INCOMPLETE-FLAG

10 CONTINUATION-FLAG / CONTINUATION-FLAG

11 INVALID-RECORD-FLAG / (not listed)

(not listed) / OUT-OF-RANGE

(not listed) / NOT-NUMBER

(not listed) / BLANK

(not listed) / (new internal counters) OUT-RECORD-COUNT, OUTPUT-QUEST-COUNT, WRITE-RECORD-COUNT

COMMENTS

Change in spelling of variable.

Not mentioned -- possible omission from manual.

Not implemented.

Note: Current documentation equivocates on when some variables are reset -- most are initialized with each control area break.

Though implemented as reserved identifiers, due to poor documentation of (old) CONCOR, exact values and utility were unknown.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN-, OUT-).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Execution statistics printout. The legible portion shows the message DIVISION BY ZERO LINE NUMBER 118 repeated several times, followed by RUN ABORTED DUE TO DIVISION BY ZERO ERRORS. The superimposed program text (source lines 112-125) is largely illegible; it consists of LIST statements and LET statements involving R001, R002, and R003, including a LET statement on line 118.]

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason -- ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.

CONCOR-EDITOR EXECUTION STATISTICS

[Execution statistics printout; the error messages are illegible in this copy. The superimposed program text (source lines 173-187) shows two nested UNTIL ... VARY ... FROM 1 BY 1 loops over R001 and R002, each enclosing a DO ... END-DO block, with an ALLOCATE statement referencing TA1 (R001 R002), an UPDATE statement referencing TA2 (R001 R002), and several LIST statements.]

Comments: In this example TA1, a 2-dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 X 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.
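The array coordinate error described in this example can be reproduced in outline: a loop that walks a 10 by 10 index space while updating a 5 by 5 table will reference elements that do not exist. The Python sketch below shows one way such references could be caught and reported with a source line number rather than ending the run silently. It is an analogy only; the function `update_table`, the error tuple layout, and the line number 181 are illustrative assumptions, not CONCOR internals.

```python
def update_table(table, row, col, value, line_no, errors):
    """Store value at 1-based (row, col); record an error if out of bounds."""
    rows, cols = len(table), len(table[0])
    if not (1 <= row <= rows and 1 <= col <= cols):
        errors.append((line_no, row, col))  # diagnostic instead of an abend
        return False
    table[row - 1][col - 1] = value
    return True


# Source table with 10 rows and 10 columns, target table with 5 and 5.
ta1 = [[r * 10 + c for c in range(1, 11)] for r in range(1, 11)]
ta2 = [[0] * 5 for _ in range(5)]

errors = []
for r in range(1, 11):
    for c in range(1, 11):
        update_table(ta2, r, c, ta1[r - 1][c - 1], 181, errors)
```

Of the 100 attempted updates, only the 25 that fall inside the smaller table succeed; the rest are recorded with their coordinates.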

CONCOR-EDITOR EXECUTION STATISTICS

[Execution statistics printout. The legible messages read JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215 and EDITOR -- NORMAL END OF JOB. The superimposed program text (source lines 214-218) shows an IF test on IN-RECORD-COUNT followed by THEN STOP-RUN END-THEN, and a second IF test on IN-RECORD-COUNT followed by a LIST statement; the remainder is illegible.]

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.


I BACKGROUND

CONCOR (an acronym of Consistency and Correction) is best characterized as a software tool designed to expedite the processing of data files during the edit and imputation phase of population censuses and surveys. As a metacompiler written in the COBOL language, the system reads and verifies CONCOR language statements to produce an executable EDITOR program. The objective of this process is the creation of an error-free file which can be used at a later time for tabulation purposes.

Since its release as Version 1 on December 30, 1978, numerous elements of the COBOL CONCOR system have undergone continual change and redefinition. In fact, the system has not been permitted to stand still for any period of time, nor has it been exhaustively tested. In June of 1979, ISPC suspended the further distribution of the COBOL CONCOR system. This decision was based principally upon reports of the package's unsatisfactory performance at workshops held in Panama and Thailand. ISPC, upon its own initiative, developed a proposal to overhaul CONCOR and its accompanying documentation. This proposal, contained in Appendix A, represents an ambitious undertaking. While not all of the desired changes and capabilities could be implemented, Version 2 of December 1979 represents a significant managerial effort. The questions now are whether COBOL CONCOR Version 2 will be a demonstrably adequate software package -- a package capable of exportation to developing countries -- a package requiring no further modification. The purpose of this report is to address these critical issues. In connection with this, Appendix B sets forth the specific criteria around which such a discussion must evolve. As this is not intended to be a compendium, some of these broader issues will be treated immediately; the following chapters and appendices will qualify the exact nature of system alterations already undertaken, as well as further adjustments believed to be essential in realizing the goals of the system's philosophy.

Workshop

During the period of January 7-18, 1980, a workshop was held under the sponsorship of the ISPC to demonstrate the capabilities of the latest revision of the COBOL CONCOR software package. A schedule of events of this workshop is contained in Appendix C. This workshop was intended to provide participants with the opportunity to program in the CONCOR language and thereby to test aspects of the system as individually appropriate. A listing of the participants and the international organizations they represented is contained in Appendix D. During the concluding days of the workshop, each participant was asked by ISPC to provide a written evaluation of the now-called December 1979 version of CONCOR. This evaluation form, Appendix E, also includes space for comments concerning the competence of the system documentation, as well as any additional comments, including those regarding the organization and clarity of workshop presentations. It is assumed that in the near future summaries of these comments will be available to interested agencies.

EDITOR is the new name of the EXECUTOR module of previous language versions.

A complete history of the development of COBOL CONCOR can be found in both the December 1978 Version 1 and December 1979 Version 2 systems manuals, as well as in previous consulting reports.

While virtually all instructional aspects of this two-week workshop were conducted in a highly professional manner -- a manner which revealed a high degree of coordination among staff members in their efforts -- there are several areas which future workshops may improve upon:

1. All publications should be assembled in their entirety and proofread prior to distribution.

2. A complete CONCOR language program example and accompanying I/O documents should be provided at the onset of the workshop for reference.

3. Numerous short application programming problems involving all CONCOR language divisions should be used in place of a single lengthy problem.

It is noted that this workshop was not intended to teach the CONCOR language; had it been, the organization and presentation of materials probably would have been different. It is believed that the two-week period was sufficient to give participants a familiarity with the use of the new CONCOR features, especially in light of the fact that workshop participants were permitted to work weekends and beyond normal working hours at their discretion. Though funding was not generally available, it is known that several workshop members chose to extend their stay in Washington to continue testing the CONCOR package or to work on projects which they could attempt to install immediately on their home computers. At the conclusion of the workshop, participants were permitted to take with them an installation tape of CONCOR as well as all the other materials they had acquired during the course of the project.


II THE ADEQUACY OF CONCOR

CONCOR has been described by its designers as an adequate package. Adequacy as an evaluative criterion is often relative to need and should not be confused with readiness as an issue. The CONCOR system, exclusive of documentation, is sufficiently complete that in a situation of extreme need it could be used as a data-cleaning tool in the editing and imputation phase of census processing. Less extreme circumstances would impose reticence on such an endorsement. Though non-exhaustive tests indicate that CONCOR appears to be capable of performing all of the commands as implemented, because of the rapidity with which the system was rewritten it is thought that there has not been enough time to fully test all aspects of the project. Therefore, prior to its general dissemination, it is recommended that an independent agency conduct exhaustive tests to certify the integrity of the system programs. The importance of this certification cannot be overstated in light of previous workshop experiences. Concurrent with this testing process, the same agency should determine the relative speed and size of the system under actual production circumstances and further determine CONCOR's ease of installation. Later sections of this discussion set forth additional testing recommendations.

It is generally recognized that, of all the data-cleaning tools available for exportation, CONCOR is potentially the most powerful, especially with the addition of its new commands as outlined in Appendix F. While its utility is not in doubt, one must ask how much more useful CONCOR could be if modified, and whether this additional utility would be worth the costs involved. The modifications (excluding documentation) to COBOL CONCOR appropriate for consideration at this time are threefold:

1. Adjustments to the elements of the system which are internally inconsistent or awkward, to facilitate its learnability and usability among developing country programmers.

a. Implementation of the division headings DICTIONARY-DIVISION, EXECUTION-DIVISION, and REPORT-DIVISION. De-emphasis of section headings.

b. Improvement of the consistency among data identifiers: allow alphanumeric variables to be coded without mandatory comparison strings throughout the DATA-DIVISION and to be of the same length as numeric variables. Permit numeric identifiers to be of an equal length to NEW-DATA identifiers. Permit the coding of single-dimension row and column vectors in the same manner as multi-dimensional arrays.

2. Implementation of selective commands and internal variables to facilitate the production environment use of CONCOR in census applications. These include:

a. LOAD/UNLOAD arrays. Commands which would automatically save and replace hot-decked values from batch to batch.

b. TOTAL-QUESTIONNAIRE-COUNT-RECORD-COUNT internal variables independent of AREA CONTROL.

3. Other modifications.

a. Default values for the max-storage parameter set in a realistic range.

b. Allowance of more variables for survey applications.
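The LOAD/UNLOAD idea in item 2a, carrying hot-deck values from one batch to the next, can be sketched as saving the arrays at the end of a run and restoring them at the start of the following one. The Python sketch below is illustrative only: the function names, the JSON file format, and the array name `AGE_BY_SEX` are assumptions and do not correspond to any implemented CONCOR command.

```python
import json
import os
import tempfile


def unload_arrays(arrays, path):
    """Save the current hot-deck arrays at the end of a batch."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(arrays, handle)


def load_arrays(path, defaults):
    """Restore saved arrays, or start from a copy of the initial values."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    return {name: [row[:] for row in rows] for name, rows in defaults.items()}


# Hypothetical initial (cold-deck) values for one imputation array.
defaults = {"AGE_BY_SEX": [[30, 28], [35, 33]]}

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "hotdeck.json")

    batch1 = load_arrays(path, defaults)   # first batch: no saved file yet
    batch1["AGE_BY_SEX"][0][1] = 41        # value updated during editing
    unload_arrays(batch1, path)

    batch2 = load_arrays(path, defaults)   # second batch picks up the update
```

The second batch begins with the values left by the first instead of the initial values, which is the continuity the proposal is after.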

Some of these modifications are part of what ISPC calls its wish list for the future development of CONCOR. This document has been included in this report as Appendix G. It is arguable that these features are essential to the completion of the CONCOR package. While it is beyond the scope of this report to draw a conclusion in this area, the enhancements outlined above are ones that would make the language more internally consistent and thereby easier to learn and apply to a census data production environment. These modifications are not arbitrary or cosmetic but are a direct result of hands-on programming experience in the language, as well as observations and discussions with other workshop participants. While it is probably impossible ever to be satisfied with the overall structure of any programming language, the resolution of this issue of completeness must be made relative to the objectives for developing the COBOL CONCOR system in the first place. An explicit statement of these objectives has been absent in all systems documentation to date.


III PROPOSED CHANGES TO THE LANGUAGE STRUCTURE

Based upon the assumption that it is the intent of sponsoring agencies to optimize the COBOL CONCOR package -- a goal which is believed currently obtainable -- an understanding of the nature of these changes and how they would impact users is essential. Appendix F sets forth in a comparative manner the differences between the old December 1978 and the new December 1979 editions of CONCOR. Studying this appendix makes it obvious that, while the new version of the language is clearly superior to the old in nearly every aspect, the basic and overall structure of the language is essentially unchanged. Compartmentalization of aspects of the language into divisions represents a significant ideological enhancement to the language. Indeed, development of programs by divisions proved to be an extremely useful way of understanding the nature of editing work to be performed. However, note that while the END-DIVISION command is essential to the language, the division headings DICTIONARY-DIVISION, EXECUTION-DIVISION, and REPORT-DIVISION were not implemented and are therefore preceded by a period, to be treated as comment lines in the program listing. It is inconsistent to implement END-DIVISION commands while not implementing the division headings. It is believed that this division structuring is important enough to the overall organizational structure of a CONCOR source language program that it should be implemented prior to general distribution. The section headers shown on the figures in Appendix F, however, are another matter. They are cumbersome and were generally not coded by workshop participants, and they could be deleted from this version of the language altogether with little loss in organizational understanding. The CONCOR language is sufficiently powerful to stand on its own as a distinct product and is not meant to be a COBOL imitation; its present degree of development and specialization does not warrant the structural drag of additional section identifiers. The probable intent of the original CONCOR project was to develop a package which was uncomplicated and not unwieldy to use. The question of division and section name implementation, while seemingly cosmetic, can have real impact on the language's perceived ease of learning and use.

Figure 1 on the following page illustrates common mistakes programmers make coding numeric and alphanumeric variables in the DATA-DIVISION. These mistakes are the result of the inconsistent variable formats. For instance, in the numeric data definition statement it is permissible to specify N9-23, where N signifies numeric, 9 signifies the length of the item, and 23 specifies the starting position in the record. In NEW-DATA, however, it is possible to code an item with a maximum length of 18. On the surface this inconsistency would seem harmless. Typically, however, user variables defined in NEW-DATA as N18 could be moved inadvertently to output record fields defined by a data definition statement such as N9-23, and such an action would result in a data error. Under certain circumstances it would be highly desirable to output these larger length values. A similar circumstance exists between the numeric and alphanumeric data coding conventions. While the maximum length of the numeric is permitted to be 9 in the data definition statement (18 in NEW-DATA), the maximum alphanumeric variable is permitted to be only 4 characters in length. In the current systems manual it is recommended that

FIGURE 1

DICTIONARY-DIVISION

DICTIONARY-NAME DATA-CODING-EXAMPLE

INPUT-FILE

OUTPUT-FILE

AREA-CONTROL N2-2 N2-4 N3-6 N2-9 N9-23

QUESTIONNAIRE-CONTROL A4-2 A3-6 A2-9 A3-11 A3-14

RECORD-CONTROL A1-1

DEFINE-RECORD

H01-TYPE-OF-HOUSING-UNIT N1-17

H02-MATERIAL-OF-ROOF N1-19 10 9

H03-TOTAL-PERSONS-IN-UNIT N8-40 NOT-NUMERIC BLANK

H04-STATE-OF-UNIT-CODE A4-50 0 U 1 D

DEFINE-RECORD

P01-SEX 1-13 W F

NEW-DATA

N01-SAVE-TYPE-OF-HOUSING-UNIT

N02-SAVE-TYPE-OF-ROOF 1

N03-COUNT-TOTAL-IN-UNITS 10 0

N04-AGGREGATE-INCOME 18 0

END-DIVISION

Explanations

N2-4 This is an example of an external numeric input data item (N) with a length of 2 bytes starting in column 4 of the input record. The maximum length of this type of variable outside of NEW-DATA is 9. When coded in NEW-DATA, 18 is permitted.

A4-2 This is an example of an external alphanumeric input data item (A) with a length of 4 bytes starting in column 2 of the input record. This construction for alphanumeric variables is valid only in the control statements. Additionally, it can never be over 4 bytes in length. When alphanumeric data fields are defined within record types, the EDITOR program requires that the comparison strings always be specified. A maximum of 3 is permitted. The purpose of these strings is to force a recode of the data to a numeric value. If no match is found, EDITOR automatically assigns a unique negative value to the field.


alphanumeric coding be utilized in the QUESTIONNAIRE-CONTROL and RECORD-CONTROL statements, where each input data item must be of the same data type, as shown in the example. When alphanumeric data variables are used in these control statements, their construction is identical to that of numeric items. However, when used elsewhere in the DATA-DIVISION, alphanumeric variables are required to specify one of three possible comparison values, as shown. There are a number of production instances in which it would never be necessary or even desirable to recode alphanumeric data. However, as CONCOR attempts to force data into a totally numeric format upon output, there is currently no way to preserve these values if desired.
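The length mismatch between record fields and NEW-DATA items described above can be illustrated outside of CONCOR. The following is a minimal Python sketch, not CONCOR code: it assumes only the type-length-column notation given in the report (for example N9-23), and the names `FieldSpec`, `parse_spec`, and `fits` are hypothetical helpers introduced for illustration.

```python
import re
from dataclasses import dataclass

# Limits quoted in the report: 9 digits for record fields, 18 for NEW-DATA items.
MAX_RECORD_DIGITS = 9
MAX_NEW_DATA_DIGITS = 18


@dataclass(frozen=True)
class FieldSpec:
    kind: str    # "N" numeric or "A" alphanumeric
    length: int  # field length in bytes
    start: int   # 1-based starting column in the record


def parse_spec(text: str) -> FieldSpec:
    """Parse a notation such as 'N9-23' (type, length, starting column)."""
    match = re.fullmatch(r"([NA])(\d+)-(\d+)", text)
    if match is None:
        raise ValueError(f"not a field specification: {text!r}")
    return FieldSpec(match.group(1), int(match.group(2)), int(match.group(3)))


def fits(value: int, target: FieldSpec) -> bool:
    """True when a non-negative value can be stored in the numeric target field."""
    return target.kind == "N" and 0 <= value < 10 ** target.length
```

Under these assumptions, a value accumulated in an 18-digit working item passes `fits` for an N9-23 field only while it stays below 1,000,000,000, which is the kind of check a programmer would otherwise have to make by hand before moving data to an output record.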

An unwieldy alternative to this situation, which may be acceptable under some circumstances, would be the expansion of the number of comparison strings from three to a more realistic number. The limitation of this compromise is that a full twenty-six comparison identifiers would be required in order to accommodate data which utilized the entire alphabet. A better solution, however, would be to make the general format of the alphanumeric variables identical to that of numeric identifiers, i.e. A9-23, and to permit alphanumeric values so defined to pass unaltered through the CONCOR system.
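The two treatments of alphanumeric data contrasted above can be sketched as follows. This is an illustrative Python sketch rather than CONCOR's implementation: the report does not say which numeric codes EDITOR assigns, so numbering the comparison strings from 0 and using -1 for the no-match value are assumptions, and `recode` and `passthrough` are hypothetical names.

```python
from typing import Sequence

MAX_COMPARISON_STRINGS = 3  # limit stated in the report


def recode(value: str, comparison_strings: Sequence[str], no_match: int = -1) -> int:
    """Force an alphanumeric value to a numeric code (assumed: position in the list)."""
    if len(comparison_strings) > MAX_COMPARISON_STRINGS:
        raise ValueError("at most 3 comparison strings are permitted")
    for code, candidate in enumerate(comparison_strings):
        if value == candidate:
            return code
    return no_match  # assumed stand-in for the negative value EDITOR assigns


def passthrough(record: str, length: int, start: int) -> str:
    """Proposed alternative: return the field unaltered (start is a 1-based column)."""
    return record[start - 1:start - 1 + length]
```

The first function loses any value outside the short comparison list, while the second keeps the original characters, which is the behaviour the A9-23 style proposal would make available.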

Another data-naming convention which caused several errors, and which could be corrected, concerns the array data definition statements. While arrays of two and more dimensions are handled in a superior manner by the CONCOR program, single-dimension arrays pose a problem in coding, as shown in Figure 2. It is suggested that the command imperatives be changed to permit the coding of both rows and columns in single-dimension arrays, i.e. to allow a single row vector as well as a single column vector, in order to maintain the consistency of the array data definition statements.
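The suggestion above, that a vector be declarable with the same rows-and-columns scheme as larger arrays, can be sketched in Python. This is an illustration of the idea only, assuming a shape given as a tuple of extents and a flat list of start-up values; `declare_array` is a hypothetical helper, and the values 16 18 21 23 are the start-up values shown for A06 in Figure 2.

```python
from math import prod


def declare_array(shape, values):
    """Build a nested list of the given shape from a flat list of start-up values."""
    if not shape or any(extent < 1 for extent in shape):
        raise ValueError("every dimension needs at least one element")
    if len(values) != prod(shape):
        raise ValueError("number of start-up values does not match the shape")

    def build(dims, flat):
        # The last dimension holds the values themselves.
        if len(dims) == 1:
            return list(flat)
        step = len(flat) // dims[0]
        return [build(dims[1:], flat[i * step:(i + 1) * step]) for i in range(dims[0])]

    return build(list(shape), list(values))


row_vector = declare_array((1, 4), [16, 18, 21, 23])     # one row, four columns
column_vector = declare_array((4, 1), [16, 18, 21, 23])  # four rows, one column
```

With a uniform rule of this kind, a row vector and a column vector are simply two-dimensional declarations in which one extent is 1, so no separate coding convention is needed for vectors.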

A major requirement of COBOL CONCOR file processing concerns the fact that all related data records must be physically contiguous on the input file. The implication of this requirement is that files may require preprocessing prior to actual data editing. (This preprocessing is usually a sort routine upon a selected CONTROL-AREA key.) While this type of processing merely introduces a new step in file processing, a major limitation becomes apparent when a large number of discrete data files of the same census or survey questionnaire are to be processed. This limitation is the introduction of manual steps to save the most recently imputed values, i.e. to prevent the program from starting with cold values on each batch run. If a command such as LOAD/UNLOAD ARRAYS were incorporated into the language (an enhancement not believed to be difficult to implement), manual processing would be reduced to a minimum between batches and the maximum benefits of the hot deck methodology would be realized. It is envisioned that such a command would automatically ensure the transfer of the appropriately designated hot values. Automatic processing of this nature, if done correctly, can greatly reduce the time required to clean multivolume files, for once CONCOR language statements have been compiled, linked

While it is possible at this time to save the arrays that are used in the imputation processes on a separate write-file, it is not now possible to automatically load those values back into an object program and to immediately resume processing on another volume. It is believed that such an automatic feature of the language would cut down the manual processing time significantly enough that it warrants inclusion in the package prior to its general distribution.
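The proposed load/unload behaviour can be sketched in Python. This is a minimal illustration of carrying hot deck values from one batch to the next, not a description of how CONCOR stores its arrays: the `HotDeck` class, its method names, and the use of JSON text as the saved form are all assumptions made for the example.

```python
import json


class HotDeck:
    """Most recent valid value per imputation cell (cell keys are strings)."""

    def __init__(self, cold_values):
        # Cold-deck start-up values, used until a valid response replaces them.
        self.values = dict(cold_values)

    def edit(self, cell, reported):
        """Return the value to keep for this record, updating the deck."""
        if reported is not None:
            self.values[cell] = reported  # a valid response becomes the new hot value
            return reported
        return self.values[cell]          # impute from the last valid response

    def unload(self):
        """Serialise the deck so it can be written out at the end of a batch."""
        return json.dumps(self.values, sort_keys=True)

    @classmethod
    def load(cls, text):
        """Rebuild the deck from text saved by a previous batch."""
        return cls(json.loads(text))
```

In this sketch the text returned by `unload` at the end of one volume would be handed to `HotDeck.load` at the start of the next, so that the second batch begins with the values most recently observed rather than with the cold-deck values.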

FIGURE 2

A05-DIFF-BETWEEN-AGE-OF-FEMALE-BY-RELATION 2 4 4

(Cold deck table: the columns are AGE OF HUSBAND groups 12-17, 18-24, 25-35, and 36+; the rows are RELATION categories HEAD, CHILD, OTHER, and NONRELATIVE. The cell values are not legible in this copy.)

Comments: The ARRAY-DATA command statement provides the means to declare array identifiers with up to five dimensions. Current documentation is not as explicit about the rules of this command as is desirable. The parameters of the command should function as follows:

user-identifier, number of dimensions (D), number of rows (R), number of columns (C), magnitude of element (M), initial start-up values

In the example, A05 is a two-dimensional array with 4 rows, 4 columns, a default magnitude of 9, and cold deck values as labeled.

A06-DIFF-BETWEEN-AGE-OF-PERSON-AND-MOTHER 1 1 4
16 18 21 23

(This coding generates the error message below.)

WARNING (DD-207) COMMAND TERMINATOR NOT FOUND, ASSUMED PRESENT
ERROR (DD-9lI) DIMENSION OF USER-SPECIFIED ARRAY IS LESS THAN THE MINIMUM VALUE PERMITTED (2)
PREVIOUS DIAGNOSTIC AT LINE 563

As shown by the array variable A06, CONCOR's treatment of vectors is not consistent with the above multidimensional array scheme, i.e. A06 must be coded as follows. (Example of how vectors must currently be coded to be correct.)

A06-DIFF-BETWEEN-AGE-OF-PERSON-AND-MOTHER 1 4 2
16 18 21 23

user-identifier, 1 dimension, number of elements in vector, magnitude of element, initial start-up values

A simple modification to this command would permit the coding of both row and column vectors and make this command less error prone.

and stored as an object module on the system, no other compilations should be required for processing questionnaire files of the same type. Theoretically, a single well-written CONCOR program is all that would be required to process an entire census run.

Appendix H contrasts the internal identifiers of the old and new language versions. Without such identifiers a user would have little information about the status of input as it is processed by EDITOR. As noted in the appendix, most internal pointers are reset upon each break in the CONTROL-AREA, provided a CONTROL-AREA has been defined. The limitation here is that there are obvious instances in which terminating processing on the basis of run counts would be advantageous even though a CONTROL-AREA has been specified, e.g. when debugging CONCOR programs or comparing input files. Therefore, another set of pointers should be implemented for this purpose and made available for programmer reference.
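The distinction drawn above, between counters that are reset at each control-area break and a second set kept for the whole run, can be sketched in Python. This is an illustration of the idea only: the record layout (an area identifier paired with an error flag) and the function name `tally_errors` are assumptions, and the sketch relies on records for one area being contiguous, as the report says CONCOR requires.

```python
from collections import Counter


def tally_errors(records):
    """Return (per-area counts, run-level counts) for contiguous (area, has_error) pairs."""
    run = Counter()
    per_area = []
    current, area = None, Counter()
    for area_id, has_error in records:
        if area_id != current:
            # Control-area break: save the finished area and reset its counters.
            if current is not None:
                per_area.append((current, dict(area)))
            current, area = area_id, Counter()
        for counter in (area, run):  # the run-level counters are never reset
            counter["records"] += 1
            counter["errors"] += int(has_error)
    if current is not None:
        per_area.append((current, dict(area)))
    return per_area, dict(run)
```

A program with access to the run-level counts could, for example, stop after a fixed number of records during debugging regardless of where the area breaks fall.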

One clearly disturbing development which needs to be pursued during in-depth testing of the system concerns the MAX-STORAGE parameter of the DEFINE-RECORD statement. As shown in the figure on the following page, when MAX-STORAGE was set equal to the maximum value, a COBOL program was generated which required 1000K of core to run. The MAX-STORAGE value of 999 is clearly not realistic under most processing circumstances. This example drives home several important points about CONCOR. The core requirements of CONCOR-generated programs can be influenced significantly by the amount or nature of programmer-specified I/O operations. In fact, it is possible to generate a program of a size most foreign country machines could not process. It is recommended that tests determine a realistic maximum-value restriction for implementation to prevent problems in this area.

The final area of recommended modification concerns the newly implemented REPORT-DIVISION. The purpose of the REPORT-DIVISION is to enable a user to specify certain CONCOR language statements which will generate statistical reports. These reports contain statistics generated by EDITOR as specified by the GENERATE-EDIT-STATISTICS command of the EXECUTION-DIVISION. All of the reports produced are organized according to the data fields defined by the AREA-CONTROL command of the DATA-DICTIONARY. If the AREA-CONTROL command is not defined in the DATA-DICTIONARY, then all the statistics are summarized at the total run level. If a control area field is defined, then all statistics will be summarized for each unique control area as encountered by the EDITOR program on the input file, and statistics at the total run level will not be available. This in part relates back to previous discussions citing the need for new internal identifiers. Report listings may contain the values of entire records or entire questionnaires, depending upon the keyword used in the report generation commands. The problem centers upon the homogeneity of CONCOR printouts during a production run.

It is virtually impossible to distinguish reports on the basis of the volumes they were run against. Some means should be provided to allow users to uniquely and purposefully label the reports generated in this division. Indeed, the very name REPORT-DIVISION suggests that such a command is implicit and appropriate. Such a LABEL-REPORT or REPORT-FILE command, along with file information from the system, should not be difficult to implement.

FIGURE 3

CONCOR (CONSISTENCY AND CORRECTION) EDIT AND IMPUTATION SYSTEM
USER DICTIONARY DIVISION - SOURCE LISTING

LINE NUMBER

72   DEFINE-RECORD RECORD-NAME= HOUSING-RECORD
73   MAX-STORAGE= 999
74   RECORD-TYPE= [value not legible in this copy]

268  DEFINE-RECORD RECORD-NAME= POPULATION-RECORD
269  MAX-STORAGE= 999
270  RECORD-TYPE= [value not legible in this copy]

(Job step messages for the LOAD step; dataset disposition messages are not legible in this copy.)

IEF373I STEP LOAD START 800181001
IEF374I STEP LOAD STOP 800181002 CPU 0MIN 0165SEC SRB 0MIN 025SEC VIRT 1000K SYS 3

STEP START TIME - 100155   STEP END TIME - 100257   STEP CPU TIME - 165 SECONDS   STEP COMPLETION CODE - C 0

TAPES - 0   DISKS - 0   REGION (REQUESTED - 1000K USED - 1000K)   IO COUNT 211

CARDS READ - 0   CARDS PUNCHED - 0   LINES PRINTED - 22

STEPNAME - LOAD   CHARGE (MACHINE UNITS - 1198; dollar amount not legible in this copy)


Concluding Remarks of System Modifications

Version 1 of the COBOL CONCOR language was notorious for abending without the generation of diagnostic messages. Appendix I sets forth a test of some of these historical problems. In each instance the system provided messages and source program line references which enabled immediate identification of the processing difficulties. Moreover, tests of the various commands revealed the generally high performance character of the language in its newest version. Under these circumstances the recommended modifications should be considered as ones which will uniformly allow CONCOR to realize the potential of its design, enhance its utility, and moreover endure as a software product.


IV THE ADEQUACY OF COBOL CONCOR DOCUMENTATION

The current documentation of CONCOR consists of a newly developed, relatively voluminous systems manual, a Diagnostic Message Guide, and some loose-leaf instructions concerning the installation of the system on IBM computers. Of these three documents, the Diagnostic Message Guide is clearly complete and adequate, and needs no changes except the addition of message number tabs to enable a programmer to locate the text of a message number more quickly. An example of the superior format of this guide appears as Appendix J.

Conspicuously absent from the list of systems documents is a Users Guide as originally specified in the ISPC enhancement proposal

A thorough assessment of the Systems Manual has been undertaken and indicates that, with some reorganization and editing, it could be made a more useful and usable document, as it is well constituted informationally. In its current form the Systems Manual can best be described as a working document, a document which was not intended to teach the CONCOR language but is aimed at an audience already familiar with previous CONCOR releases. Further, it is not a systems manual per se, as some of its components and appendixes could more appropriately compose a users guide. For example, excerpts from the POPSTAN publications which review the basic editing concepts and contain the POPSTAN example problem should be deleted from the systems manual and would be more contextually effective in the Users Guide. The EXECUTION-DIVISION chapter of the Systems Manual would especially benefit from general editing and reorganization. (Specifically, the contents of pages 90-92 should be moved to page 63 -- these pages explain the placement and purpose of the three permissible routines of the CONCOR language. Had the author been granted more time, he could have outlined additional specific modifications to bring existing documentation to a level of adequacy.) However, some guidelines stand out:

1 Installation instructions and considerations should compose a chapter of the Systems Manual rather than a separate document. (Hand-out materials covering installation were not adequate.) Further, a documentation scheme setting forth how CONCOR was installed on each computer, and any modifications required during installation, could be serially added to and become a permanent part of the installation's systems manual. If CONCOR is going to be used, and if agencies are going to actively support it as a package, installation limitations must be known and well documented. The prescribed format for this documentation should be set forth, and any modifications to CONCOR should be continuously updated on these sheets so that a history of changes to the language will be readily accessible to supporting agencies. If CONCOR is to survive as a standardized system, editing results must be capable of being replicated among installations. Such a form could contain other information as well, concerning the type of computer, amount of core, and nature of utilities supporting users. Upon installation, a copy of this form could be sent to the US agency which will ultimately be responsible for supporting the CONCOR package.

2 A complete COBOL CONCOR program should appear in an appendix for reference.

3 The development of the Users Guide should include an intensive review of the editing concepts involved in processing census data files, beyond the POPSTAN materials.

4 An explanation of the CONCOR benchmark program should appear in the Users Guide and the Systems Manual. The running of a supplied benchmark program should be a standard installation protocol used to test all operational aspects of a new installation.

5 The development of a CONCOR pocket card. This tool, which has been proven useful in teaching and assisting programmers in utilizing programming languages, lays out all command options on a single small card. An example of such a pocket card is the IBM Assembler Card. Such a small document is useful in clarifying the language and setting forth structural rules without continual reference to full-size manuals.


V CONCLUSION

In a situation of extreme need CONCOR could probably be utilized, as it seems capable of performing its editing functions. Its usefulness as a data cleaning tool is generally undisputed. The importance of having CONCOR exhaustively tested by an independent agency cannot be over-emphasized in light of prior failures of the system to perform adequately. Additionally, core requirements and processing speed should be precisely determined. One must be encouraged by the amount of progress which has been made in a relatively short amount of time towards the completion of the CONCOR package. While it is clearly beyond the scope of this report to estimate the cost and the amount of time required to implement the highly desirable changes to the CONCOR language and documentation outlined herein, and then exhaustively test the system, it is believed that this process could be completed within a single quarter. (The changes are of such a nature that no structural redefinitions of a systematic nature would be required.) These enhancements would make the COBOL CONCOR package more complete, consistent, and reliable in the field. ISPC has competently moved the system to the point that its completion is within reach.

Given the probable revision of the current Systems Reference Manual and the development of a suitable Users Guide document, it is likely that this version of CONCOR could be taught to experienced programmers (unfamiliar with previous versions of the language) in as little as four weeks. And while one is impressed by the potential power and overall simplicity of the COBOL CONCOR language, it is difficult to envision individuals with no real computer programming experience utilizing its full capabilities. Hence the importance of including a review of basic editing concepts in the documentation is underlined.

Further, as documentation is often critical to the perceived ease of use of computer-based systems, it will be essential to have these various manuals examined independently prior to their general use.

As previously mentioned, Appendix G presents features which the ISPC feels could have a positive impact on CONCOR. Depending upon the outcome of the testing recommended in previous chapters and the availability of funding, supporting agencies may desire to look again at an assembler (ALC) version of the language.

Overall, COBOL CONCOR remains an intermediate product yet to realize its potential, which is thought to be considerable. Software development of this nature is characteristically of a long-term evolutionary design. In light of this reality, agencies should be prepared to evaluate such projects' success or failure in terms other than that of immediate utility. It is the philosophy underlying the system, not the system itself, which will ultimately be exported.

APPENDIX A

BUCEN ENHANCEMENT PROPOSAL


BUCEN IN-HOUSE REVIEW ENHANCEMENT PROPOSAL

1 Easy to use interrecord referencing

2 Improved output file capabilities

A provide overflow protection on WRITE command

B provide OUTPUT command to create corrected questionnaire file that matches IP file data dictionary

3 Improved edit statistics reported (LISTERR)

A provide automatic (user-specified) area break

B provide options for compilation and displaying edit statistics at various levels

C provide automatic (user-specified) tolerance checking of error rates by area

D automatically capture IDs of areas failing tolerance check

4 Clean up known bugs in code

5 Comprehensive testing

6 Clean up and enhance documentation

A reference manual more examples error message guide

B installation guide

C systems manual
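Items 3C and 3D of the proposal, tolerance checking of error rates by area and capture of the identifiers of failing areas, can be illustrated with a short Python sketch. It is not taken from CONCOR: the input layout (area identifier mapped to an error count and a record count), the meaning of the tolerance as a maximum acceptable error rate, and the name `failing_areas` are assumptions made for the example.

```python
def failing_areas(stats, tolerance):
    """Return the IDs of areas whose error rate exceeds the user-specified tolerance.

    stats maps area_id -> (error_count, record_count); tolerance is a fraction,
    for example 0.10 for ten percent.
    """
    failed = []
    for area_id, (errors, records) in stats.items():
        # An area with no records has no measurable error rate; treat it as passing.
        rate = errors / records if records else 0.0
        if rate > tolerance:
            failed.append(area_id)
    return failed
```

With a tolerance of 0.10, an area with 20 errors in 100 records would be captured while an area with 5 errors in 100 records would not.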

APPENDIX B

EVALUATIVE CRITERIA

UNITED STATES GOVERNMENT

Memorandum

TO: DSPOPFPSD, Robert H Haladay

DATE: December 3, 1979

FROM: DSPOPDEIO, Liliane Floge

SUBJECT: Criteria for Evaluation of BuCen COBOL CONCOR, September 30th Version, as presented at January 1980 Workshop

The following points should be considered when evaluating CONCOR, the COBOL editing system, during participation at the January 1980 workshop:

1 Are the Installation Manual, the System Manual, and the Users Manual ready for use? Are these manuals adequate for their purposes? Are the manuals clear and consistent? Can they be understood by subject-matter personnel as well as programmers?

2 Is the editing system capable of implementing all the features as described in the documentation?

3 Does the system appear to be [word not legible in this copy] debugged and tested and ready for release to developing countries?

4 Does the system appear to be fast enough to edit a census in a reasonable amount of time?

5 What size core does the system require?

6 Comment in general on completeness, usefulness, and efficiency of the system, and on ease of use by developing country programmers and other census personnel.

cc: Su-a-- e Olds, APHA; SDi OrW; Julio Ortuzar, Delta Systems

APPENDIX C

WORKSHOP ITINERARY


CONCOR Workshop Schedule January 7-18 1980

U S Bureau of the Census International Statistical Programs Center

Scuderi Building Room 209 4235 28th Avenue Marlow Heights Maryland

Monday January 7

9:30 am - 10:00    Welcoming Remarks
                   Overview of Workshop

10:00 - 10:30      Introduction to CONCOR
                   - Purpose and function
                   - History of development
                   - General computer requirements

10:30 - 10:45      Break

10:45 - 12:00      Editing Concepts
                   - Ways to interrogate data
                   - Ways to correct data
                   - Editing housing and population data - POPSTAN
                   - Advantages of CONCOR

12:00 - 1:15 pm    Break

1:15 - 2:00        System Description
                   - Constraints in design of CONCOR
                   - Basic subsystems of CONCOR
                   - User interactions with system
                   - Examples of outputs produced

2:00 - 2:30        User Program Organization
                   - Divisions
                   - Sections
                   - Routines
                   - Commands

2:30 - 2:45        Break

2:45 - 3:25        Command Language Description
                   - Types of statements
                   - Format
                   - Syntax

Tuesday January 8

Dictionary Division Command Statements

9:30 am - 10:30    - Punctuation
                   - Input data referencing
                   - Working data declarations
                   - Internal data representation and storage

10:30 - 10:45      Break

10:45 - 12:00      Dictionary-Attributes-Section
                   - Dictionary-Name
                   File-Section
                   - Input-File
                   - Output-File
                   - Write-File
                   - Error-File

12:00 - 1:15 pm    Break

1:15 pm - 2:15     Input-Record-Section
                   - Define-Record
                   Working-Data-Section
                   - New-Data
                   - Array-Data

2:15 - 2:30        Break

2:30 - 3:25        Dictionary Examples
                   - Minimum dictionary structure
                   - Maximum dictionary structure
                   - Hand out dictionary problem

Wednesday January 9

9:30 am - 10:30    Discussion of Participants' Dictionary problems
                   Free work time

10:30 - 10:45      Break

10:45 - 12:00      Execution Division Command Statements
                   - Punctuation
                   - Subscripting
                   - Internal Identifiers
                   - Report-Control-Section
                     - Generate-Edit-Statistics
                     - Count-Imputes
                     - Examples

12:00 - 1:15 pm    Break

1:15 pm - 2:15     Execution Division Command Statements (continued)
                   - Routines of Edit-Specifications-Section
                     - Prolog-Routine
                     - Filter-Routine
                     - Epilog-Routine
                   - Types and functions of edit specification commands
                   - Range
                   - Assert

2:15 - 2:30        Break

2:30 - 3:25        - Pass/Fail clauses
                   - List

Thursday January 10

9:30 am - 10:30    Discussion of Problems
                   Free work time

10:30 - 10:45      Break

10:45 - 12:00      Edit Specification Command Statements (continued)
                   - Allocate
                   - Update
                   - Let

12:00 - 1:15 pm    Break

1:15 pm - 2:15     - If
                   - Until/Exit
                   - Stop

2:15 - 2:30        Break

2:30 - 3:25        - Drecode
                   - Grecode

Friday January 11

9:30 am - 10:30    Edit Specification Command Statements (continued)
                   - Output
                   - Write

10:30 - 10:45      Break

10:45 - 12:00      Report Division Command Statements
                   - Display-Control-Section
                     - Display-Edit-Statistics
                   - Tolerance-Control-Section
                     - Error-Rate-Check
                     - Reject-File
                   - Report Examples

12:00 - 1:15 pm    Break

1:15 pm - 3:25     Hand out Edit Specifications as problem
                   Free work time

Monday January 14

9:30 am - 10:30    Discuss procedures for running problems on computer

10:30 - 10:45      Break

10:45 - 12:00      Component Programs of the CONCOR system

12:00 - 1:15 pm    Break

Tuesday January 15

9:30 am - 3:25 pm  Free work time

Wednesday January 16

9:30 am - 12:00    Free work time

12:00 - 1:15 pm    Break

1:15 pm - 2:15     How to Install CONCOR on IBM 360/370 OS

2:15 - 2:30        Break

2:30 - 3:25        Free work time

Thursday January 17

9:30 am - 3:25     Free work time

1:15 pm - 2:15     Future CONCOR Design Considerations
                   - for census processing
                   - for survey processing
                   - manual correction system

2:15 - 2:30        Break

2:30 - 2:45        Evaluation Guidelines
                   - Hand out evaluation forms

2:45 - 3:25        Free work time

Friday January 18

9:30 am - 10:30    Free work time

10:30 - 10:45      Break

10:45 - 12:00      Group Discussion on CONCOR - submission of written evaluations
                   Distribution of IBM installation tapes
                   Presentation of Certificates to Participants

12:00 - 1:15 pm    Break

1:15 - 3:25        Free work time

APPENDIX D

PARTICIPANTS


CONCOR WORKSHOP January 7 - 18 1980 Marlow Heights MD

PARTICIPANTS

KENYA

James Midianga Government Computer Centre Central Bureau of Statistics PO Box 30266 Nairobi Kenya

PANAMA

Omar Rivera Rios Data Processing Specialist Contraloria General de la Republica de Panama Apartado Postal 5213 Panama 5 Panama

PHILIPPINES

Guida T Capellan OIC Computer Programming Division National Census and Statistics Office Solicarel Bldg 1 Magsaysay Blvd Sta Mesa Manila Philippines

THAILAND

Angsumal Sunalai National Statistical Office Bangkok 1 Thailand

EGYPT

Farag Sedky Mourad Ghaleb CAPMAS (Central Agency for Public Mobilization and Statistics) NASR City Cairo Egypt

SAUDI ARABIA

Shargi R Al-Shargi National Computer Center PO Box 2534 Riyadh Saudi Arabia

Ahmed S Al-Ghamdi Dept Director of National Computer Center PO Box 2534 Riyadh Saudi Arabia

OTHER

Robert W ONeal USREPJECOR APO New York NY 09038

John N Adams Data Processing Advisor DACCA Department of State Washington DC 20520

or c/o UNDP PO Box 224 Dacca Bangladesh

Joe Quasney US Census Bureau Washington DC 20233

Howard Brunsman 5715 N Ninth St Arlington VA 22205

Larry Shiller and Julio Ortuzar Delta Systems Consultants Inc 264 Alhambra Circle Coral Gables FL 33134

Nicholas Ourusoff Burpee Hill Road New London NH 03257 (United Nations)

Frederic J Grant IV Systems Development Georgia World Congress Institute 580 North Omni International Atlanta Georgia 30303

Rafael Samper Mary Jo Keenan Catherine B Gleason LACDR Room 2246 NS Washington DC 20523

John Marshall AIDSERDMPSE Rm 706C SA-12 Washington DC 20523

Susanne Bacon and Mary K Friday ISPCData Processing US Census Bureau Washington DC 20233

Leo Dougherty ISPCWorld Census Staff US Bureau of the Census Washington DC 20233


CONCOR WORKSHOP

January 7-18 1980 Washington DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM


CONCOR WORKSHOP January 16 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1 What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries

2 If you are familiar with previous versions of CONCOR how does this current version compare to them


3 How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data of which you are familiar

4 How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software


5 How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys

6 If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?

7 In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8 Other comments


APPENDIX F

NEWOLD COMMAND COMPARISONS


OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA


NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR

END-DIVISION signifies the termination of the CONCOR division organization and must always be present even though division identifiers have not been implemented

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations, and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

VERSION 1 (DECEMBER 1978)    VERSION 2 (DECEMBER 1979)    COMMENTS

ALLOCATE (ALL) ALLOCATE (ALLOC) Allows for the assignment of values and with the UPDATE command providesthe imputation mechanism Major difference is this statement now providesfor a receiving-identifier list and statistics are generated as coded inthe GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION

ASSERT fAST) ASSERT (ASRT) The ASSERT command is the basic consistency editor command of CONCOR Command now provides for a NOERROR and MESSAGE option Additionallyby utilizinq the DUMPR DUMPO keywords provides the user with the option of using all the current value of all user-identifiers defined for the record type being processed Other changes to this command include the pass end-pass fail end-fail constructions for the P purpose of maintaining structured logic

(see RECODE) DRECODE (DRCD) Provides a method for recoding data items that have a starting value of zero and continue in ascending sequence. Replaces the previous XRECODE command.

EXIT Provides the means to terminate EXECUTION (ESCAPE) of command statements within the UNTIL command

END END-IF END-THEN

A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

END-ELSE

ERROR not implemented No longer supported in language

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

FILTER TYPE ( ) WITH not implemented No longer are option statements supported in language This command once specified essentially a GOTO depending on construction

(see XRECODE) GRECODE (GRCD) This new command provides a method for the group recoding of input values

IF IF The IF statement has properties found virtually in all programming languages ie provides a means of evaluating a condition and controlling the execution of alternative logical statements Similar to pseudocode a CONCOR programmer must utilize the structured command IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version

LET LET Virtually unchanged

LIST New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire as well as specific individual identifiers

OUTPUT New command which provides the means by which records will be written to the output-file

(continued)


CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

RANGE (RNG) RANGE (RNG) As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

RECODE (REC) name changed Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

STOP- keyword STOP- keyword Unchanged.

UNTIL DO END-DO DO-END-DO: one of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to most other high-level programming languages. The number of iterations is under user control, as well as the value initially assigned to the user identifier (counter) which is incremented with each successive execution of the command statements.

UPDATE (UPD) UPDATE (UPD) Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

WRITE (WRT) WRITE (WRT) The WRITE language statement while once utilized as the primary output command is now used to create an auxiliary or derivative file Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION (See WRITE-FILE of DICTIONARY-DIVISION)

XRECODE (XREC) name changed Old command which has become the GRECODE command in the December 1979 version


DECEMBER 1978 VERSION 1

DECEMBER 1979 VERSION 2

Not implemented REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note division and section names treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging uncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
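The presorting requirement noted above can be illustrated with a short Python sketch. It is not CONCOR code; it only assumes that reports are broken out whenever the control-area value changes from one record to the next, which is how the comment describes the system. The record layout and the function name are hypothetical.

```python
from itertools import groupby

def area_totals(records):
    """Count records per control area, starting a new area at every change of key.

    Like the behaviour described in the report, non-adjacent records of the
    same area are NOT merged: each contiguous run is reported separately.
    """
    return [(area, sum(1 for _ in run))
            for area, run in groupby(records, key=lambda rec: rec[0])]

# Hypothetical records: (area control value, payload)
unsorted_records = [("01", "a"), ("02", "b"), ("01", "c")]

# Without presorting, area 01 produces two separate report lines.
unsorted_report = area_totals(unsorted_records)

# Presorting on the control field (external to the editor) gives one line per area.
sorted_report = area_totals(sorted(unsorted_records, key=lambda rec: rec[0]))

print(unsorted_report)
print(sorted_report)
```

The sort is deliberately performed outside `area_totals`, mirroring the report's point that the preprocessing must happen external to CONCOR.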

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST


Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1 Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods if applicable

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.


(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18 digit numbers. Also signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.


(12) Increase the number of comparison strings permitted for alphanumeric data items This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters The user would still be required to code two commas to receive the system default value for any parameter

(15) Provide a repetition factor in the initialization of arrays

(16) Print the data dictionary name in the titles of the dictionary documentation or provide a means by which the user may specify a title to appear in the listings
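Item (13) above proposes an automatic range check driven by valid values declared with each user-identifier. The following Python sketch shows one way such a check could work. It is only an illustration: the dictionary layout, the function name and the ranges themselves are assumptions, not part of CONCOR (the identifier names are borrowed from Figure 1 of this report).

```python
# Hypothetical declarations: identifier -> (lowest valid value, highest valid value).
# The ranges are invented for illustration only.
VALID_RANGES = {
    "H01-TYPE-OF-HOUSING-UNIT": (1, 5),
    "H02-MATERIAL-OF-ROOF": (0, 9),
}

def out_of_range_items(record):
    """Return the identifiers whose values are missing or outside the declared range."""
    failures = []
    for name, (low, high) in VALID_RANGES.items():
        value = record.get(name)
        if value is None or not (low <= value <= high):
            failures.append(name)
    return failures

# One questionnaire record with an invalid housing-unit type.
record = {"H01-TYPE-OF-HOUSING-UNIT": 7, "H02-MATERIAL-OF-ROOF": 3}
failed = out_of_range_items(record)
print(failed)
```

Because the ranges live in the dictionary rather than in comment statements, the same table could in principle be read by tabulation software, which is the secondary benefit the item mentions.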

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features such as division by zero and out of bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is what action is to be taken in the calculation of tolerance in terms of this new record's inclusion into the universe of observations and number of changes.


(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented) but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
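Item (4) above contrasts a lookup table with the code currently generated for DRECODE. The report does not show that generated code, so the Python sketch below only illustrates the general trade-off under an assumption: that the present approach tests each recode rule in turn. The rule set and function names are hypothetical.

```python
# Hypothetical recode rules for an age item: (low, high, new code).
RULES = [(0, 4, 0), (5, 14, 1), (15, 64, 2), (65, 99, 3)]

def recode_linear(value):
    """Test each rule in turn, as a chain of generated comparisons would."""
    for low, high, code in RULES:
        if low <= value <= high:
            return code
    return None  # value not covered by any rule

# Build the table once; afterwards each recode is a single index operation.
TABLE = [recode_linear(v) for v in range(100)]

def recode_lookup(value):
    """Recode by direct indexing into the prebuilt table."""
    return TABLE[value] if 0 <= value < len(TABLE) else None

print(recode_linear(70), recode_lookup(70))
```

The table costs storage proportional to the range of input values, which is one reason the approach suits items that start at zero and ascend, the case DRECODE is described as handling.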

1.1.3 Report Division

(1) Provide a means by which the user may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.


1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available information needed most frequently. Also to examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean to imply that the editing process itself should be made interactive, but rather only that the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual

(2) The development of a CONCOR Users Guide


2 Survey or other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program but it has not been finished nor tested.
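The check-in procedure proposed in item (2) above amounts to comparing the questionnaire identifiers actually received against a master list of selected sample cases. A minimal Python sketch follows. The function name, the identifier format and the idea of reporting unexpected cases as well as missing ones are assumptions made for illustration.

```python
def check_in(master_ids, received_ids):
    """Compare received questionnaire identifiers against the master sample list."""
    expected = set(master_ids)
    received = set(received_ids)
    return {
        "missing": sorted(expected - received),     # selected but never received
        "unexpected": sorted(received - expected),  # received but not in the sample
    }

# Hypothetical identifiers.
report = check_in(master_ids=["Q001", "Q002", "Q003"],
                  received_ids=["Q001", "Q003", "Q009"])
print(report)
```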


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES


1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT OUTPUT-QUEST-COUNT WRITE-RECORD-COUNT

Change in spelling of variable

Not mentioned--possible omission from manual

Not implemented

Note: Current documentation equivocates on when some variables are reset--most are initialized with each control area break.

Though implemented as reserved identifiers, due to poor documentation of (old) CONCOR, exact values and utility were unknown.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN-, OUT-).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Execution Statistics printout, largely illegible in this copy. The legible messages are DIVISION BY ZERO LINE NUMBER 118, repeated several times, followed by RUN ABORTED DUE TO DIVISION BY ZERO ERRORS. The superimposed program text (source lines 112-125) consists of LIST and LET statements; line 118 is a LET statement that computes R003 from R002 and R001.]

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason -- ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors and user specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.
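The diagnostic behaviour described in this example (report the source line of each division by zero, then abort the run) can be sketched in Python. This is not how EDITOR is implemented. The function name, the error limit of 3 and the input data are assumptions; only the two message texts and line number 118 are taken from the printout.

```python
def run_editor(questionnaires, max_errors=3):
    """Apply one division per questionnaire, logging each failure with its line number.

    `max_errors` is a hypothetical error limit; the actual EDITOR limit is not
    stated in this report.
    """
    line_number = 118  # source line of the dividing LET statement in the printout
    results, messages = [], []
    for r001, r002 in questionnaires:
        try:
            results.append(r002 // r001)
        except ZeroDivisionError:
            messages.append(f"DIVISION BY ZERO LINE NUMBER {line_number}")
            if len(messages) >= max_errors:
                messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                break
    return results, messages

# Hypothetical (R001, R002) pairs; three of them have a zero divisor.
results, messages = run_editor([(2, 10), (0, 0), (0, 5), (0, 7), (1, 1)])
for message in messages:
    print(message)
```

The last pair is never processed because the run stops at the third error, which corresponds to the abort message at the end of the printout.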

[Execution Statistics printout, largely illegible in this copy; the individual error messages cannot be recovered. The superimposed program text (source lines 173-187) shows two nested UNTIL ... VARY ... DO loops that step R001 and R002 from 1 by 1, an ALLOCATE from TA1 (R001 R002), an UPDATE of TA2 (R001 R002) and several LIST statements, with each loop closed by END-DO.]

Comments: In this example TA1, a 2 dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 X 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.

[Execution Statistics printout, largely illegible in this copy. The legible messages are JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215 and EDITOR -- NORMAL END OF JOB. The superimposed program text (source lines 214-218) shows an IF test on IN-RECORD-COUNT whose THEN branch is STOP-RUN END-THEN.]

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION A blank command statement line was found in the user Dictionary Division command statement file

ACTION TAKEN Parsing continued with the next command statement line in the Dictionary Division command file

USER RESPONSE Put a comment indicator () in column 1 of the command statement line if spacing is desired if not remove the command statement line

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION A character not in the CONCOR legal character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()) was found as the first character in the string

ACTION TAKEN Parsing began again with the next string

USER RESPONSE Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION A string starting with a character not valid in the CONCOR legal character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached

ACTION TAKEN Parsing began again with the next user command statement line in the Dictionary Division command file

USER RESPONSE Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ())

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION The comment indicator character () was found embedded in a string


contained in Appendix D. During the concluding days of the workshop each participant was asked by ISPC to provide a written evaluation of the now-called December 1979 version of CONCOR. This evaluation form, Appendix E, also includes space for comments concerning the competence of the system documentation as well as any additional comments, including those regarding the organization and clarity of workshop presentations. It is assumed that in the near future summaries of these comments will be available to interested agencies.

While virtually all instructional aspects of this two-week workshop were conducted in a highly professional manner -- a manner which revealed a high degree of coordination among staff members in their efforts -- there are several areas which future workshops may improve upon

1 All publications should be assembled in their entirety and proof-read prior to distribution

2 A complete CONCOR language program example and accompanying I/O documents should be provided at the onset of the workshop for reference

3 Numerous short application programming problems involving all CONCOR language divisions should be utilized in place of a single lengthy problem

It is noted that this workshop was not intended to teach the CONCOR language, as the organization and presentation of materials probably would have been different. It is believed that the two-week time period was sufficient time to provide participants a familiarity with the use of the new CONCOR features, especially in light of the fact that workshop participants were permitted to work weekends and beyond normal working hours at their discretion. Though funding was not generally available, it is known that several workshop members chose to extend their stay in Washington to continue testing the CONCOR package or to work on projects which they could attempt to immediately install on their home computers. At the conclusion of the workshops participants were permitted to take with them an installation tape of CONCOR as well as all the other materials they had acquired during the course of the project.


II THE ADEQUACY OF CONCOR

CONCOR has been described by its designers as an adequate package. Adequacy as an evaluative criterion is often relative to need and should not be confused with readiness as an issue. The CONCOR system, exclusive of documentation, is sufficiently complete that in a situation of extreme need it could be used as a data-cleaning tool in the editing and imputation phase of census processing. Less extreme circumstances would impose reticence on such an endorsement. Though non-exhaustive tests indicate that CONCOR appears to be capable of performing all of the commands as implemented, because of the rapidness with which the system was rewritten it is thought that there has not been enough time to fully test all aspects of the project. Therefore, prior to its general dissemination, it is recommended that an independent agency conduct exhaustive tests to certify the integrity of the system programs. The importance of this certification cannot be overstated in light of previous workshop experiences. Concurrent with this testing process the same agency should determine the relative speed and size of the system under actual production circumstances and further determine CONCOR's ease of installation. Later sections of this discussion set forth additional testing recommendations.

It is generally recognized that of all the data-cleaning tools available for exportation CONCOR is potentially the most powerful, especially with the addition of its new commands as outlined in Appendix F. While its utility is not in doubt, one must ask how much more useful CONCOR could be if modified, and whether this additional utility would be worth the costs involved. The modifications (excluding documentation) to COBOL CONCOR appropriate at this time for consideration are threefold:

1 Adjustments to the elements of the system which are internally inconsistent or awkward, to facilitate its learnability and usability among developing country programmers

a Implementation of the division headings DICTIONARY-DIVISION, EXECUTION-DIVISION and REPORT-DIVISION. De-emphasis of section headings

b Improvement of the consistency among data identifiers: allow alphanumeric variables to be coded without mandatory comparison strings throughout the DATA-DIVISION and to be of the same length as numeric variables. Permit numeric identifiers to be of an equal length to NEW-DATA identifiers. Permit the coding of single dimension row and column vectors in the same manner as multi-dimensional arrays

2 Implementation of selective commands and internal variables to facilitate the production environment use of CONCOR in census applications These include


a LOAD/UNLOAD arrays. Commands which would save and replace automatically hot-decked values from batch to batch

b TOTAL-QUESTIONNAIRE-COUNT-RECORD-COUNT internal variables independent of AREA CONTROL

3 Other modifications

a Default values for max-storage parameter set in realistic range

b Allowance of more variables for survey applications
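The LOAD/UNLOAD proposal in item 2 a above concerns carrying hot-deck values from one batch to the next. The Python sketch below illustrates the idea only. The function names, the JSON file format and the file name are assumptions, and nothing here reflects how CONCOR stores its arrays.

```python
import json
import os
import tempfile

def unload_arrays(arrays, path):
    """Save the imputation arrays at the end of a batch."""
    with open(path, "w") as handle:
        json.dump(arrays, handle)

def load_arrays(path, initial):
    """Restore arrays saved by the previous batch, or fall back to initial values."""
    if not os.path.exists(path):
        return initial
    with open(path) as handle:
        return json.load(handle)

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "hotdeck.json")  # hypothetical file name

    # Batch 1: nothing saved yet, so the cold-deck initial values are used.
    first = load_arrays(path, {"TA1": [[0, 0], [0, 0]]})
    first["TA1"][0][1] = 37  # a value hot-decked while editing batch 1
    unload_arrays(first, path)

    # Batch 2: starts from the values left behind by batch 1.
    second = load_arrays(path, {"TA1": [[0, 0], [0, 0]]})

print(second)
```

Without such a mechanism every batch restarts from the initial values, so early records in each batch are imputed from cold-deck values rather than from recently observed data.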

Some of these modifications are part of what ISPC calls its wish list for the future development of CONCOR. This document has been included in this report as Appendix G. It is arguable that these features are essential to the completion of the CONCOR package. While it is beyond the scope of this report to draw a conclusion in this area, the enhancements as outlined above are ones that would make the language more internally consistent and thereby easier to learn and apply to a census data production environment. These modifications are not arbitrary or cosmetic but are a direct result of hands-on programming experience in the language as well as observations and discussions with other workshop participants. While it is probably impossible to ever be satisfied with the overall structure of any programming language, the resolution of this issue of completeness must be made relative to the objectives for developing the COBOL CONCOR system in the first place. An explicit statement of these objectives has been absent in all systems documentation to date.


III PROPOSED CHANGES TO THE LANGUAGE STRUCTURE

Based upon the assumption that it is the intent of sponsoring agencies to optimize the COBOL CONCOR package -- a goal which is believed currently obtainable -- an understanding of the nature of these changes and how they would impact users is essential. Appendix F sets forth in a comparative manner differences between the old December 1978 and the new December 1979 editions of CONCOR. Studying this appendix makes it obvious that while the new version of the language is clearly superior to the old in nearly every aspect, the basic and overall structure of the language is essentially unchanged. Compartmentalization of aspects of the language into divisions represents a significant ideological enhancement to the language. Indeed, development of programs by divisions proved to be an extremely useful way of understanding the nature of editing work to be performed. However, note that while the END-DIVISION command is essential to the language, the division headings DICTIONARY-DIVISION, EXECUTION-DIVISION and REPORT-DIVISION were not implemented and are therefore preceded by a period to be treated as comment lines in the program listing. It is inconsistent to implement END-DIVISION commands while not implementing the division headings. It is believed that this division structuring is important enough to the overall organizational structure of a CONCOR source language program that it should be implemented prior to general distribution. The section headers shown on the figures in Appendix F, however, are another matter. They are cumbersome and were generally not coded by workshop participants, and they could be deleted from this version of the language altogether with little loss in organizational understanding. The CONCOR language is sufficiently powerful to stand on its own as a distinct product and is not meant to be a COBOL imitation; its present degree of development and specialization do not warrant the structural drag of additional section identifiers. The probable intent of the original CONCOR project was to develop a package which was uncomplicated and not unwieldy to use. The question of division and section names implementation, while seemingly cosmetic, can have real impact on its perceived easiness of learning and use.

Figure 1 on the following page illustrates common mistakes programmers make coding numeric and alphanumeric variables in the DATA-DIVISION. These mistakes are the result of the inconsistent variable formats. For instance, in the numeric data definition statement it is permissible to specify N9-23, where N signifies numeric, 9 signifies the length of the item and 23 specifies the starting position in the record. In NEW-DATA, however, it is possible to code an item with a maximum length of 18. While on the surface this inconsistency would seem harmless, some user variables defined in NEW-DATA as N18 could be moved inadvertently to output record fields defined by a data definition statement N9-23; such an action would result in a data error. Under certain circumstances it would be highly desirable to output these larger length values. A similar circumstance exists between the numeric and alphanumeric data coding conventions. While the maximum length of the numeric is permitted to be 9 in the data definition statement (18 in NEW-DATA), the maximum alphanumeric variable is permitted to be only 4 characters in length. In the current systems manual it is recommended that

FIGURE 1

DICTIONARY-DIVISION
DICTIONARY-NAME DATA-CODING-EXAMPLE
INPUT-FILE
OUTPUT-FILE
AREA-CONTROL N2-2 N2-4 N3-6 N2-9 N9-23
QUESTIONNAIRE-CONTROL A4-2 A3-6 A2-9 A3-11 A3-14
RECORD-CONTROL A1-1
DEFINE-RECORD
H01-TYPE-OF-HOUSING-UNIT N1-17
H02-MATERIAL-OF-ROOF N1-19 10 9
H03-TOTAL-PERSONS-IN-UNIT N8-40 NOT-NUMERIC BLANK
H04-STATE-OF-UNIT-CODE A4-50 0 U 1 D
DEFINE-RECORD
P01-SEX 1-13 W F
NEW-DATA
N01-SAVE-TYPE-OF-HOUSING-UNIT
N02-SAVE-TYPE-OF-ROOF 1
N03-COUNT-TOTAL-IN-UNITS 10 0
N04-AGGREGATE-INCOME 18 0
END-DIVISION

Explanations

N2-4 This is an example of an external numeric input data item (N) with a length of 2 bytes starting in column 4 of the input record. The maximum length of this type of variable outside of NEW-DATA is 9. When coded in NEW-DATA, 18 is permitted.

A4-2 This is an example of an external alphanumeric input data item (A) with a length of 4 bytes starting in column 2 of the input record. This construction for alphanumeric variables is valid only in the control statements. Additionally, it can never be over 4 bytes in length. When alphanumeric data fields are defined within record types, the EDITOR program requires that the comparison strings always be specified. A maximum of 3 is permitted. The purpose of these strings is to force recode the data to a numeric value. If no match is found, EDITOR automatically assigns a unique negative value to the field.


alphanumeric coding be utilized in the QUESTIONNAIRE-CONTROL and RECORD-CONTROL statements, where each input data item must be of the same data type, as shown in the example. When alphanumeric data variables are used in these control statements, their construction is identical to that of numeric items. However, when used elsewhere in the DATA-DIVISION, alphanumeric variables are required to specify up to three comparison values, as shown. There are a number of production instances when it never would be necessary or even desirable to recode alphanumeric data. However, as CONCOR attempts to force data into a totally numeric format upon output, there is currently no way to preserve these values if desired.
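The length limits discussed above can be illustrated with a small sketch. It is written in Python rather than CONCOR, and the names `Field`, `parse_field` and `fits` are hypothetical; only the descriptor layout (type letter, length, starting column) and the limits of 9 and 4 come from the text. It shows how an 18-digit NEW-DATA value fails to fit an N9-23 output field.

```python
import re
from dataclasses import dataclass

# Limits quoted in the report for data definition statements.
MAX_NUMERIC = 9   # numeric items outside NEW-DATA
MAX_ALPHA = 4     # alphanumeric items


@dataclass(frozen=True)
class Field:
    kind: str    # "N" numeric or "A" alphanumeric
    length: int  # number of bytes
    start: int   # 1-based starting column in the record


def parse_field(spec: str) -> Field:
    """Parse a descriptor such as 'N9-23' (type, length, starting column)."""
    match = re.fullmatch(r"([NA])(\d+)-(\d+)", spec)
    if match is None:
        raise ValueError(f"not a valid descriptor: {spec!r}")
    kind = match.group(1)
    length = int(match.group(2))
    start = int(match.group(3))
    limit = MAX_NUMERIC if kind == "N" else MAX_ALPHA
    if not 1 <= length <= limit:
        raise ValueError(f"{spec!r}: length must be between 1 and {limit}")
    if start < 1:
        raise ValueError(f"{spec!r}: starting column must be 1 or greater")
    return Field(kind, length, start)


def fits(value: int, target: Field) -> bool:
    """True if a non-negative integer fits the numeric field without truncation."""
    return target.kind == "N" and value >= 0 and len(str(value)) <= target.length
```

A check of this kind, applied whenever a NEW-DATA item is moved to an output field, would catch the N18-to-N9 overflow before it became a data error.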

An unwieldy alternative to this situation, which may be acceptable under some circumstances, would be the expansion of the number of comparison strings from three to a more realistic number. The limitation of this compromise is that a full twenty-six comparison identifiers would be required in order to accommodate data which utilized the entire alphabet. A better solution, however, would be to make the general format of the alphanumeric variables identical to that of numeric identifiers (i.e., A9-23) and to permit alphanumeric values so defined to pass unaltered through the CONCOR system.
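The forced recoding of alphanumeric fields can be sketched as follows. This is an illustration in Python, not CONCOR's actual implementation: the function names are hypothetical, the numeric codes are assumed to be the 1-based position of each comparison string, and `-1` stands in for the "unique negative value" that EDITOR is said to assign when no string matches.

```python
from typing import Dict, Sequence

NO_MATCH = -1  # assumed sentinel; the report says only "a unique negative value"


def build_recode_table(strings: Sequence[str], max_strings: int = 3) -> Dict[str, int]:
    """Map each comparison string to a numeric code (assumed: its 1-based position)."""
    if len(strings) > max_strings:
        raise ValueError(
            f"{len(strings)} comparison strings given, at most {max_strings} permitted"
        )
    return {text: code for code, text in enumerate(strings, start=1)}


def recode(value: str, table: Dict[str, int]) -> int:
    """Return the numeric code for an input value, or NO_MATCH if none applies."""
    return table.get(value.strip(), NO_MATCH)
```

With the limit of three, a field coded over the whole alphabet cannot be declared at all; raising `max_strings` to 26 accepts it but still discards the original characters, which is why passing alphanumeric values through unaltered is the cleaner remedy.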

Another data-naming convention which caused several errors, and which could be corrected, concerns the array data definitional statements. While arrays of two and more dimensions are handled in a superior manner by the CONCOR program, single-dimension arrays pose a problem in coding, as shown in Figure 2. It is suggested that the command imperatives be changed to permit the coding of both rows and columns in single-dimension arrays, i.e., allow a single row vector as well as a single column vector, to maintain the consistency of the array data definitional statements.
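The suggested consistency can be sketched in Python. The functions `declare_array` and `element` are hypothetical and do not reproduce CONCOR syntax; the sketch assumes one uniform rule (identifier, size of each dimension, magnitude in digits, initial values) and shows that a plain vector, a single row and a single column can all be declared and referenced under that one rule.

```python
from math import prod


def declare_array(name, shape, magnitude, initial):
    """Declare an array under one rule for every number of dimensions."""
    if not 1 <= len(shape) <= 5:          # the report cites a five-dimension maximum
        raise ValueError("between 1 and 5 dimensions are permitted")
    if any(size < 1 for size in shape):
        raise ValueError("every dimension must have at least one element")
    if len(initial) != prod(shape):
        raise ValueError("number of initial values does not match the shape")
    if any(abs(value) >= 10 ** magnitude for value in initial):
        raise ValueError("an initial value exceeds the declared magnitude")
    return {"name": name, "shape": tuple(shape),
            "magnitude": magnitude, "values": list(initial)}


def element(array, *subscripts):
    """Fetch one element using 1-based subscripts, stored in row-major order."""
    shape = array["shape"]
    if len(subscripts) != len(shape):
        raise IndexError("wrong number of subscripts")
    index = 0
    for size, subscript in zip(shape, subscripts):
        if not 1 <= subscript <= size:
            raise IndexError("subscript out of range")
        index = index * size + (subscript - 1)
    return array["values"][index]
```

Under such a rule the vector of Figure 2 could be written with shape `(4,)`, `(1, 4)` or `(4, 1)` without a special case for single-dimension arrays.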

A major requirement of COBOL CONCOR file processing concerns the fact that all related data records must be physically contiguous on the input file. The implication of this requirement is that files may require preprocessing prior to actual data editing. (This preprocessing is usually a sort routine upon a selected CONTROL-AREA key.) While this type of processing merely introduces a new step in file processing, a major limitation becomes apparent when a large number of discrete data files of the same census or survey questionnaire are to be processed. This limitation is the introduction of manual steps to save the most recent imputed values, i.e., preventing the program from starting with cold values each batch run. If a command such as LOAD/UNLOAD ARRAYS were incorporated into the language (an enhancement not believed to be difficult to implement), manual processing would be reduced to a minimum between batches and the maximum benefits of the hot deck methodology would be realized. It is envisioned that such a command would automatically insure the transfer of the appropriately designated hot values. Automatic processing of this nature, if done correctly, can greatly reduce the time required to clean multivolume files, for once CONCOR language statements have been compiled, linked

While it is possible at this time to save the arrays that are used in the imputation processes on a separate write-file, it is not now possible to automatically load those values back into an object program and to immediately resume processing on another volume. It is believed that such an automatic feature of the language would cut down the manual processing time significantly enough that it warrants inclusion in the package prior to its general distribution.
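The intent of a LOAD/UNLOAD ARRAYS facility can be sketched as follows. This is a Python illustration under stated assumptions, not a description of any CONCOR feature: the function names, the JSON state file and the `edit` callback are all hypothetical. The point is only that each batch starts from the hot values left by the previous volume and falls back to cold deck values on the first run.

```python
import json
from pathlib import Path


def unload_arrays(arrays, path):
    """Save the current hot deck arrays at the end of a batch."""
    Path(path).write_text(json.dumps(arrays))


def load_arrays(path, cold_deck):
    """Load saved hot values, or a copy of the cold deck if none were saved."""
    state = Path(path)
    if not state.exists():
        return {name: list(values) for name, values in cold_deck.items()}
    return json.loads(state.read_text())


def run_batch(volume, state_path, cold_deck, edit):
    """Edit one volume, starting from and then saving the hot deck arrays."""
    arrays = load_arrays(state_path, cold_deck)
    edit(volume, arrays)          # the edit step may update the arrays in place
    unload_arrays(arrays, state_path)
    return arrays
```

Calling `run_batch` once per volume with the same state path would carry imputation values from one volume to the next with no manual step between batches.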

FIGURE 2

A05-DIFF-BETWEEN-AGE-OF-FEMALE-BY-RELATION 2 4 4
[Two-dimensional array. Rows, RELATION: HEAD, CHILD, OTHER, NONRELATIVE. Columns, AGE OF HUSBAND: 12-17, 18-24, 25-35, 36+. Cell values illegible.]

Comments: The ARRAY-DATA command statement provides the means to declare array identifiers with up to five dimensions. Current documentation is not as explicit about the rules of this command as is desirable. The parameters of the command should function as follows: user-identifier, number of dimensions, number of rows, number of columns, magnitude of element, initial start-up values. In the example, A05 is a two-dimensional array with 4 rows, 4 columns, a default magnitude of 9, and cold deck values as labeled.

A06-DIFF-BETWEEN-AGE-OF-PERSON-AND-MOTHER 1 1 4
16 18 21 23
(This coding generates the error messages below.)

WARNING (DD-207) COMMAND TERMINATOR NOT FOUND, ASSUMED PRESENT
ERROR (DD-9[illegible]) DIMENSION OF USER-SPECIFIED ARRAY IS LESS THAN THE MINIMUM VALUE PERMITTED (2)
PREVIOUS DIAGNOSTIC AT LINE 563

As shown by the array variable A06, CONCOR's treatment of vectors is not consistent with the above multidimensional array scheme; i.e., A06 must be coded as follows (example of how vectors must currently be coded to be correct):

A06-DIFF-BETWEEN-AGE-OF-PERSON-AND-MOTHER 1 4 2
16 18 21 23
(user-identifier, 1 dimension, number of elements in vector, magnitude of element, initial start-up values)

A simple modification to this command would permit the coding of both row and column vectors and make this command less error prone.

and stored as an object module on the system, no other compilations should be required for processing questionnaire files of the same type. Theoretically, a single well-written CONCOR program is all that would be required to process an entire census run.

Appendix H contrasts the internal identifiers of the old and new language versions. Without such identifiers a user would have little information about the status of input as it is processed by EDITOR. As noted in the appendix, most internal pointers are reset upon each break in the CONTROL-AREA, provided a CONTROL-AREA has been defined. The limitation here is that there are obvious instances when terminating processing on the basis of run counts would be advantageous even though a CONTROL-AREA has been specified, e.g., when debugging CONCOR programs or comparing input files. Therefore another set of pointers should be implemented for this purpose and made available for programmer reference.

One clearly disturbing development which needs to be pursued during in-depth testing of the system concerns the MAX-STORAGE parameter of the DEFINE-RECORD statement. As shown in Figure 3, when MAX-STORAGE was set equal to the maximum value, a COBOL program was generated which required 1000K of core to run. The MAX-STORAGE value of 999 is clearly not realistic under most processing circumstances. This example drives home several important points about CONCOR. The core requirements of CONCOR-generated programs can be influenced significantly by the amount or nature of programmer-specified I/O operations. In fact, it is possible to generate a program of a size most foreign country machines could not process. It is recommended that tests determine a realistic maximum value restriction for implementation to prevent problems in this area.

The final area of recommended modification concerns the newly implemented REPORT-DIVISION. The purpose of the REPORT-DIVISION is to enable a user to describe or specify certain CONCOR language statements which will generate statistical reports. These reports contain statistics generated by EDITOR as specified by the GENERATE-EDIT-STATISTICS command of the EXECUTION-DIVISION. All of the reports produced are organized according to the data fields defined by the AREA-CONTROL command of the DATA-DICTIONARY. If the AREA-CONTROL command is not defined in the DATA-DICTIONARY, then all the statistics are summarized at the total run level. If a control area field is defined, then all statistics will be summarized for each unique CONTROL AREA as encountered by the EDITOR program on the input file; statistics by total run level will not be available. This in part relates back to previous discussions citing the need for new internal identifiers. Report listings may contain the values of entire records or entire questionnaires, depending upon the keyword used in the report generation commands. The problem centers upon the homogeneity of CONCOR printouts during a production run.
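The summarization behavior described above, together with the run-level totals the report finds missing, can be sketched as a control-break routine. The sketch is in Python with hypothetical names (`summarize`, the `"area"` and `"errors"` record keys); it assumes, as CONCOR does, that records of one control area are contiguous on the input file.

```python
from collections import Counter
from itertools import groupby


def summarize(records, area_key=None):
    """Return (per-area error counts, run-level error counts).

    With no area key everything is summarized at the total run level.
    With an area key, counts are produced at each control break and the
    run-level totals are still accumulated alongside them.
    """
    total = Counter()
    by_area = []
    if area_key is None:
        for record in records:
            total.update(record["errors"])
        return by_area, total
    for area, group in groupby(records, key=area_key):  # breaks on contiguous runs
        counts = Counter()
        for record in group:
            counts.update(record["errors"])
        by_area.append((area, counts))
        total.update(counts)
    return by_area, total
```

Keeping the second accumulator costs almost nothing, which suggests that offering both levels of summary at once would not be a difficult enhancement.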

It is virtually impossible to distinguish reports on the basis of the volumes they were run against. Some means should be provided to allow users to uniquely and purposefully label the reports generated in this division. Indeed, the very name REPORT-DIVISION suggests that such a command is implicit and appropriate. Such a LABEL-REPORT or REPORT-FILE command, along with file information from the system, should not be difficult to implement.

FIGURE 3

CONCOR (CONSISTENCY AND CORRECTION) EDIT AND IMPUTATION SYSTEM
USER DICTIONARY DIVISION - SOURCE LISTING

LINE NUMBER
72 DEFINE-RECORD RECORD-NAME=HOUSING-RECORD
73 MAX-STORAGE=999
74 RECORD-TYPE= [value illegible]
268 DEFINE-RECORD RECORD-NAME=POPULATION-RECORD
269 MAX-STORAGE=999
270 RECORD-TYPE= [value illegible]

STEPNAME - LOAD
STEP CPU TIME - 1.65 SECONDS
STEP COMPLETION CODE - 0
REGION (REQUESTED - 1000K, USED - 1000K)
I/O COUNT - 211
LINES PRINTED - 22

Concluding Remarks of System Modifications

Version 1 of the COBOL CONCOR language was notorious for abending without the generation of diagnostic messages. Appendix I sets forth a test of some of these historical problems. In each instance the system provided messages and source program line references which enabled immediate identification of the processing difficulties. Moreover, tests of the various commands revealed the generally high performance of the language in its newest version. Under these circumstances, the recommended modifications should be considered as ones which will allow CONCOR to realize the potential of its design, enhance its utility and, moreover, endure as a software product.


IV THE ADEQUACY OF COBOL CONCOR DOCUMENTATION

The current documentation of CONCOR consists of a newly developed, relatively voluminous systems manual, a Diagnostic Message Guide, and some loose-leaf instructions concerning the installation of the system on IBM computers. Of these three documents, the Diagnostic Message Guide is clearly complete and adequate and needs no changes except the addition of message number tabs to enable a programmer to locate the text of a message more quickly. An example of the superior format of this guide appears as Appendix J.

Conspicuously absent from the list of systems documents is a Users Guide, as originally specified in the ISPC enhancement proposal.

A thorough assessment of the Systems Manual has been undertaken and indicates that, with some reorganization and editing, it could be made a more useful and usable document, as it is well constituted informationally. In its current form the Systems Manual can best be described as a working document, a document which was not intended to teach the CONCOR language but was aimed at an audience already familiar with previous CONCOR releases. Further, it is not a systems manual per se, as some of its components and appendixes could more appropriately compose a users guide. For example, excerpts from the POPSTAN publications which review the basic editing concepts and contain the POPSTAN example problem should be deleted from the systems manual and would be more contextually effective in the Users Guide. The EXECUTION-DIVISION chapter of the Systems Manual would especially benefit from general editing and reorganization. (Specifically, the contents of pages 90-92 should be moved to page 63 -- these pages explain the placement and purpose of the three permissible routines of the CONCOR language. Had the author been granted more time, he could have outlined additional specific modifications to bring existing documentation to a level of adequacy.) However, some guidelines stand out:

1. Installation instructions and considerations should compose a chapter of the Systems Manual rather than a separate document. (Hand-out materials covering installation were not adequate.) Further, a documentation scheme setting forth how CONCOR was installed on each computer, and any modifications required during installation, could be serially added to and become a permanent part of the installation's systems manual. If CONCOR is going to be used, and if agencies are going to actively support it as a package, installation limitations must be known and well documented. The prescribed format for this documentation should be set forth, and any modifications to CONCOR should be continuously updated on these sheets so that a history of changes to the language will be readily accessible to supporting agencies. If CONCOR is to survive as a standardized system, editing results must be capable of being replicated among installations. Such a form could contain other information as well, concerning the type of computer, amount of core, and nature of utilities supporting users. Upon installation, a copy of this form could be sent to the US agency which will ultimately be responsible for supporting the CONCOR package.

2. A complete COBOL CONCOR program should appear in an appendix for reference.

3. The development of the Users Guide should include an intensive review of the editing concepts involved in processing census data files, beyond the POPSTAN materials.

4. An explanation of the CONCOR benchmark program should appear in the Users Guide and the Systems Manual. The running of a supplied benchmark program should be a standard installation protocol used to test all operational aspects of a new installation.

5. The development of a CONCOR pocket card. This tool, which has been proven useful in teaching and assisting programmers in utilizing programming languages, lays out all command options on a single small card. An example of such a pocket card is the IBM Assembler Card. Such a small document is useful in clarifying the language and setting forth structural rules without continual reference to full-size manuals.


V CONCLUSION

In a situation of extreme need, CONCOR could probably be utilized, as it seems capable of performing its editing functions. Its usefulness as a data cleaning tool is generally undisputed. The importance of having CONCOR exhaustively tested by an independent agency cannot be over-emphasized in light of prior failures of the system to perform adequately. Additionally, core requirements and processing speed should be precisely determined. One must be encouraged by the amount of progress which has been made in a relatively short amount of time towards the completion of the CONCOR package. While it is clearly beyond the scope of this report to estimate the cost and the amount of time required to implement the highly desirable changes to the CONCOR language and documentation outlined herein, and then exhaustively test the system, it is believed that this process could be completed within a single quarter. (The changes are of such a nature that no structural redefinitions of a systematic nature would be required.) These enhancements would make the COBOL CONCOR package more complete, consistent, and reliable in the field. ISPC has competently moved the system to the point that its completion is within reach.

Given the probable revision of the current Systems Reference Manual and the development of a suitable Users Guide document, it is likely that this version of CONCOR could be taught to experienced programmers (unfamiliar with previous versions of the language) in as little as four weeks. And while one is impressed by the potential power and overall simplicity of the COBOL CONCOR language, it is difficult to envision individuals with no real computer programming experience utilizing its full capabilities. Hence the importance of including a review of basic editing concepts in the documentation is underlined.

Further, as documentation is often critical to the perceived ease of use of computer-based systems, it will be essential to have these various manuals examined independently prior to their general use.

As previously mentioned, Appendix G presents features which the ISPC feels could have a positive impact on CONCOR. Depending upon the outcome of the testing recommended in previous chapters and the availability of funding, supporting agencies may desire to look again at an assembler (ALC) version of the language.

Overall, COBOL CONCOR remains an intermediate product yet to realize its potential, which is thought considerable. Software development of this nature is characteristically of a long-term, evolutionary design. In light of this reality, agencies should be prepared to evaluate such projects' success or failure in terms other than that of immediate utility. It is the philosophy underlying the system, not the system itself, which will ultimately be exported.

APPENDIX A

BuCen Enhancement Proposal

BUCEN IN-HOUSE REVIEW ENHANCEMENT PROPOSAL

1 Easy to use interrecord referencing

2 Improved output file capabilities

A provide overflow protection on WRITE command

B provide OUTPUT command to create corrected questionnaire file that matches IP file data dictionary

3 Improved edit statistics reported (LISTERR)

A provide automatic (user-specified) area break

B provide options for compilation and displaying edit statistics at various levels

C provide automatic (user-specified) tolerance checking of error rates by area

D automatically capture IDs of areas failing tolerance check

4 Clean up known bugs in code

5 Comprehensive testing

6 Clean up and enhance documentation

A reference manual more examples error message guide

B installation guide

C systems manual
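Items 3C and 3D of the proposal (tolerance checking of error rates by area, and capture of the IDs of failing areas) can be illustrated with a short Python sketch. The function name, the shape of the `stats` mapping and the use of a simple proportion as the error rate are assumptions made for illustration; the proposal does not define how the rate is computed.

```python
def failing_areas(stats, tolerance):
    """Return the IDs of areas whose error rate exceeds the tolerance.

    stats maps an area ID to (records_in_error, records_processed);
    tolerance is a proportion, e.g. 0.10 for ten percent.
    """
    failed = []
    for area, (in_error, processed) in stats.items():
        rate = in_error / processed if processed else 0.0
        if rate > tolerance:
            failed.append(area)   # capture the ID for later review or rejection
    return failed
```

The list returned here corresponds to what the proposal calls automatically capturing the IDs of areas failing the tolerance check.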

APPENDIX B

EVALUATIVE CRITERIA

UNITED STATES GOVERNMENT

Memorandum

TO DSPOPFPSD Robert H Haladay

DATE December 3 1979

FROM DSPOPDEIO Liliane Floge

SUBJECT Criteria for Evaluation of BuCen COBOL CONCOR September 30th Version as presented at January 1980 Workshop

The following points should be considered when evaluating CONCOR, the COBOL editing system, during participation at the January 1980 workshop:

1 Are the Installation Manual, the System Manual and the Users Manual ready for use? Are these manuals adequate for their purposes? Are the manuals clear and consistent? Can they be understood by subject-matter personnel as well as programmers?

2 Is the editing system capable of implementing all the features as described in the documentation?

3 Does the system appear to be [illegible]ely debugged and tested and ready for release to developing countries?

4 Does the system appear to be fast enough to edit a census in a reasonable amount of time?

5 What size core does the system require?

6 Comment in general on completeness, usefulness and efficiency of the system, and on ease of use by developing country programmers and other census personnel.

cc [illegible] Olds APHA; [illegible]; Julio Ortuzar, Delta Systems

APPENDIX C

WORKSHOP ITINERARY


CONCOR Workshop Schedule January 7-18 1980

U S Bureau of the Census International Statistical Programs Center

Scuderi Building Room 209 4235 28th Avenue Marlow Heights Maryland

Monday January 7

9:30 am - 10:00 Welcoming Remarks; Overview of Workshop

10:00 - 10:30 Introduction to CONCOR
- Purpose and function
- History of development
- General computer requirements

10:30 - 10:45 Break

10:45 - 12:00 Editing Concepts
- Ways to interrogate data
- Ways to correct data
- Editing housing and population data - POPSTAN
- Advantages of CONCOR

12:00 - 1:15 pm Break

1:15 - 2:00 System Description
- Constraints in design of CONCOR
- Basic subsystems of CONCOR
- User interactions with system
- Examples of outputs produced

2:00 - 2:30 User Program Organization
- Divisions
- Sections
- Routines
- Commands

2:30 - 2:45 Break

2:45 - 3:25 Command Language Description
- Types of statements
- Format
- Syntax

Tuesday January 8

9:30 am - 10:30 Dictionary Division Command Statements
- Punctuation
- Input data referencing
- Working data declarations
- Internal data representation and storage

10:30 - 10:45 Break

10:45 - 12:00 Dictionary-Attributes-Section
- Dictionary-Name
File-Section
- Input-File
- Output-File
- Write-File
- Error-File

12:00 - 1:15 pm Break

1:15 pm - 2:15 Input-Record-Section
- Define-Record
Working-Data-Section
- New-Data
- Array-Data

2:15 - 2:30 Break

2:30 - 3:25 Dictionary Examples
- Minimum dictionary structure
- Maximum dictionary structure
- Hand out dictionary problem

Wednesday January 9

9:30 am - 10:30 Discussion of Participants Dictionary problems; Free work time

10:30 - 10:45 Break

10:45 - 12:00 Execution Division Command Statements
- Punctuation
- Subscripting
- Internal Identifiers
- Report-Control-Section
  - Generate-Edit-Statistics
  - Count-Imputes
  - Examples

12:00 - 1:15 pm Break

1:15 pm - 2:15 Execution Division Command Statements (continued)
- Routines of Edit-Specifications-Section
  - Prolog-Routine
  - Filter-Routine
  - Epilog-Routine
- Types and functions of edit specification commands
- Range
- Assert

2:15 - 2:30 Break

2:30 - 3:25
- Pass/Fail clauses
- List

Thursday January 10

9:30 am - 10:30 Discussion of Problems; Free work time

10:30 - 10:45 Break

10:45 - 12:00 Edit Specification Command Statements (continued)
- Allocate
- Update
- Let

12:00 - 1:15 pm Break

1:15 pm - 2:15
- If
- Until/Exit
- Stop

2:15 - 2:30 Break

2:30 - 3:25
- Drecode
- Grecode

Friday January 11

9:30 am - 10:30 Edit Specification Command Statements (continued)
- Output
- Write

10:30 - 10:45 Break

10:45 - 12:00 Report Division Command Statements
- Display-Control-Section
  - Display-Edit-Statistics
- Tolerance-Control-Section
  - Error-Rate-Check
  - Reject-File
- Report Examples

12:00 - 1:15 pm Break

1:15 pm - 3:25 Hand out Edit Specifications as problem; Free work time

Monday January 14

9:30 am - 10:30 Discuss procedures for running problems on computer

10:30 - 10:45 Break

10:45 - 12:00 Component Programs of the CONCOR system

12:00 - 1:15 pm Break

Tuesday January 15

9:30 am - 3:25 pm Free work time

Wednesday January 16

9:30 am - 12:00 Free work time

12:00 - 1:15 pm Break

1:15 pm - 2:15 How to Install CONCOR on IBM 360/370 OS

2:15 - 2:30 Break

2:30 - 3:25 Free work time

Thursday January 17

9:30 am - 3:25 Free work time

1:15 pm - 2:15 Future CONCOR Design Considerations
- for census processing
- for survey processing
- manual correction system

2:15 - 2:30 Break

2:30 - 2:45 Evaluation Guidelines
- Hand out evaluation forms

2:45 - 3:25 Free work time

Friday January 18

9:30 am - 10:30 Free work time

10:30 - 10:45 Break

10:45 - 12:00 Group Discussion on CONCOR
- Submission of written evaluations
- Distribution of IBM installation tapes
- Presentation of Certificates to Participants

12:00 - 1:15 pm Break

1:15 - 3:25 Free work time

APPENDIX D

PARTICIPANTS


CONCOR WORKSHOP January 7 - 18 1980 Marlow Heights MD

PARTICIPANTS

KENYA

James Midianga Government Computer Centre Central Bureau of Statistics PO Box 30266 Nairobi Kenya

PANAMA

Omar Rivera Rios Data Processing Specialist Contraloria General de la Republica de Panama Apartado Postal 5213 Panama 5 Panama

PHILIPPINES

Guida T Capellan OIC Computer Programming Division National Census and Statistics Office Solicarel Bldg 1 Magsaysay Blvd Sta Mesa Manila Philippines

THAILAND

Angsumal Sunalai National Statistical Office Bangkok 1 Thailand

EGYPT

Farag Sedky Mourad Ghaleb CAPMAS (Central Agency for Public Mobilization and Statistics) Nasr City Cairo Egypt

SAUDI ARABIA

Shargi R Al-Shargi National Computer Center PO Box 2534 Riyadh Saudi Arabia

Ahmed S Al-Ghamdi Dept Director of National Computer Center PO Box 2534 Riyadh Saudi Arabia

OTHER

Robert W ONeal USREPJECOR APO New York NY 09038

John N Adams Data Processing Advisor DACCA Department of State Washington DC 20520

or c/o UNDP PO Box 224 Dacca Bangladesh

Joe Quasney US Census Bureau Washington DC 20233

Howard Brunsman 5715 N Ninth St Arlington VA 22205

Larry Shiller and Julio Ortuzar Delta Systems Consultants Inc 264 Alhambra Circle Coral Gables FL 33134

Nicholas Ourusoff Burpee Hill Road New London NH 03257 (United Nations)

Frederic J Grant IV Systems Development Georgia World Congress Institute 580 North Omni International Atlanta Georgia 30303

Rafael Samper Mary Jo Keenan Catherine B Gleason LACDR Room 2246 NS Washington DC 20523

John Marshall AIDSERDMPSE Rm 706C SA-12 Washington DC 20523

Susanne Bacon and Mary K Friday ISPCData Processing US Census Bureau Washington DC 20233

Leo Dougherty ISPCWorld Census Staff US Bureau of the Census Washington DC 20233

CONCOR WORKSHOP

January 7-18 1980 Washington DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM


CONCOR WORKSHOP January 16 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1 What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries

2 If you are familiar with previous versions of CONCOR how does this current version compare to them


3 How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data of which you are familiar

4 How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software


5 How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys

6 If your organization does statistical data processing by computer would you recommend to your superiors that this software be used to edit the data Do you feel assistance would be required in the installation of this software on your organizations computer and/or in training users on the packages applicability to their work

7 In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation

8 Other comments


APPENDIX F

NEWOLD COMMAND COMPARISONS


OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA


NEW CONCOR VERSION 2 DECEMBER 1979

.DICTIONARY-DIVISION

.DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

.FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

.IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

.INPUT-RECORD-SECTION

DEFINE-RECORD

.WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

.EXECUTION-DIVISION

.REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

.EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR.

END-DIVISION signifies the termination of the CONCOR division organization and must always be present even though division identifiers have not been implemented.

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

VERSION 1 (DECEMBER 1978) VERSION 2 (DECEMBER 1979) COMMENTS

ALLOCATE (ALL) ALLOCATE (ALLOC) Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. Major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

ASSERT (AST) ASSERT (ASRT) The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and MESSAGE option. Additionally, the DUMPR and DUMPO keywords provide the user with the option of using the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

(see RECODE) DRECODE (DRCD) Provides a method for recoding data items which have a starting value of zero and continue in ascending sequence. Replaces the previous RECODE command.

EXIT Provides the means to terminate execution (escape) of command statements within the UNTIL command.

END END-IF END-THEN END-ELSE A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

ERROR not implemented No longer supported in language

FILTER TYPE ( ) WITH not implemented Option statements are no longer supported in the language. This command once specified essentially a GOTO, depending on construction.

(see XRECODE) GRECODE (GRCD) This new command provides a method for the group recoding of input values

IF IF The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. Similar to pseudocode, a CONCOR programmer must utilize the structured IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version.

LET LET Virtually unchanged

LIST New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire as well as specific individual identifiers

OUTPUT New command which provides the means by which records will be written to the output-file


CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

RANGE (RNG) RANGE (RNG) As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through the DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

RECODE (REC) name changed Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

STOP- keyword STOP- keyword Unchanged.

UNTIL DO END-DO The DO END-DO construction is one of the most significant and powerful enhancements to the CONCOR language. This command provides a looping capability similar to that of most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

UPDATE (UPD) UPDATE (UPD) Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

WRITE (WRT) WRITE (WRT) The WRITE language statement, while once utilized as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of DICTIONARY-DIVISION.)

XRECODE (XREC) name changed Old command which has become the GRECODE command in the December 1979 version


DECEMBER 1978 VERSION 1

Not implemented

DECEMBER 1979 VERSION 2

REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK

REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
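The presorting requirement noted in the comments can be illustrated outside CONCOR. The following Python sketch is not part of the CONCOR system; the record layout (a two-character area code in columns 1-2) and the function names are assumptions made purely for illustration. It checks whether each control-area value forms one unbroken run of records and, if not, sorts the file before it would be passed to the editor.

```python
from itertools import groupby


def area_key(record, start=1, length=2):
    """Return the control field; `start` is a 1-based column (hypothetical layout)."""
    return record[start - 1:start - 1 + length]


def is_contiguous(records, key=area_key):
    """True if every control-area value appears as a single unbroken run."""
    seen = set()
    for value, _run in groupby(records, key=key):
        if value in seen:
            return False  # the same area reappears after another area intervened
        seen.add(value)
    return True


def presort(records, key=area_key):
    """Stable sort, so records keep their original order within each area."""
    return sorted(records, key=key)


# Hypothetical input: area code followed by a record type and filler.
records = ["02H...", "01H...", "02P...", "01P..."]
sorted_records = presort(records)

print(is_contiguous(records))         # area 02 is split by area 01
print(is_contiguous(sorted_records))  # each area is now one run
```

A stable sort is used so that housing and person records belonging to the same questionnaire keep their relative order inside an area.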

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST


Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.


(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Permitting only one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18 digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.


(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. Many users now include such values in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and the comma character as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation, or provide a means by which the user may specify a title to appear in the listings.

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features, such as division by zero and out of bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is the question of what action is to be taken in the calculation of tolerance in terms of this new record's inclusion in the universe of observations and number of changes.


(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.


1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.


2. Survey or other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has been neither finished nor tested.


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES


1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT OUTPUT-QUEST-COUNT WRITE-RECORD-COUNT

Change in spelling of variable

Not mentioned--possible omission from manual

Not implemented

Note: Current documentation equivocates on when some variables are reset--most are initialized with each control area break.

Though these were implemented as reserved identifiers, their exact values and utility were unknown due to the poor documentation of (old) CONCOR.

These apparently new counters permit access to valuable totals within control areas

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN-, OUT-).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Facsimile of an EDITOR execution statistics page. The listing reports DIVISION BY ZERO at LINE NUMBER 118 repeatedly and ends with RUN ABORTED DUE TO DIVISION BY ZERO ERRORS. The superimposed program text (source lines 112-125) consists of LIST and LET statements: R001 and R003 are set to zero, and line 118 is a LET statement computing R003 from R002 and R001.]

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason--ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors, and user specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.

[Facsimile of a second EDITOR execution statistics page listing array coordinate errors by questionnaire. The superimposed program text (source lines 173-187) consists of two nested UNTIL ... VARY ... DO loops over R001 and R002, each varied from 1 by 1 with a limit of 10, containing an ALLOCATE from TA1 (R001 R002), an UPDATE of TA2 (R001 R002), and several LIST statements, closed by END-DO commands.]

Comments: In this example R001, a 2 dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 X 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.
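The array coordinate test above can be mimicked in a few lines. This Python sketch is an illustration only and is not derived from the EDITOR code: the message wording, the helper names, and the use of line number 181 for the UPDATE statement are assumptions. It loops over a 10 x 10 coordinate range, tries to update a 5 x 5 array, and collects a diagnostic carrying the source line number instead of terminating without a message.

```python
def make_array(rows, cols, fill=0):
    """Build a rows x cols table as a list of lists."""
    return [[fill] * cols for _ in range(rows)]


def checked_update(array, row, col, value, line_no):
    """Store value at 1-based (row, col); return a message if out of bounds."""
    if not (1 <= row <= len(array) and 1 <= col <= len(array[0])):
        # Hypothetical message text, loosely modelled on the listing above.
        return f"ARRAY COORDINATE ERROR ({row},{col}) LINE NUMBER {line_no}"
    array[row - 1][col - 1] = value
    return None


ta1 = make_array(10, 10, fill=7)  # larger source array
ta2 = make_array(5, 5)            # smaller array being updated
errors = []

for r001 in range(1, 11):         # counterpart of VARY R001 FROM 1 BY 1
    for r002 in range(1, 11):     # counterpart of VARY R002 FROM 1 BY 1
        message = checked_update(ta2, r001, r002, ta1[r001 - 1][r002 - 1], 181)
        if message is not None:
            errors.append(message)

print(len(errors))  # 100 coordinate pairs minus the 25 that exist in TA2
print(errors[0])
```

Only the 25 coordinate pairs that exist in the smaller array are updated; the remaining 75 references each produce a diagnostic.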

[Facsimile of a third EDITOR execution statistics page. The listing reports JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215, followed by EDITOR -- NORMAL END OF JOB. The superimposed program text (source lines 214-218) contains an IF statement testing IN-RECORD-COUNT with a THEN STOP-RUN END-THEN clause, followed by a second IF statement on IN-RECORD-COUNT whose THEN clause issues a LIST message.]

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE


WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION A blank command statement line was found in the user Dictionary Division command statement file

ACTION TAKEN Parsing continued with the next command statement line in the Dictionary Division command file

USER RESPONSE Put a comment indicator () in column 1 of the command statement line if spacing is desired if not remove the command statement line

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION A character not in the CONCOR legal character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()) was found as the first character in the string

ACTION TAKEN Parsing began again with the next string

USER RESPONSE Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION A string starting with a character not valid in the CONCOR legal character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached

ACTION TAKEN Parsing began again with the next user command statement line in the Dictionary Division command file

USER RESPONSE Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ())

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION The comment indicator character () was found embedded in a string


II THE ADEQUACY OF CONCOR

CONCOR has been described by its designers as an adequate package. Adequacy as an evaluative criterion is often relative to need and should not be confused with readiness as an issue. The CONCOR system, exclusive of documentation, is sufficiently complete that in a situation of extreme need it could be used as a data-cleaning tool in the editing and imputation phase of census processing. Less extreme circumstances would impose reticence on such an endorsement. Though non-exhaustive tests indicate that CONCOR appears to be capable of performing all of the commands as implemented, because of the rapidity with which the system was rewritten it is thought that there has not been enough time to fully test all aspects of the project. Therefore, prior to its general dissemination, it is recommended that an independent agency conduct exhaustive tests to certify the integrity of the system programs. The importance of this certification cannot be overstated in light of previous workshop experiences. Concurrent with this testing process, the same agency should determine the relative speed and size of the system under actual production circumstances and further determine CONCOR's ease of installation. Later sections of this discussion set forth additional testing recommendations.

It is generally recognized that, of all the data-cleaning tools available for exportation, CONCOR is potentially the most powerful, especially with the addition of its new commands as outlined in Appendix F. While its utility is not in doubt, one must ask how much more useful CONCOR could be if modified, and whether this additional utility would be worth the costs involved. The modifications (excluding documentation) to COBOL CONCOR appropriate for consideration at this time are threefold:

1. Adjustments to the elements of the system which are internally inconsistent or awkward, to facilitate its learnability and usability among developing country programmers.

a. Implementation of the division headings DICTIONARY-DIVISION, EXECUTION-DIVISION, and REPORT-DIVISION. De-emphasis of section headings.

b. Improvement of the consistency among data identifiers: allow alphanumeric variables to be coded without mandatory comparison strings throughout the DATA-DIVISION and to be of the same length as numeric variables. Permit numeric identifiers to be of an equal length to NEW-DATA identifiers. Permit the coding of single dimension row and column vectors in the same manner as multi-dimensional arrays.

2. Implementation of selective commands and internal variables to facilitate the production environment use of CONCOR in census applications. These include:

a. LOAD/UNLOAD arrays: commands which would automatically save and replace hot-decked values from batch to batch.

b. TOTAL-QUESTIONNAIRE-COUNT-RECORD-COUNT internal variables independent of AREA CONTROL.

3. Other modifications.

a. Default values for the max-storage parameter set in a realistic range.

b. Allowance of more variables for survey applications.

Some of these modifications are part of what ISPC calls its wish list for the future development of CONCOR. This document has been included in this report as Appendix G. It is arguable that these features are essential to the completion of the CONCOR package. While it is beyond the scope of this report to draw a conclusion in this area, the enhancements outlined above are ones that would make the language more internally consistent and thereby easier to learn and apply to a census data production environment. These modifications are not arbitrary or cosmetic but are a direct result of hands-on programming experience in the language as well as observations and discussions with other workshop participants. While it is probably impossible ever to be satisfied with the overall structure of any programming language, the resolution of this issue of completeness must be made relative to the objectives for developing the COBOL CONCOR system in the first place. An explicit statement of these objectives has been absent in all systems documentation to date.


III PROPOSED CHANGES TO THE LANGUAGE STRUCTURE

Based upon the assumption that it is the intent of the sponsoring agencies to optimize the COBOL CONCOR package (a goal which is believed to be currently obtainable), an understanding of the nature of these changes and how they would impact users is essential. Appendix F sets forth in a comparative manner the differences between the old December 1978 and the new December 1979 editions of CONCOR. Studying this appendix makes it obvious that, while the new version of the language is clearly superior to the old in nearly every aspect, the basic and overall structure of the language is essentially unchanged. Compartmentalization of aspects of the language into divisions represents a significant conceptual enhancement to the language. Indeed, development of programs by divisions proved to be an extremely useful way of understanding the nature of the editing work to be performed. However, note that while the END-DIVISION command is essential to the language, the division headings DICTIONARY-DIVISION, EXECUTION-DIVISION, and REPORT-DIVISION were not implemented and are therefore preceded by a period so that they are treated as comment lines in the program listing. It is inconsistent to implement END-DIVISION commands while not implementing the division headings. It is believed that this division structuring is important enough to the overall organizational structure of a CONCOR source language program that it should be implemented prior to general distribution. The section headers shown on the figures in Appendix F, however, are another matter. They are cumbersome and were generally not coded by workshop participants, and they could be deleted from this version of the language altogether with little loss in organizational understanding. The CONCOR language is sufficiently powerful to stand on its own as a distinct product and is not meant to be a COBOL imitation; its present degree of development and specialization does not warrant the structural drag of additional section identifiers. The probable intent of the original CONCOR project was to develop a package which was uncomplicated and not unwieldy to use. The question of division and section name implementation, while seemingly cosmetic, can have a real impact on the perceived ease of learning and using the language.

Figure 1 on the following page illustrates common mistakes programmers make when coding numeric and alphanumeric variables in the DATA-DIVISION. These mistakes are the result of the inconsistent variable formats. For instance, in the numeric data definition statement it is permissible to specify N9-23, where N signifies numeric, 9 signifies the length of the item, and 23 specifies the starting position in the record. In NEW-DATA, however, it is possible to code an item with a maximum length of 18. While on the surface this inconsistency would seem harmless, user variables defined in NEW-DATA as N18 could be moved inadvertently to output record fields defined by a data definition statement such as N9-23; such an action would result in a data error. Under certain circumstances it would be highly desirable to output these larger length values. A similar circumstance exists between the numeric and alphanumeric data coding conventions. While the maximum length of the numeric is permitted to be 9 in the data definition statement (18 in NEW-DATA), the maximum alphanumeric variable is permitted to be only 4 characters in length. In the current systems manual it is recommended that

FIGURE 1

DICTIONARY-DIVISION

DICTIONARY-NAME DATA-CODING-EXAMPLE

INPUT-FILE

OUTPUT-FILE

AREA-CONTROL N2-2 N2-4 N3-6 N2-9 N9-23 QUESTIONNAIRE-CONTROL A4-2 A3-6 A2-9 A3-11 A3-14

RECORD-CONTROL A1-1

DEFINE-RECORD

H01-TYPE-OF-HOUSING-UNIT N1-17

H02-MATERIAL-OF-ROOF N1-19 10 9

H03-TOTAL-PERSONS-IN-UNIT N8-40 NOT-NUMERIC BLANK

H04-STATE-OF-UNIT-CODE A4-50 0 U 1 D

DEFINE-RECORD

P01-SEX 1-13 W F

NEW-DATA

N01-SAVE-TYPE-OF-HOUSING-UNIT

N02-SAVE-TYPE-OF-ROOF 1

N03-COUNT-TOTAL-IN-UNITS 10 0

N04-AGGREGATE-INCOME 18 0

END-DIVISION

Explanations

N2-4 This is an example of an external numeric input data item (N) with a length of 2 bytes starting in column 4 of the input record. The maximum length of this type of variable outside of NEW-DATA is 9. When coded in NEW-DATA, 18 is permitted.

A4-2 This is an example of an external alphanumeric input data item (A) with a length of 4 bytes starting in column 2 of the input record. This construction for alphanumeric variables is valid only in the control statements. Additionally, it can never be over 4 bytes in length. When alphanumeric data fields are defined within record types, the EDITOR program requires that the comparison strings always be specified. A maximum of 3 is permitted. The purpose of these strings is to force a recode of the data to a numeric value. If no match is found, EDITOR automatically assigns a unique negative value to the field.
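The field notation explained above lends itself to a mechanical check. The following Python sketch is an illustration only, not CONCOR's parser: the function names are hypothetical, and the length limits (9 for type N, 4 for type A) are simply those quoted in the explanations for data definition statements outside NEW-DATA.

```python
import re
from typing import NamedTuple


class FieldSpec(NamedTuple):
    kind: str    # "N" (numeric) or "A" (alphanumeric)
    length: int  # field length in bytes
    start: int   # 1-based starting column in the record


# Limits quoted in the text for fields defined outside NEW-DATA.
MAX_LENGTH = {"N": 9, "A": 4}
SPEC_PATTERN = re.compile(r"^([NA])(\d+)-(\d+)$")


def parse_spec(text):
    """Parse a specification such as 'N2-4' and enforce the length limits."""
    match = SPEC_PATTERN.match(text)
    if match is None:
        raise ValueError(f"malformed field specification: {text!r}")
    kind = match.group(1)
    length, start = int(match.group(2)), int(match.group(3))
    if length < 1 or start < 1:
        raise ValueError(f"length and column must be positive: {text!r}")
    if length > MAX_LENGTH[kind]:
        raise ValueError(f"type {kind} fields may not exceed {MAX_LENGTH[kind]} bytes: {text!r}")
    return FieldSpec(kind, length, start)


def extract(record, spec):
    """Return the raw characters that the specification addresses."""
    return record[spec.start - 1:spec.start - 1 + spec.length]


spec = parse_spec("N2-4")
print(spec)
print(extract("ABC12XYZ", spec))  # columns 4-5 of the record
```

A check of this kind would flag an entry such as P01-SEX 1-13 in Figure 1, which carries no type letter, before any data were read.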


alphanumeric coding be utilized in the QUESTIONNAIRE-CONTROL and RECORD-CONTROL statements, where each input data item must be of the same data type, as shown in the example. When alphanumeric data variables are used in these control statements, their construction is identical to that of numeric items. However, when used elsewhere in the DATA-DIVISION, alphanumeric variables are required to specify one of three possible comparison values, as shown. There are a number of production instances in which it would never be necessary or even desirable to recode alphanumeric data. However, as CONCOR attempts to force data into a totally numeric format upon output, there is currently no way to preserve these values if desired.

An unwieldy alternative to this situation, which may be acceptable under some circumstances, would be the expansion of the number of comparison strings from three to a more realistic number. The limitation of this compromise is that a full twenty-six comparison identifiers would be required in order to accommodate data which utilized the entire alphabet. A better solution, however, would be to make the general format of the alphanumeric variables identical to that of numeric identifiers, i.e., A9-23, and to permit alphanumeric values so defined to pass unaltered through the CONCOR system.
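The forced recode of alphanumeric fields described above can be sketched as follows. This Python fragment is illustrative only: the report states that at most three comparison strings are permitted and that an unmatched value receives a negative value, but the specific codes used here (the 1-based position of the matching string, and -1 for no match) are assumptions.

```python
MAX_COMPARISON_STRINGS = 3  # limit quoted in the report
NO_MATCH = -1               # assumed stand-in for EDITOR's negative value


def recode_alpha(value, comparison_strings):
    """Map an alphanumeric value to a number via its comparison strings."""
    if len(comparison_strings) > MAX_COMPARISON_STRINGS:
        raise ValueError("at most 3 comparison strings are permitted")
    for position, candidate in enumerate(comparison_strings, start=1):
        if value == candidate:
            return position  # assumed code: position of the matching string
    return NO_MATCH


print(recode_alpha("M", ["M", "F"]))  # first comparison string
print(recode_alpha("F", ["M", "F"]))  # second comparison string
print(recode_alpha("X", ["M", "F"]))  # no match, so the original value is lost
```

The last call shows why the recode is a problem in production: once the unmatched value has been replaced by a number, the original character cannot be written to the output file.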

Another data-naming convention which caused several errors, and which could be corrected, concerns the array data definitional statements. While arrays of two and more dimensions are handled in a superior manner by the CONCOR program, single-dimension arrays pose a problem in coding, as shown in Figure 2. It is suggested that the command imperatives be changed to permit the coding of both rows and columns in single-dimension arrays, i.e., to allow a single row vector as well as a single column vector, in order to maintain the consistency of the array data definitional statements.
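The suggested uniform treatment of vectors can be sketched in Python. This is only an illustration of the recommendation, not the CONCOR parser: the function `declare_array` and its return structure are hypothetical, while the parameter order (identifier, number of dimensions, extents, magnitude), the five-dimension limit, and the default magnitude of 9 follow the description in Figure 2.

```python
def declare_array(name, n_dims, extents, magnitude=9):
    """Validate an array declaration, accepting row and column vectors.

    extents holds one size per dimension. As the suggested relaxation, a
    one-dimensional array may also be coded with both a row and a column
    count, provided one of them is 1.
    """
    extents = tuple(extents)
    if not 1 <= n_dims <= 5:
        raise ValueError("arrays may have from 1 to 5 dimensions")
    orientation = None
    if n_dims == 1 and len(extents) == 2:
        rows, cols = extents
        if rows != 1 and cols != 1:
            raise ValueError("a vector needs a single row or a single column")
        orientation = "row" if rows == 1 else "column"
        extents = (rows * cols,)
    elif len(extents) != n_dims:
        raise ValueError("one extent is required per dimension")
    return {"name": name, "shape": extents,
            "orientation": orientation, "magnitude": magnitude}


print(declare_array("A05", 2, (4, 4)))
print(declare_array("A06", 1, (1, 4), magnitude=2))
```

Under this scheme the same rows-and-columns pattern serves both the two-dimensional A05 and the vector A06, which is the consistency the recommendation is after.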

A major requirement of COBOL CONCOR file processing is that all related data records must be physically contiguous on the input file. The implication of this requirement is that files may require preprocessing prior to actual data editing. (This preprocessing is usually a sort routine upon a selected CONTROL-AREA key.) While this type of processing merely introduces a new step in file processing, a major limitation becomes apparent when a large number of DISCRETE DATA files of the same census or survey questionnaire are to be processed. This limitation is the introduction of manual steps to save the most recent imputed values, i.e., to prevent the program from starting with cold values on each batch run. If a command such as LOAD/UNLOAD ARRAYS were incorporated into the language (an enhancement not believed to be difficult to implement), manual processing between batches would be reduced to a minimum and the maximum benefits of the hot deck methodology would be realized. It is envisioned that such a command would automatically ensure the transfer of the appropriately designated hot values. Automatic processing of this nature, if done correctly, can greatly reduce the time required to clean multivolume files, for once CONCOR language statements have been compiled, linked,

While it is possible at this time to save the arrays that are used in the imputation processes on a separate write-file, it is not now possible to automatically load those values back into an object program and to immediately resume processing on another volume. It is believed that such an automatic feature of the language would cut down manual processing time significantly enough that it warrants inclusion in the package prior to its general distribution.
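The proposed LOAD/UNLOAD ARRAYS behavior can be sketched in Python as follows. This is a minimal illustration of the idea only: the function names, the JSON file format, and the file name are assumptions, and nothing here reflects how CONCOR stores its arrays. The start-up values used for A06 are those shown in Figure 2.

```python
import copy
import json
from pathlib import Path


def load_arrays(path, cold_values):
    """Return saved hot values if a prior batch left them, else cold values."""
    saved = Path(path)
    if saved.exists():
        return json.loads(saved.read_text())
    # First batch: begin from a copy so the cold deck itself is never altered.
    return copy.deepcopy(cold_values)


def unload_arrays(arrays, path):
    """Save the current hot values for the next batch run."""
    Path(path).write_text(json.dumps(arrays))


cold_deck = {"A06": [16, 18, 21, 23]}
arrays = load_arrays("hot_deck.json", cold_deck)  # hypothetical file name
# ... editing of one volume would update the arrays here ...
unload_arrays(arrays, "hot_deck.json")
```

With both steps automatic, a second volume would begin from the values left by the first rather than from the cold deck, and no manual transfer would be needed between batches.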

FIGURE 2

A05-DIFF-BETWEEN-AGE-OF-FEMALE-BY-RELATION 2 4 4

Columns (AGE OF HUSBAND): 12-17, 18-24, 25-35, 36+
Rows (RELATION): HEAD, CHILD, OTHER, NONRELATIVE
[The cold deck cell values of this array are illegible in the source.]

Comments: The ARRAY-DATA command statement provides the means to declare array identifiers with up to five dimensions. Current documentation is not as explicit about the rules of this command as is desirable. The parameters of the command should function as follows:

user-identifier, number of dimensions (D), number of rows (R), number of columns (C), magnitude of element (M), initial start-up values

In the example, A05 is a two-dimensional array with 4 rows, 4 columns, a default magnitude of 9, and cold deck values as labeled.

A06-DIFF-BETWEEN-AGE-OF-PERSON-AND-MOTHER 1 1 4
16 18 21 23

(This coding generates the error message below.)

587  A06-DIFF-BETWEEN-AGE-OF-PERSON-AND-MOTHER 1 1 4
WARNING (DD-207) COMMAND TERMINATOR [illegible] NOT FOUND, ASSUMED PRESENT
ERROR (DD-911) DIMENSION OF USER-SPECIFIED ARRAY IS LESS THAN THE MINIMUM VALUE PERMITTED (2)
PREVIOUS DIAGNOSTIC AT LINE 563

As shown by the array variable A06, CONCOR's treatment of vectors is not consistent with the above multidimensional array scheme, i.e., A06 must be coded as follows:

(Example of how vectors must currently be coded to be correct)

A06-DIFF-BETWEEN-AGE-OF-PERSON-AND-MOTHER 1 4 2
16 18 21 23

user-identifier, 1 dimension, number of elements in vector, magnitude of element, initial start-up values

A simple modification to this command would permit the coding of both row and column vectors and make this command less error prone.

and stored as an object module on the system, no other compilations should be required for processing questionnaire files of the same type. Theoretically, a single well-written CONCOR program is all that would be required to process an entire census run.

Appendix H contrasts the internal identifiers of the old and new language versions. Without such identifiers, a user would have little information about the status of input as it is processed by EDITOR. As noted in the appendix, most internal pointers are reset upon each break in the CONTROL-AREA, provided a CONTROL-AREA has been defined. The limitation here is that there are obvious instances when termination of processing based on run counts would be advantageous even though a CONTROL-AREA has been specified, e.g., when debugging CONCOR programs or comparing input files. Therefore, another set of pointers should be implemented for this purpose and made available for programmer reference.
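The recommendation for a second set of pointers can be pictured with a small Python sketch that keeps a run-level count alongside a count that resets at each control-area break. The class and attribute names are hypothetical and do not correspond to actual CONCOR internal identifiers.

```python
class EditCounters:
    """Record counters kept at two levels: control area and total run."""

    def __init__(self):
        self.current_area = None
        self.area_records = 0  # reset at every control-area break
        self.run_records = 0   # never reset during the run

    def record(self, control_area):
        if control_area != self.current_area:
            self.current_area = control_area
            self.area_records = 0
        self.area_records += 1
        self.run_records += 1


counters = EditCounters()
for area in ["01", "01", "02"]:
    counters.record(area)
print(counters.area_records, counters.run_records)
```

A test such as "stop after the first N records of the run" can be expressed only against the run-level counter, since the area-level counter starts over at every break.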

One clearly disturbing development, which needs to be pursued during in-depth testing of the system, concerns the MAX-STORAGE parameter of the DEFINE-RECORD statement. As shown in the figure on the following page, when MAX-STORAGE was set equal to the maximum value, a COBOL program was generated which required 1000K of core to run. The MAX-STORAGE value of 999 is clearly not realistic under most processing circumstances. This example drives home several important points about CONCOR. The core requirements of CONCOR-generated programs can be influenced significantly by the amount or nature of programmer-specified I/O operations. In fact, it is possible to generate a program of a size that most foreign country machines could not process. It is recommended that tests determine a realistic maximum-value restriction for implementation to prevent problems in this area.

The final area of recommended modification concerns the newly implemented REPORT-DIVISION. The purpose of the REPORT-DIVISION is to enable a user to describe or specify certain CONCOR language statements which will generate statistical reports. These reports contain statistics generated by EDITOR as specified by the GENERATE-EDIT-STATISTICS command of the EXECUTION-DIVISION. All of the reports produced are organized according to the data fields defined by the AREA-CONTROL command of the DATA-DICTIONARY. If the AREA-CONTROL command is not defined in the DATA-DICTIONARY, then all the statistics are summarized at the total run level. If a control area field is defined, then all statistics will be summarized for each unique CONTROL AREA as encountered by the EDITOR program on the input file; statistics at the total run level will not be available. This in part relates back to previous discussions citing the need for new internal identifiers. Report listings may contain the values of entire records or entire questionnaires, depending upon the keyword used in the report generation commands. The problem centers upon the homogeneity of CONCOR printouts during a production run.

It is virtually impossible to distinguish reports on the basis of the volumes they were run against. Some means should be provided to allow users to uniquely and purposefully label the reports generated in this division. Indeed, the very name REPORT-DIVISION suggests that such a command is implicit and appropriate. Such a LABEL-REPORT or REPORT-FILE command, along with file information from the system, should not be difficult to implement.

FIGURE 3

CONCOR (CONSISTENCY AND CORRECTION) EDIT AND IMPUTATION SYSTEM
USER DICTIONARY DIVISION - SOURCE LISTING

LINE NUMBER

 72   DEFINE-RECORD RECORD-NAME= HOUSING-RECORD
 73   MAX-STORAGE= 999
 74   RECORD-TYPE= [value illegible]

268   DEFINE-RECORD RECORD-NAME= POPULATION-RECORD
269   MAX-STORAGE= 999
270   RECORD-TYPE= [value illegible]

Job step messages (excerpt; the remaining system messages are illegible in the source):

STEP LOAD   CPU 0 MIN 01.65 SEC   VIRT 1000K
STEP START TIME - 10:01:55   STEP END TIME - 10:02:57   STEP CPU TIME - 1.65 SECONDS   STEP COMPLETION CODE - C 0
TAPES - 0   DISKS - 0   REGION (REQUESTED - 1000K, USED - 1000K)   I/O COUNT 211
CARDS READ - 0   CARDS PUNCHED - 0   LINES PRINTED - 22
STEPNAME - LOAD   CHARGE (MACHINE UNITS - 1198; DOLLARS - [illegible])

Concluding Remarks of System Modifications

Version 1 of the COBOL CONCOR language was notorious for abending without the generation of diagnostic messages. Appendix I sets forth a test of some of these historical problems. In each instance, the system provided messages and source program line references which enabled immediate identification of the processing difficulties. Moreover, tests of the various commands revealed the generally high performance of the language in its newest version. Under these circumstances, the recommended modifications should be considered as ones which will uniformly allow CONCOR to realize the potential of its design, enhance its utility and, moreover, endure as a software product.


IV THE ADEQUACY OF COBOL CONCOR DOCUMENTATION

The current documentation of CONCOR consists of a newly developed, relatively voluminous systems manual, a Diagnostic Message Guide, and some loose-leaf instructions concerning the installation of the system on IBM computers. Of these three documents, the Diagnostic Message Guide is clearly complete and adequate and needs no changes except the addition of message number tabs to enable a programmer to locate the text of a message more quickly. An example of the superior format of this guide appears as Appendix J.

Conspicuously absent from the list of systems documents is a User's Guide, as originally specified in the ISPC enhancement proposal.

A thorough assessment of the Systems Manual has been undertaken and indicates that, with some reorganization and editing, it could be made a more useful and usable document, as it is well constituted informationally. In its current form, the Systems Manual can best be described as a working document: one which was not intended to teach the CONCOR language but was aimed at an audience already familiar with previous CONCOR releases. Further, it is not a systems manual per se, as some of its components and appendixes could more appropriately compose a user's guide. For example, excerpts from the POPSTAN publications which review the basic editing concepts and contain the POPSTAN example problem should be deleted from the Systems Manual and would be more contextually effective in the User's Guide. The EXECUTION-DIVISION chapter of the Systems Manual would especially benefit from general editing and reorganization. (Specifically, the contents of pages 90-92 should be moved to page 63; these pages explain the placement and purpose of the three permissible routines of the CONCOR language. Had the author been granted more time, he could have outlined additional specific modifications to bring existing documentation to a level of adequacy.) However, some guidelines stand out:

1. Installation instructions and considerations should compose a chapter of the Systems Manual rather than a separate document. (Hand-out materials covering installation were not adequate.) Further, a documentation scheme setting forth how CONCOR was installed on each computer, and any modifications required during installation, could be serially added to and become a permanent part of the installation's systems manual. If CONCOR is going to be used, and if agencies are going to actively support it as a package, installation limitations must be known and well documented. The prescribed format for this documentation should be set forth, and any modifications to CONCOR should be continuously updated on these sheets so that a history of changes to the language will be readily accessible to supporting agencies. If CONCOR is to survive as a standardized system, editing results must be capable of being replicated among installations. Such a form could contain other information as well, concerning the type of computer, amount of core, and nature of utilities supporting users. Upon installation, a copy of this form could be sent to the U.S. agency which will ultimately be responsible for supporting the CONCOR package.

2. A complete COBOL CONCOR program should appear in an appendix for reference.

3. The development of the User's Guide should include an intensive review of the editing concepts involved in processing census data files, beyond the POPSTAN materials.

4. An explanation of the CONCOR benchmark program should appear in the User's Guide and the Systems Manual. The running of a supplied benchmark program should be a standard installation protocol used to test all operational aspects of a new installation.

5. The development of a CONCOR pocket card. This tool, which has been proven useful in teaching and assisting programmers in utilizing programming languages, lays out all commands and options on a single small card. An example of such a pocket card is the IBM Assembler Card. Such a small document is useful in clarifying the language and setting forth structural rules without continual reference to full-size manuals.

V CONCLUSION

In a situation of extreme need, CONCOR could probably be utilized, as it seems capable of performing its editing functions. Its usefulness as a data cleaning tool is generally undisputed. The importance of having CONCOR exhaustively tested by an independent agency cannot be overemphasized in light of prior failures of the system to perform adequately. Additionally, core requirements and processing speed should be precisely determined. One must be encouraged by the amount of progress which has been made in a relatively short amount of time towards the completion of the CONCOR package. While it is clearly beyond the scope of this report to estimate the cost and the amount of time required to implement the highly desirable changes to the CONCOR language and documentation outlined herein, and then to exhaustively test the system, it is believed that this process could be completed within a single quarter. (The changes are of such a nature that no structural redefinitions of a systematic nature would be required.) These enhancements would make the COBOL CONCOR package more complete, consistent, and reliable in the field. ISPC has competently moved the system to the point that its completion is within reach.

Given the probable revision of the current Systems Reference Manual and the development of a suitable User's Guide document, it is likely that this version of CONCOR could be taught to experienced programmers (unfamiliar with previous versions of the language) in as little as four weeks. And while one is impressed by the potential power and overall simplicity of the COBOL CONCOR language, it is difficult to envision individuals with no real computer programming experience utilizing its full capabilities. Hence, the importance of including a review of basic editing concepts in the documentation is underlined.

Further, as documentation is often critical to the perceived ease of use of computer-based systems, it will be essential to have these various manuals examined independently prior to their general use.

As previously mentioned, Appendix G presents features which the ISPC feels could have a positive impact on CONCOR. Depending upon the outcome of the testing recommended in previous chapters and the availability of funding, supporting agencies may desire to look again at an assembler (ALC) version of the language.

Overall, COBOL CONCOR remains an intermediate product, yet to realize its potential, which is thought considerable. Software development of this nature is characteristically of a long-term, evolutionary design. In light of this reality, agencies should be prepared to evaluate such projects' success or failure in terms other than that of immediate utility. It is the philosophy underlying the system, not the system itself, which will ultimately be exported.

APPENDIX A

BuCen Enhancement Proposal

BUCEN IN-HOUSE REVIEW ENHANCEMENT PROPOSAL

1 Easy to use interrecord referencing

2 Improved output file capabilities

A provide overflow protection on WRITE command

B provide OUTPUT command to create corrected questionnaire file that matches IP file data dictionary

3 Improved edit statistics reported (LISTERR)

A provide automatic (user-specified) area break

B provide options for compilation and displaying edit statistics at various levels

C provide automatic (user-specified) tolerance checking of error rates by area

D automatically capture IDs of areas failing tolerance check

4 Clean up known bugs in code

5 Comprehensive testing

6 Clean up and enhance documentation

A reference manual more examples error message guide

B installation guide

C systems manual
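Items 3C and 3D of the proposal above (tolerance checking of error rates by area, and capture of the IDs of areas failing the check) can be sketched in Python. The function name, the data layout, and the use of a simple errors-per-record rate are assumptions made for illustration; the proposal does not specify how CONCOR computes the rate.

```python
def areas_failing_tolerance(error_counts, record_counts, tolerance):
    """Return the IDs of areas whose error rate exceeds the tolerance.

    error_counts and record_counts map an area ID to a count. The rate is
    taken here as errors divided by records processed in the area.
    """
    failing = []
    for area, records in record_counts.items():
        if records == 0:
            continue  # no records processed, so no rate can be computed
        rate = error_counts.get(area, 0) / records
        if rate > tolerance:
            failing.append(area)
    return failing


errors = {"01": 5, "02": 1}
records = {"01": 100, "02": 100}
print(areas_failing_tolerance(errors, records, tolerance=0.02))
```

Areas returned by such a check would be candidates for rejection or manual review before their records are passed to tabulation.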

APPENDIX B

EVALUATIVE CRITERIA

UNITED STATES GOVERNMENT
Memorandum

TO: DSPOPFPSD, Robert H. Haladay

DATE: December 3, 1979

FROM: DSPOPDEIO, Liliane Floge

SUBJECT: Criteria for Evaluation of BuCen COBOL CONCOR, September 30th Version, as presented at January 1980 Workshop

The following points should be considered when evaluating CONCOR, the COBOL editing system, during participation at the January 1980 workshop:

1. Are the Installation Manual, the System Manual, and the User's Manual ready for use? Are these manuals adequate for their purposes? Are the manuals clear and consistent? Can they be understood by subject-matter personnel as well as programmers?

2. Is the editing system capable of implementing all the features as described in the documentation?

3. Does the system appear to be completely debugged and tested and ready for release to developing countries?

4. Does the system appear to be fast enough to edit a census in a reasonable amount of time?

5. What size core does the system require?

6. Comment in general on the completeness, usefulness, and efficiency of the system and on its ease of use by developing-country programmers and other census personnel.

cc: [name partly illegible] Olds, APHA; [illegible]; Julio Ortuzar, Delta Systems

APPENDIX C

WORKSHOP ITINERARY

CONCOR Workshop Schedule
January 7-18, 1980

U.S. Bureau of the Census
International Statistical Programs Center
Scuderi Building, Room 209
4235 28th Avenue
Marlow Heights, Maryland

Monday, January 7

9:30 a.m. - 10:00
    Welcoming Remarks
    Overview of Workshop

10:00 - 10:30
    Introduction to CONCOR
    - Purpose and function
    - History of development
    - General computer requirements

10:30 - 10:45
    Break

10:45 - 12:00
    Editing Concepts
    - Ways to interrogate data
    - Ways to correct data
    - Editing housing and population data
    - POPSTAN
    - Advantages of CONCOR

12:00 - 1:15 p.m.
    Break

1:15 - 2:00
    System Description
    - Constraints in design of CONCOR
    - Basic subsystems of CONCOR
    - User interactions with system
    - Examples of outputs produced

2:00 - 2:30
    User Program Organization
    - Divisions
    - Sections
    - Routines
    - Commands

2:30 - 2:45
    Break

2:45 - 3:25
    Command Language Description
    - Types of statements
    - Format
    - Syntax

Tuesday, January 8

9:30 a.m. - 10:30
    Dictionary Division Command Statements
    - Punctuation
    - Input data referencing
    - Working data declarations
    - Internal data representation and storage

10:30 - 10:45
    Break

10:45 - 12:00
    Dictionary-Attributes-Section
    - Dictionary-Name
    File-Section
    - Input-File
    - Output-File
    - Write-File
    - Error-File

12:00 - 1:15 p.m.
    Break

1:15 p.m. - 2:15
    Input-Record-Section
    - Define-Record
    Working-Data-Section
    - New-Data
    - Array-Data

2:15 - 2:30
    Break

2:30 - 3:25
    Dictionary Examples
    - Minimum dictionary structure
    - Maximum dictionary structure
    - Hand out dictionary problem

Wednesday, January 9

9:30 a.m. - 10:30
    Discussion of Participants' Dictionary problems
    Free work time

10:30 - 10:45
    Break

10:45 - 12:00
    Execution Division Command Statements
    - Punctuation
    - Subscripting
    - Internal Identifiers
    - Report-Control-Section
      - Generate-Edit-Statistics
      - Count-Imputes
      - Examples

12:00 - 1:15 p.m.
    Break

1:15 p.m. - 2:15
    Execution Division Command Statements (continued)
    - Routines of Edit-Specifications-Section
      - Prolog-Routine
      - Filter-Routine
      - Epilog-Routine
    - Types and functions of edit specification commands
    - Range
    - Assert

2:15 - 2:30
    Break

2:30 - 3:25
    - Pass/Fail clauses
    - List

Thursday, January 10

9:30 a.m. - 10:30
    Discussion of Problems
    Free work time

10:30 - 10:45
    Break

10:45 - 12:00
    Edit Specification Command Statements (continued)
    - Allocate
    - Update
    - Let

12:00 - 1:15 p.m.
    Break

1:15 p.m. - 2:15
    - If
    - Until/Exit
    - Stop

2:15 - 2:30
    Break

2:30 - 3:25
    - Drecode
    - Grecode

Friday, January 11

9:30 a.m. - 10:30
    Edit Specification Command Statements (continued)
    - Output
    - Write

10:30 - 10:45
    Break

10:45 - 12:00
    Report Division Command Statements
    - Display-Control-Section
      - Display-Edit-Statistics
    - Tolerance-Control-Section
      - Error-Rate-Check
      - Reject-File
    - Report Examples

12:00 - 1:15 p.m.
    Break

1:15 p.m. - 3:25
    Hand out Edit Specifications as problem
    Free work time

Monday, January 14

9:30 a.m. - 10:30
    Discuss procedures for running problems on computer

10:30 - 10:45
    Break

10:45 - 12:00
    Component Programs of the CONCOR system

12:00 - 1:15 p.m.
    Break

Tuesday, January 15

9:30 a.m. - 3:25 p.m.
    Free work time

Wednesday, January 16

9:30 a.m. - 12:00
    Free work time

12:00 - 1:15 p.m.
    Break

1:15 p.m. - 2:15
    How to Install CONCOR on IBM 360/370 OS

2:15 - 2:30
    Break

2:30 - 3:25
    Free work time

Thursday, January 17

9:30 a.m. - 3:25
    Free work time

1:15 p.m. - 2:15
    Future CONCOR Design Considerations
    - for census processing
    - for survey processing
    - manual correction system

2:15 - 2:30
    Break

2:30 - 2:45
    Evaluation Guidelines
    - Hand out evaluation forms

2:45 - 3:25
    Free work time

Friday, January 18

9:30 a.m. - 10:30
    Free work time

10:30 - 10:45
    Break

10:45 - 12:00
    Group Discussion on CONCOR
    - Submission of written evaluations
    Distribution of IBM installation tapes
    Presentation of Certificates to Participants

12:00 - 1:15 p.m.
    Break

1:15 - 3:25
    Free work time

APPENDIX D

PARTICIPANTS

CONCOR WORKSHOP
January 7-18, 1980
Marlow Heights, MD

KENYA

James Midianga, Government Computer Centre, Central Bureau of Statistics, P.O. Box 30266, Nairobi, Kenya

PANAMA

Omar Rivera Rios, Data Processing Specialist, Contraloria General de la Republica de Panama, Apartado Postal 5213, Panama 5, Panama

PHILIPPINES

Guida T. Capellan, OIC, Computer Programming Division, National Census and Statistics Office, Solicarel Bldg. 1, Magsaysay Blvd., Sta. Mesa, Manila, Philippines

THAILAND

Angsumal Sunalai, National Statistical Office, Bangkok 1, Thailand

EGYPT

Farag Sedky Mourad Ghaleb, CAPMAS (Central Agency for Public Mobilization and Statistics), Nasr City, Cairo, Egypt

SAUDI ARABIA

Shargi R. Al-Shargi, National Computer Center, P.O. Box 2534, Riyadh, Saudi Arabia

Ahmed S. Al-Ghamdi, Dept. Director of National Computer Center, P.O. Box 2534, Riyadh, Saudi Arabia

OTHER

Robert W. O'Neal, USREP/JECOR, APO New York, NY 09038

John N. Adams, Data Processing Advisor, DACCA, Department of State, Washington, DC 20520, or c/o UNDP, P.O. Box 224, Dacca, Bangladesh

Joe Quasney, U.S. Census Bureau, Washington, DC 20233

Howard Brunsman, 5715 N. Ninth St., Arlington, VA 22205

Larry Shiller and Julio Ortuzar, Delta Systems Consultants, Inc., 264 Alhambra Circle, Coral Gables, FL 33134

Nicholas Ourusoff, Burpee Hill Road, New London, NH 03257 (United Nations)

Frederic J. Grant IV, Systems Development, Georgia World Congress Institute, 580 North Omni International, Atlanta, Georgia 30303

Rafael Samper, Mary Jo Keenan, Catherine B. Gleason, LACDR, Room 2246 NS, Washington, DC 20523

John Marshall, AIDSERDMPSE, Rm. 706C, SA-12, Washington, DC 20523

Susanne Bacon and Mary K. Friday, ISPC/Data Processing, U.S. Census Bureau, Washington, DC 20233

Leo Dougherty, ISPC/World Census Staff, U.S. Bureau of the Census, Washington, DC 20233

CONCOR WORKSHOP

January 7-18, 1980, Washington, DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM

CONCOR WORKSHOP
January 16, 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1. What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries?

2. If you are familiar with previous versions of CONCOR, how does this current version compare to them?

3. How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data with which you are familiar?

4. How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software?

5. How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys?

6. If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?

7. In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8. Other comments

APPENDIX F

NEW/OLD COMMAND COMPARISONS

OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA


NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1, all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR.

END-DIVISION signifies the termination of the CONCOR division organization and must always be present, even though division identifiers have not been implemented.

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations, and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

ALLOCATE (ALL)   ALLOCATE (ALLOC)   Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

ASSERT (AST)   ASSERT (ASRT)   The ASSERT command is the basic consistency editing command of CONCOR. The command now provides NOERROR and MESSAGE options. Additionally, the DUMPR and DUMPO keywords give the user the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS and FAIL END-FAIL constructions, for the purpose of maintaining structured logic.

(see RECODE)   DRECODE (DRCD)   Provides a method for recoding data items that have a starting value of zero and continue in ascending sequence. Replaces the previous RECODE command.

EXIT Provides the means to terminate EXECUTION (ESCAPE) of command statements within the UNTIL command

END   END-IF, END-THEN, END-ELSE   A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

ERROR not implemented No longer supported in language


FILTER TYPE ( ) WITH not implemented No longer are option statements supported in language This command once specified essentially a GOTO depending on construction

(see XRECODE) GRECODE (GRCD) This new command provides a method for the group recoding of input values

IF   IF   The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. Similar to pseudocode, a CONCOR programmer must utilize the structured IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version.

LET LET Virtually unchanged

LIST New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire as well as specific individual identifiers

OUTPUT New command which provides the means by which records will be written to the output-file

RANGE (RNG)   RANGE (RNG)   As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values, as well as the entire record or questionnaire, through the DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

RECODE (REC)   name changed   Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

STOP keyword   STOP keyword   Unchanged.

not implemented   UNTIL DO END-DO   One of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to that of most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

UPDATE (UPD)   UPDATE (UPD)   Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.


WRITE (WRT) WRITE (WRT) The WRITE language statement while once utilized as the primary output command is now used to create an auxiliary or derivative file Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION (See WRITE-FILE of DICTIONARY-DIVISION)

XRECODE (XREC) name changed Old command which has become the GRECODE command in the December 1979 version


DECEMBER 1978 VERSION 1

DECEMBER 1979 VERSION 2

Not implemented   REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST


Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics, such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.


(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18-digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.


(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation, or provide a means by which the user may specify a title to appear in the listings.
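The automatic range check suggested in point (13) might work roughly as in the following Python sketch. The dictionary structure, the function name `out_of_range`, and the ranges themselves are all assumptions made for illustration. Only the idea comes from the list above: valid ranges are declared once with the identifiers, and the check runs before the user's edit logic sees the data.

```python
# Hypothetical dictionary entries: identifier -> (low, high), inclusive.
# The identifier names echo Figure 1; the ranges are invented.
DICTIONARY = {
    "H02-MATERIAL-OF-ROOF": (1, 9),
    "P01-SEX": (1, 2),
}


def out_of_range(record, dictionary=DICTIONARY):
    """Return the identifiers whose values fall outside their declared range."""
    failures = []
    for name, (low, high) in dictionary.items():
        value = record.get(name)
        if value is None or not (low <= value <= high):
            failures.append(name)
    return failures


flags = out_of_range({"H02-MATERIAL-OF-ROOF": 12, "P01-SEX": 1})
```

Because the ranges live in the dictionary rather than in comment statements, the same table could in principle be read by tabulation software as well.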

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features, such as division by zero and out-of-bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is the question of what action is to be taken in the calculation of tolerance in terms of this new record's inclusion into the universe of observations and number of changes.


(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
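The difference between a linear scan and the lookup table approach proposed for DRECODE in point (4) can be sketched in Python. The recode ranges and function names below are hypothetical. The sketch only shows the general trade: the table is built once, after which each recode is a single indexing operation instead of a scan of the ranges.

```python
# Hypothetical recode: a value 0-99 mapped to a group code.
RANGES = [(0, 14, 1), (15, 49, 2), (50, 99, 3)]


def recode_linear(value):
    """Scan the ranges one by one until a match is found."""
    for low, high, code in RANGES:
        if low <= value <= high:
            return code
    return None


# Built once, ahead of processing, from the same range definitions.
TABLE = [recode_linear(v) for v in range(100)]


def recode_lookup(value):
    """Recode by direct indexing into the prebuilt table."""
    return TABLE[value] if 0 <= value < len(TABLE) else None


same = all(recode_linear(v) == recode_lookup(v) for v in range(-5, 105))
```

The table costs storage proportional to the span of input values, which is one reason such an approach suits discrete, bounded census codes better than continuous values.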

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.


1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean to imply that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.


2. Survey or other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has not been finished nor tested.
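The weighting scheme in point (5) amounts to multiplying each sampled record's value by its sampling weight before totalling. The Python sketch below uses hypothetical field names and weights. It is a generic weighted total, not a description of any planned CONCOR command, and it also shows why the non-integer arithmetic of point (3) would be needed.

```python
def weighted_total(records, value_field, weight_field):
    """Estimate a population total from sample records and their weights."""
    return sum(r[value_field] * r[weight_field] for r in records)


# Hypothetical sample: each household stands for `weight` households.
sample = [
    {"persons": 4, "weight": 10.0},
    {"persons": 2, "weight": 25.0},
]

unweighted = sum(r["persons"] for r in sample)
estimate = weighted_total(sample, "persons", "weight")
```

The unweighted sum describes only the sample itself. The weighted figure is the one that would be meaningful as a summary statistic for the sampled population.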


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES


1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT OUTPUT-QUEST-COUNT WRITE-RECORD-COUNT

Change in spelling of variable.

Not mentioned; possible omission from manual.

Not implemented.

Note: Current documentation equivocates on when some variables are reset; most are initialized with each control area break.

Though these were implemented as reserved identifiers, their exact values and utility were unknown due to poor documentation of (old) CONCOR.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN-, OUT-).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Printout: CONCOR-EDITOR EXECUTION STATISTICS page. The run log shows the message DIVISION BY ZERO, LINE NUMBER 118, printed repeatedly and followed by RUN ABORTED DUE TO DIVISION BY ZERO ERRORS. The superimposed program text (source lines 112-125) sets registers R001 and R003 to zero, lists their values, and then executes the LET statement on line 118, which assigns R003 from R002 and R001.]

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason, that is, ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.
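The diagnostic behaviour shown on this page (each data exception reported with its source line number, and the run aborted once too many have occurred) can be sketched in Python. The names `run` and `RunAborted`, the message layout, and the error limit of six are assumptions for illustration. The report does not state how EDITOR implements this or what its actual limit is.

```python
class RunAborted(Exception):
    """Raised when the number of division errors reaches the limit."""


def run(statements, error_limit=6):
    """Execute (line_number, action) pairs, logging division errors by line."""
    log = []
    errors = 0
    for line_number, action in statements:
        try:
            action()
        except ZeroDivisionError:
            errors += 1
            log.append(f"DIVISION BY ZERO  LINE NUMBER {line_number}")
            if errors >= error_limit:
                log.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                raise RunAborted(log)
    return log


registers = {"R001": 0, "R002": 7, "R003": 0}


def line_118():
    # Stand-in for the LET statement on source line 118.
    registers["R003"] = registers["R002"] / registers["R001"]


try:
    run([(118, line_118)] * 10)
    aborted_log = None
except RunAborted as exc:
    aborted_log = exc.args[0]
```

Each failing statement is named by line number rather than ending the job silently, which is the improvement over the old package that the comments describe.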

[Printout: CONCOR-EDITOR EXECUTION STATISTICS page. The run log shows a series of array coordinate error messages, each reporting the questionnaire, coordinate, value, and line number involved. The superimposed program text (source lines 173-187) consists of two nested DO ... END-DO loops, each coded UNTIL a register exceeds 10 and VARY that register FROM 1 BY 1, enclosing an ALLOCATE from TA1 (R001 R002), an UPDATE of TA2 (R001 R002) with R003, and several LIST statements.]

Comments: In this example a 2-dimensional array with 10 rows and 10 columns (TA1, indexed by R001 and R002) is used in an attempt to update the smaller (5 x 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.

[Printout: CONCOR-EDITOR EXECUTION STATISTICS page. The run log shows JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215, followed by EDITOR -- NORMAL END OF JOB. The superimposed program text (source lines 214-218) tests IN-RECORD-COUNT in an IF statement and issues STOP-RUN within THEN ... END-THEN.]

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.


a. LOAD/UNLOAD arrays: commands which would automatically save and replace hot-decked values from batch to batch.

b. TOTAL-QUESTIONNAIRE-COUNT-RECORD-COUNT internal variables independent of AREA CONTROL.

3. Other modifications

a. Default values for the max-storage parameter set in a realistic range.

b. Allowance of more variables for survey applications.

Some of these modifications are part of what ISPC calls its wish list for the future development of CONCOR. This document has been included in this report as Appendix G. It is arguable that these features are essential to the completion of the CONCOR package. While it is beyond the scope of this report to draw a conclusion in this area, the enhancements as outlined above are ones that would make the language more internally consistent and thereby easier to learn and apply to a census data production environment. These modifications are not arbitrary or cosmetic, but are a direct result of hands-on programming experience in the language as well as observations and discussions with other workshop participants. While it is probably impossible to ever be satisfied with the overall structure of any programming language, the resolution of this issue of completeness must be made relative to the objectives for developing the COBOL CONCOR system in the first place. An explicit statement of these objectives has been absent in all systems documentation to date.


III PROPOSED CHANGES TO THE LANGUAGE STRUCTURE

Based upon the assumption that it is the intent of sponsoring agencies to optimize the COBOL CONCOR package (a goal which is believed currently obtainable), an understanding of the nature of these changes and how they would impact users is essential. Appendix F sets forth in a comparative manner the differences between the old December 1978 and the new December 1979 editions of CONCOR. Studying this appendix makes obvious the fact that, while the new version of the language is clearly superior to the old in nearly every aspect, the basic and overall structure of the language is essentially unchanged.

Compartmentalization of aspects of the language into divisions represents a significant ideological enhancement to the language. Indeed, development of programs by divisions proved to be an extremely useful way of understanding the nature of editing work to be performed. However, note that while the END-DIVISION command is essential to the language, the division headings DICTIONARY-DIVISION, EXECUTION-DIVISION, and REPORT-DIVISION were not implemented and are therefore preceded by a period to be treated as comment lines in the program listing. It is inconsistent to implement END-DIVISION commands while not implementing the division headings. It is believed that this division structuring is important enough to the overall organizational structure of a CONCOR source language program that it should be implemented prior to general distribution.

The section headers shown on the figures in Appendix F, however, are another matter. They are cumbersome and were generally not coded by workshop participants, and they could be deleted from this version of the language altogether with little loss in organizational understanding. The CONCOR language is sufficiently powerful to stand on its own as a distinct product and is not meant to be a COBOL imitation; its present degree of development and specialization does not warrant the structural drag of additional section identifiers. The probable intent of the original CONCOR project was to develop a package which was uncomplicated and not unwieldy to use. The question of division and section names implementation, while seemingly cosmetic, can have real impact on its perceived easiness of learning and use.

Figure 1 on the following page illustrates common mistakes programmers make coding numeric and alphanumeric variables in the DATA-DIVISION. These mistakes are the result of the inconsistent variable formats. For instance, in the numeric data definition statement it is permissible to specify N9-23, where N signifies numeric, 9 signifies the length of the item, and 23 specifies the starting position in the record. In NEW-DATA, however, it is possible to code an item with a maximum length of 18. While on the surface this inconsistency would seem harmless, some user variables defined in NEW-DATA as N18 could be moved inadvertently to output record fields defined by a data definition statement such as N9-23; such an action would result in a data error. Under certain circumstances it would be highly desirable to output these larger length values. A similar circumstance exists between the numeric and alphanumeric data coding conventions. While the maximum length of the numeric is permitted to be 9 in the data definition statement (18 in NEW-DATA), the maximum alphanumeric variable is permitted to be only 4 characters in length. In the current systems manual it is recommended that

FIGURE 1

DICTIONARY-DIVISION

DICTIONARY-NAME DATA-CODING-EXAMPLE

INPUT-FILE

OUTPUT-FILE

AREA-CONTROL N2-2 N2-4 N3-6 N2-9 N9-23

QUESTIONNAIRE-CONTROL A4-2 A3-6 A2-9 A3-11 A3-14

RECORD-CONTROL A1-1

DEFINE-RECORD

H01-TYPE-OF-HOUSING-UNIT N1-17

H02-MATERIAL-OF-ROOF N1-19 10 9

H03-TOTAL-PERSONS-IN-UNIT N8-40 NOT-NUMERIC BLANK

H04-STATE-OF-UNIT-CODE A4-50 0 U 1 D

DEFINE-RECORD

P01-SEX 1-13 W F

NEW-DATA

N01-SAVE-TYPE-OF-HOUSING-UNIT

N02-SAVE-TYPE-OF-ROOF 1

N03-COUNT-TOTAL-IN-UNITS 10 0

N04-AGGREGATE-INCOME 18 0

END-DIVISION

Explanations

N2-4: This is an example of an external numeric input data item (N) with a length of 2 bytes starting in column 4 of the input record. The maximum length of this type of variable outside of NEW-DATA is 9. When coded in NEW-DATA, 18 is permitted.

A4-2: This is an example of an external alphanumeric input data item (A) with a length of 4 bytes starting in column 2 of the input record. This construction for alphanumeric variables is valid only in the control statements. Additionally, it can never be over 4 bytes in length. When alphanumeric data fields are defined within record types, the EDITOR program requires that the comparison strings always be specified. A maximum of 3 is permitted. The purpose of these strings is to force a recode of the data to a numeric value. If no match is found, EDITOR automatically assigns a unique negative value to the field.


alphanumeric coding be utilized in the QUESTIONNAIRE-CONTROL and RECORD-CONTROL statements, where each input data item must be of the same data type, as shown in the example. When alphanumeric data variables are used in these control statements, their construction is identical to that of numeric items. However, when used elsewhere in the DATA-DIVISION, alphanumeric variables are required to specify up to three possible comparison values, as shown. There are a number of production instances when it would never be necessary or even desirable to recode alphanumeric data. However, as CONCOR attempts to force data into a totally numeric format upon output, there is currently no way to preserve these values if desired.
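The two inconsistencies discussed above can be made concrete with a brief Python sketch. Both helper names are hypothetical. The numeric code returned for a match (its position among the comparison strings) and the use of -1 for a non-match are assumptions, since the report says only that the data are recoded to a numeric value and that unmatched fields receive a unique negative value.

```python
def recode_alpha(value, comparison_strings):
    """Force an alphanumeric value to a number, as EDITOR is described as doing.

    Assumption: a match yields its 1-based position; a non-match yields -1.
    """
    if len(comparison_strings) > 3:
        raise ValueError("at most three comparison strings are permitted")
    if value in comparison_strings:
        return comparison_strings.index(value) + 1
    return -1


def fits(value, length):
    """True if a non-negative integer can be written in `length` digits."""
    return 0 <= value < 10 ** length


big = 123456789012            # a value that is legal in an 18-digit NEW-DATA item
ok_in_new_data = fits(big, 18)
ok_in_output_field = fits(big, 9)
```

The original string is lost once it has been recoded, and a value that is valid in NEW-DATA may not fit a 9-digit output field. These are the two data-loss paths the text describes.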

An unwieldy alternative to this situation, which may be acceptable under some circumstances, would be the expansion of the number of comparison strings from three to a more realistic number. The limitation of this compromise is that a full twenty-six comparison identifiers would be required in order to accommodate data which utilized the entire alphabet. A better solution, however, would be to make the general format of the alphanumeric variables identical to that of numeric identifiers, i.e., A9-23, and to permit alphanumeric values so defined to pass unaltered through the CONCOR system.

Another data-naming convention which caused several errors, and which could be corrected, concerns the array data definitional statements. While arrays of two and more dimensions are handled in a superior manner by the CONCOR program, single-dimension arrays pose a problem in coding, as shown in Figure 2. It is suggested that the command imperatives be changed to permit the coding of both rows and columns in single-dimension arrays, i.e., allow a single row vector as well as a single column vector, to maintain the consistency of the array data definitional statements.

A major requirement of COBOL CONCOR file processing concerns the fact that all related data records must be physically contiguous on the input file. The implication of this requirement is that files may require preprocessing prior to actual data editing. (This preprocessing is usually a sort routine upon a selected CONTROL-AREA key.) While this type of processing merely introduces a new step in file processing, a major limitation becomes apparent when a large number of DISCRETE DATA files of the same census or survey questionnaire are to be processed. This limitation is the introduction of manual steps to save the most recent imputed values, i.e., preventing the program from starting with cold values on each batch run. If a command such as LOAD/UNLOAD ARRAYS was incorporated into the language (an enhancement not believed to be difficult to implement), manual processing would be reduced to a minimum between batches and the maximum benefits of the hot deck methodology would be realized. It is envisioned that such a command would automatically ensure the transfer of the appropriately designated hot values. Automatic processing of this nature, if done correctly, can greatly reduce the time required to clean multivolume files, for once CONCOR language statements have been compiled, linked

While it is possible at this time to save the arrays that are used in the imputation processes on a separate write-file, it is not now possible to automatically load those values back into an object program and to immediately resume processing on another volume. It is believed that such an automatic feature of the language would cut down the manual processing time significantly enough that it warrants inclusion in the package prior to its general distribution.
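The proposed save-and-reload cycle for hot-deck arrays can be sketched in Python as follows. The function names `unload_arrays` and `load_arrays`, the JSON file format, and the sample array are all hypothetical. The sketch shows only the behaviour argued for above: the first batch starts from cold values, and each later batch resumes from the values the previous batch left behind.

```python
import copy
import json
import os
import tempfile


def unload_arrays(arrays, path):
    """Save the current imputation arrays at the end of a batch."""
    with open(path, "w") as fh:
        json.dump(arrays, fh)


def load_arrays(path, cold_values):
    """Return saved hot values if present, else a copy of the cold deck."""
    if not os.path.exists(path):
        return copy.deepcopy(cold_values)
    with open(path) as fh:
        return json.load(fh)


cold = {"A05": [[2, 1], [3, 4]]}          # invented cold-deck values

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hotdeck.json")
    batch1 = load_arrays(path, cold)      # first volume starts cold
    batch1["A05"][0][0] = 7               # imputation updates a cell
    unload_arrays(batch1, path)
    batch2 = load_arrays(path, cold)      # next volume resumes hot
```

No manual step is needed between the two batches, which is the saving in processing time that the paragraph above anticipates.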

FIGURE 2

A05-DIFF-BETWEEN-AGE-OF-FEMALE-BY-RELATION 2 4 4

[Cold deck values for A05: a 4 x 4 block with columns labeled AGE OF HUSBAND 12-17, 18-24, 25-35, 36+ and rows labeled RELATION HEAD, CHILD, OTHER, NONRELATIVE. The individual cell values are not legible in this copy.]

Comments: The ARRAY-DATA command statement provides the means to declare array identifiers with up to five dimensions. Current documentation is not as explicit about the rules of this command as is desirable. The parameters of the command should function as follows:

user-identifier, number of dimensions (D), number of rows (R), number of columns (C), magnitude of element (M), initial start-up values

In the example, A05 is a two-dimensional array with 4 rows, 4 columns, a default magnitude of 9, and cold deck values as labeled.

A06-DIFF-BETWEEN-AGE-OF-PERSON-AND-MOTHER 1 1 4
16 18 21 23

(This coding generates the error messages below.)

WARNING (DD-207) COMMAND TERMINATOR () NOT FOUND, () ASSUMED PRESENT
ERROR (DD-911) DIMENSION OF USER-SPECIFIED ARRAY IS LESS THAN THE MINIMUM VALUE PERMITTED (2)
PREVIOUS DIAGNOSTIC AT LINE 563

As shown by the array variable A06, CONCOR's treatment of vectors is not consistent with the above multidimensional array scheme; i.e., A06 must be coded as follows (example of how vectors must currently be coded to be correct):

A06-DIFF-BETWEEN-AGE-OF-PERSON-AND-MOTHER 1 4 2
16 18 21 23

user-identifier, 1 dimension, number of elements in vector, magnitude of element, initial start-up values

A simple modification to this command would permit the coding of both row and column vectors and make this command less error prone.


and stored as an object module on the system, no other compilations should be required for processing questionnaire files of the same type. Theoretically, a single well-written CONCOR program is all that would be required to process an entire census run.

Appendix H contrasts the internal identifiers of the old and new language versions. Without such identifiers a user would have little information about the status of input as it is processed by EDITOR. As noted in the appendix, most internal pointers are reset upon each break in the CONTROL-AREA, provided a CONTROL-AREA has been defined. The limitation here is that there are obvious instances when terminating processing based on run counts would be advantageous even though a CONTROL-AREA has been specified, e.g., debugging CONCOR programs or comparing input files. Therefore another set of pointers should be implemented for this purpose and made available for programmer reference.
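The distinction between counters that reset at each control area break and counters that accumulate over the whole run can be shown with a minimal Python sketch. The function name `count_records` and the record layout are hypothetical. The sketch simply keeps both sets of counts side by side, as the recommendation above proposes.

```python
def count_records(records, area_of):
    """Return per-area counts (reset at each break) and a cumulative run total."""
    area_counts = []
    run_total = 0
    current_area, in_area = None, 0
    for rec in records:
        area = area_of(rec)
        if current_area is not None and area != current_area:
            area_counts.append((current_area, in_area))
            in_area = 0                  # area-level counter resets at the break
        current_area = area
        in_area += 1
        run_total += 1                   # run-level counter never resets
    if current_area is not None:
        area_counts.append((current_area, in_area))
    return area_counts, run_total


per_area, total = count_records(["A", "A", "B", "B", "B"], lambda r: r)
```

A test such as "stop after the first 500 records of the run" can only be written against the second kind of counter once a control area has been defined.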

One clearly disturbing development which needs to be pursued during in-depth testing of the system concerns the MAX-STORAGE parameters of the DEFINE-RECORD statement. As shown in the figure on the following page, when MAX-STORAGE was set equal to the maximum value, a COBOL program was generated which required 1000K of core to run. The MAX-STORAGE value of 999 is clearly not realistic under most processing circumstances. This example drives home several important points about CONCOR. The core requirements of CONCOR-generated programs can be influenced significantly by the amount or nature of programmer-specified I/O operations. In fact, it is possible to generate a program of a size that most foreign country machines could not process. It is recommended that tests determine a realistic maximum value restriction for implementation to prevent problems in this area.

The final area of recommended modification concerns the newly implemented REPORT-DIVISION. The purpose of the REPORT-DIVISION is to enable a user to specify certain CONCOR language statements which will generate statistical reports. These reports contain statistics generated by EDITOR as specified by the GENERATE-EDIT-STATISTICS command of the EXECUTION-DIVISION. All of the reports produced are organized according to the data fields defined by the AREA-CONTROL command of the DATA-DICTIONARY. If the AREA-CONTROL command is not defined in the DATA-DICTIONARY, then all the statistics are summarized at the total run level. If a control area field is defined, then all statistics will be summarized for each unique CONTROL AREA as encountered by the EDITOR program on the input file; statistics by total run level will not be available. This in part relates back to previous discussions citing the need for new internal identifiers. Report listings may contain the values of entire records or entire questionnaires, depending upon the keyword used in the report generation commands. The problem centers upon the homogeneity of CONCOR printouts during a production run.

It is virtually impossible to distinguish reports on the basis of the volumes they were run against. Some means should be provided to allow users to uniquely and purposefully label the reports generated in this division. Indeed, the very name REPORT-DIVISION suggests that such a command is implicit and appropriate. Such a LABEL-REPORT or REPORT-FILE command, along with file information from the system, should not be difficult to implement.

FIGURE 3

CONCOR (CONSISTENCY AND CORRECTION) EDIT AND IMPUTATION SYSTEM

USER DICTIONARY DIVISION - SOURCE LISTING

LINE NUMBER

72 DEFINE-RECORD RECORD-NAME= HOUSING-RECORD
73 MAX-STORAGE= 999
74 RECORD-TYPE= [value illegible]

268 DEFINE-RECORD RECORD-NAME= POPULATION-RECORD
269 MAX-STORAGE= 999
270 RECORD-TYPE= [value illegible]

JOB LOG (STEP LOAD)

CPU 0 MIN 01.65 SEC VIRT 1000K
STEP START TIME - 100155 STEP END TIME - 100257 STEP CPU TIME - 1.65 SECONDS STEP COMPLETION CODE - 0
TAPES - 0 DISKS - 0 REGION (REQUESTED - 1000K USED - 1000K) I/O COUNT 211
CARDS READ - 0 CARDS PUNCHED - 0 LINES PRINTED - 22

Concluding Remarks of System Modifications

Version 1 of the COBOL CONCOR language was notorious for abending without the generation of diagnostic messages. Appendix I sets forth a test of some of these historical problems. In each instance the system provided messages and source program line references which enabled immediate identification of the processing difficulties. Further, tests of the various commands revealed the general high performance character of the language in its newest version. Under these circumstances the recommended modifications should be considered as ones which will uniformly allow CONCOR to realize the potential of its design, enhance its utility and, moreover, endure as a software product.


IV THE ADEQUACY OF COBOL CONCOR DOCUMENTATION

The current documentation of CONCOR consists of a newly developed, relatively voluminous systems manual, a Diagnostic Message Guide, and some loose-leaf instructions concerning the installation of the system on IBM computers. Of these three documents the Diagnostic Message Guide is clearly complete and adequate and needs no changes except the addition of message number tabs to enable a programmer to locate the text of a message more quickly. An example of the superior format of this guide appears as Appendix J.

Conspicuously absent from the list of systems documents is a Users Guide as originally specified in the ISPC enhancement proposal

A thorough assessment of the Systems Manual has been undertaken and indicates that, with some reorganization and editing, it could be made a more useful and usable document, as it is well constituted informationally. In its current form the Systems Manual can best be described as a working document, one which was not intended to teach the CONCOR language but was aimed at an audience already familiar with previous CONCOR releases. Further, it is not a systems manual per se, as some of its components and appendixes could more appropriately compose a user's guide. For example, excerpts from the POPSTAN publications which review the basic editing concepts and contain the POPSTAN example problem should be deleted from the systems manual and would be more contextually effective in the User's Guide. The EXECUTION-DIVISION chapter of the Systems Manual would especially benefit from general editing and reorganization. (Specifically, the contents of pages 90-92 should be moved to page 63, as these pages explain the placement and purpose of the three permissible routines of the CONCOR language. Had the author been granted more time he could have outlined additional specific modifications to bring existing documentation to a level of adequacy.) However, some guidelines stand out:

1 Installation instructions and considerations should compose a chapter of the Systems Manual rather than a separate document. (Hand-out materials covering installation were not adequate.) Further, a documentation scheme setting forth how CONCOR was installed on each computer, and any modifications required during installation, could be serially added to and become a permanent part of the installation's systems manual. If CONCOR is going to be used, and if agencies are going to actively support it as a package, installation limitations must be known and well documented. The prescribed format for this documentation should be set forth, and any modifications to CONCOR should be continuously updated on these sheets so that a history of changes to the language will be readily accessible to supporting agencies. If CONCOR is to survive as a standardized system, editing results must be capable of being replicated among installations. Such a form could contain other information as well, concerning the type of computer, amount of core and nature of utilities supporting users. Upon installation a copy of this form could be sent to the US agency which will ultimately be responsible for supporting the CONCOR package.

2 A complete COBOL CONCOR program should appear in an appendix for reference

3 The development of the Users Guide should include an intensive

review of the editing concepts involved in processing census

data files beyond the POPSTAN materials

4 An explanation of the CONCOR benchmark program should appear

in the Users Guide and the Systems Manual The running of a

supplied benchmark program should be a standard installation

protocol used to test all operational aspects of a new

installation

5 The development of a CONCOR pocket card. This tool, which has been proven useful in teaching and assisting programmers in utilizing programming languages, lays out all commands and options on a single small card. An example of such a pocket card is the IBM Assembler Card. Such a small document is useful in clarifying the language and setting forth structural rules without continual reference to full-size manuals.

V CONCLUSION

In a situation of extreme need CONCOR could probably be utilized, as it seems capable of performing its editing functions. Its usefulness as a data cleaning tool is generally undisputed. The importance of having CONCOR exhaustively tested by an independent agency cannot be over-emphasized in light of prior failures of the system to perform adequately. Additionally, core requirements and processing speed should be precisely determined. One must be encouraged by the amount of progress which has been made in a relatively short amount of time towards the completion of the CONCOR package. While it is clearly beyond the scope of this report to estimate the cost and the amount of time required to implement the highly desirable changes to the CONCOR language and documentation outlined herein, and then exhaustively test the system, it is believed that this process could be completed within a single quarter. (The changes are of such a nature that no structural redefinitions would be required.) These enhancements would make the COBOL CONCOR package more complete, consistent and reliable in the field. ISPC has competently moved the system to the point that its completion is within reach.

Given the probable revision of the current Systems Reference Manual and the development of a suitable User's Guide, it is likely that this version of CONCOR could be taught to experienced programmers (unfamiliar with previous versions of the language) in as little as four weeks. And while one is impressed by the potential power and overall simplicity of the COBOL CONCOR language, it is difficult to envision individuals with no real computer programming experience utilizing its full capabilities. Hence the importance of including a review of basic editing concepts in the documentation is underlined.

Further, as documentation is often critical to the perceived ease of use of computer-based systems, it will be essential to have these various manuals examined independently prior to their general use.

As previously mentioned, Appendix G presents features which the ISPC feels could have a positive impact on CONCOR. Depending upon the outcome of the testing recommended in previous chapters and the availability of funding, supporting agencies may desire to look again at an assembler (ALC) version of the language.

Overall COBOL CONCOR remains an intermediate product yet to realize its potential which is thought considerable Software development of this nature is characteristically of a long-term evolutionary design In light of this reality agencies should be prepared to evaluate such projects success or failure in terms other than that of immediate utility It is the philosophy underlying the system not the system itself which will ultimately be exported

APPENDIX A

Bucen Enhancement Proposal

BUCEN IN-HOUSE REVIEW ENHANCEMENT PROPOSAL

1 Easy to use interrecord referencing

2 Improved output file capabilities

A provide overflow protection on WRITE command

B provide OUTPUT command to create corrected questionnaire file that matches IP file data dictionary

3 Improved edit statistics reported (LISTERR)

A provide automatic (user-specified) area break

B provide options for compilation and displaying edit statistics at various levels

C provide automatic (user-specified) tolerance checking of error rates by area

D automatically capture IDs of areas failing tolerance check

4 Clean up known bugs in code

5 Comprehensive testing

6 Clean up and enhance documentation

A reference manual more examples error message guide

B installation guide

C systems manual
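Items 3C and 3D above (tolerance checking of error rates by area, and capture of the IDs of failing areas) can be illustrated with a short sketch. It is written in Python rather than CONCOR, and the function name, argument layout and the 5 percent tolerance used in the usage note are assumptions made for illustration only.

```python
def areas_failing_tolerance(area_stats, tolerance):
    """Return the IDs of areas whose error rate exceeds the tolerance.

    area_stats -- mapping of area ID to (errors, items_checked)
    tolerance  -- maximum acceptable error rate, e.g. 0.05 for 5 percent
    """
    failing = []
    for area_id, (errors, checked) in area_stats.items():
        if checked == 0:
            continue  # no observations, so no rate can be computed
        if errors / checked > tolerance:
            failing.append(area_id)
    return failing
```

Called with a tolerance of 0.05 on areas having 2, 9 and 0 errors per 100, 100 and 0 items checked, only the second area would be reported as failing.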

APPENDIX B

EVALUATIVE CRITERIA

APPENDIX B

UNITED STATES GOVERNMENT

Memorandum

TO DSPOPFPSD Robert H Haladay

DATE December 3 1979

FROM DSPOPDEIO Liliane Floge

SUBJECT Criteria for Evaluation of BuCen COBOL CONCOR September 30th Version as presented at January 1980 Workshop

The following points should be considered when evaluating CONCOR, the COBOL editing system, during participation at the January 1980 workshop.

1 Are the Installation Manual, the System Manual and the Users Manual ready for use? Are these manuals adequate for their purposes? Are the manuals clear and consistent? Can they be understood by subject-matter personnel as well as programmers?

2 Is the editing system capable of implementing all the features as described in the documentation?

3 Does the system appear to be [illegible] debugged and tested and ready for release to developing countries?

4 Does the system appear to be fast enough to edit a census in a reasonable amount of time?

5 What size core does the system require?

6 Comment in general on completeness, usefulness and efficiency of the system and ease of use by developing country programmers and other census personnel.

cc [name partly illegible] Olds APHA, Julio Ortuzar Delta Systems

APPENDIX C

WORKSHOP ITINERARY


CONCOR Workshop Schedule January 7-18 1980

U S Bureau of the Census International Statistical Programs Center

Scuderi Building Room 209 4235 28th Avenue Marlow Heights Maryland

Monday January 7

930 am - 1000 Welcoming Remarks
Overview of Workshop

1000 - 1030 Introduction to CONCOR
- Purpose and function
- History of development
- General computer requirements

1030 - 1045 Break

1045 - 1200 Editing Concepts
- Ways to interrogate data
- Ways to correct data
- Editing housing and population data
- POPSTAN
- Advantages of CONCOR

1200 - 115 pm Break

115 - 200 System Description
- Constraints in design of CONCOR
- Basic subsystems of CONCOR
- User interactions with system
- Examples of outputs produced

200 - 230 User Program Organization
- Divisions
- Sections
- Routines
- Commands

230 - 245 Break

245 - 325 Command Language Description
- Types of statements
- Format
- Syntax

Tuesday January 8

Dictionary Division Command Statements

930 am - 1030
- Punctuation
- Input data referencing
- Working data declarations
- Internal data representation and storage

1030 - 1045 Break

1045 - 1200 Dictionary-Attributes-Section
- Dictionary-Name
File-Section
- Input-File
- Output-File
- Write-File
- Error-File

1200 - 115 pm Break

115 pm - 215 Input-Record-Section
- Define-Record
Working-Data-Section
- New-Data
- Array-Data

215 - 230 Break

230 - 325 Dictionary Examples
- Minimum dictionary structure
- Maximum dictionary structure
- Hand out dictionary problem

Wednesday January 9

930 am - 1030 Discussion of Participants Dictionary problems
Free work time

1030 - 1045 Break

1045 - 1200 Execution Division Command Statements
- Punctuation
- Subscripting
- Internal Identifiers
- Report-Control-Section
  - Generate-Edit-Statistics
  - Count-Imputes
  - Examples

1200 - 115 pm Break

115 pm - 215 Execution Division Command Statements (continued)
- Routines of Edit-Specifications-Section
  - Prolog-Routine
  - Filter-Routine
  - Epilog-Routine
- Types and functions of edit specification commands
- Range
- Assert

215 - 230 Break

230 - 325
- Pass/Fail clauses
- List

Thursday January 10

930 am - 1030 Discussion of Problems
Free work time

1030 - 1045 Break

1045 - 1200 Edit Specification Command Statements (continued)
- Allocate
- Update
- Let

1200 - 115 pm Break

115 pm - 215
- If
- Until/Exit
- Stop

215 - 230 Break

230 - 325
- Drecode
- Grecode

Friday January 11

930 am - 1030 Edit Specification Command Statements (continued)
- Output
- Write

1030 - 1045 Break

1045 - 1200 Report Division Command Statements
- Display-Control-Section
  - Display-Edit-Statistics
- Tolerance-Control-Section
  - Error-Rate-Check
  - Reject-File
- Report Examples

1200 - 115 pm Break

115 pm - 325 Hand out Edit Specifications as problem
Free work time

Monday January 14

930 am-1030 Discuss procedures for running problems on computer

1030-1045 Break

1045-1200 Component Programs of the CONCOR system

1200- 115 pm Break

Tuesday January 15

930 am - 325 pm Free work time

Wednesday January 16

930 am - 1200 Free work time

1200- 115 pm Break

115 pm-215 How to Install CONCOR on IBM 360370 OS

215- 230 Break

230-325 Free work time

Thursday January 17

930 am-325 Free work time

115 pm-215 Future CONCOR Design Considerations - for census processing - for survey processing

- manual correction system

215- 230 Break

230 - 245 Evaluation Guidelines

- Hand out evaluation forms

245 - 325 Free work time

Friday January 18

930 am-1030 Free work time

1030 - 1045 Break

1045 - 1200 Group Discussion on CONCOR
- submission of written evaluations
Distribution of IBM installation tapes
Presentation of Certificates to Participants

1200-115 pm Break

115-325 Free work time

APPENDIX D

PARTICIPANTS

CONCOR WORKSHOP January 7 - 18 1980 Marlow Heights MD

KENYA

James Midianga Government Computer Centre Central Bureau of Statistics PO Box 30266 Nairobi Kenya

PANAMA

Omar Rivera Rios Data Processing Specialist Contraloria General de la Republica de Panama Apartado Postal 5213 Panama 5 Panama

PHILIPPINES

Guida T Capellan OIC Computer Programming Division National Census and Statistics Office Solicarel Bldg 1 Magsaysay Blvd

Sta Mesa Manila Philippines

THAILAND

Angsumal Sunalai National Statistical Office Bangkok 1 Thailand

EGYPT

Farag Sedky Mourad Ghaleb CAPMAS (Central Agency for Public Mobilization and Statistics) NASR City Cairo Egypt

SAUDI ARABIA

Shargi R Al-Shargi National Computer Center PO Box 2534 Riyadh Saudi Arabia

Ahmed S Al-Ghamdi Dept Director of National Computer Center PO Box 2534 Riyadh Saudi Arabia

OTHER

Robert W ONeal USREPJECOR APO New York NY 09038

John N Adams Data Processing Advisor DACCA Department of State Washington DC 20520

OR c/o UNDP PO Box 224 Dacca Bangladesh

Joe Quasney US Census Bureau Washington DC 20233

Howard Brunsman 5715 N Ninth St Arlington VA 22205

Larry Shiller and Julio Ortuzar Delta Systems Consultants Inc 264 Alhambra Circle Coral Gables FL 33134

Nicholas Ourusoff Burpee Hill Road New London NH 03257 (United Nations)

Frederic J Grant IV Systems Development Georgia World Congress Institute 580 North Omni International Atlanta Georgia 30303

Rafael Samper Mary Jo Keenan Catherine B Gleason LACDR Room 2246 NS Washington DC 20523

John Marshall AIDSERDMPSE Rm 706C SA-12 Washington DC 20523

Susanne Bacon and Mary K Friday ISPCData Processing US Census Bureau Washington DC 20233

Leo Dougherty ISPCWorld Census Staff US Bureau of the Census Washington DC 20233

CONCOR WORKSHOP

January 7-18 1980 Washington DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM


CONCOR WORKSHOP January 16 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1 What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries

2 If you are familiar with previous versions of CONCOR how does this current version compare to them


3 How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data of which you are familiar

4 How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software


5 How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys

6 If your organization does statistical data processing by computer would you recommend to your superiors that this software be used to edit the data Do you feel assistance would be required in the installation of this software on your organizations computer and/or in training users on the packages applicability to their work

7 In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation

8 Other comments


APPENDIX F

NEWOLD COMMAND COMPARISONS


OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA


NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR

END-DIVISION signifies the termination of the CONCOR division organization and must always be present even though division identifiers have not been implemented

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

ALLOCATE (ALL) ALLOCATE (ALLOC) Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

ASSERT (AST) ASSERT (ASRT) The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and MESSAGE option. Additionally, by utilizing the DUMPR and DUMPO keywords, it provides the user with the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

(see RECODE) DRECODE (DRCD) Provides a method for recoding data items whose values start at zero and continue in ascending sequence. Replaces the previous RECODE command.

EXIT Provides the means to terminate execution (ESCAPE) of command statements within the UNTIL command.

END END-IF END-THEN END-ELSE A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

ERROR not implemented No longer supported in language

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

FILTER TYPE ( ) WITH not implemented No longer are option statements supported in language This command once specified essentially a GOTO depending on construction

(see XRECODE) GRECODE (GRCD) This new command provides a method for the group recoding of input values

IF IF The IF statement has properties found virtually in all programming languages ie provides a means of evaluating a condition and controlling the execution of alternative logical statements Similar to pseudocode a CONCOR programmer must utilize the structured command IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version

LET LET Virtually unchanged

LIST New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire as well as specific individual identifiers

OUTPUT New command which provides the means by which records will be written to the output-file

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

RANGE (RNG) RANGE (RNG) As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction permitting the specification of logical command paths depending upon the outcome of the range test.

RECODE (REC) name changed Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

STOP- keyword STOP- keyword Unchanged

UNTIL DO END-DO One of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

UPDATE (UPD) UPDATE (UPD) Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

WRITE (WRT) WRITE (WRT) The WRITE language statement while once utilized as the primary output command is now used to create an auxiliary or derivative file Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION (See WRITE-FILE of DICTIONARY-DIVISION)

XRECODE (XREC) name changed Old command which has become the GRECODE command in the December 1979 version
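The difference between the two recode commands described in this table can be illustrated outside of CONCOR. The following Python sketch assumes only what the comments above state: DRECODE recodes items whose values start at zero and ascend, while GRECODE recodes groups of input values. The function names, argument shapes and the use of None for unmatched values are hypothetical and are not CONCOR syntax or behavior.

```python
def drecode(value, table):
    """Direct recode: the input value, counting from zero, indexes the table."""
    if 0 <= value < len(table):
        return table[value]
    return None  # out-of-range input; actual CONCOR handling is not shown here


def grecode(value, groups):
    """Group recode: map any value in an inclusive range to one output code.

    groups -- list of (low, high, code) tuples, checked in order
    """
    for low, high, code in groups:
        if low <= value <= high:
            return code
    return None  # value falls in no group
```

A direct recode is a single table lookup, whereas a group recode must search its ranges; this is one reason the two are worth keeping as separate commands.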

DECEMBER 1978 VERSION 1

DECEMBER 1979 VERSION 2

Not implemented REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note division and section names treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
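The presorting requirement noted above follows from the way control breaks work: a summary is closed whenever the area key changes, so records for one area that are not contiguous produce several partial summaries. A minimal Python sketch of that behavior, using the standard itertools.groupby (which likewise groups only consecutive keys), is given here; the function name area_summaries is hypothetical.

```python
from itertools import groupby


def area_summaries(records, area_of):
    """Return (area, record_count) for each run of contiguous records."""
    return [(area, sum(1 for _ in group))
            for area, group in groupby(records, key=area_of)]
```

Applied to an unsorted file, the same area appears more than once in the result; applied to the sorted file, each area appears exactly once.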

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST


Appendix G

Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1 Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods if applicable

(2) Addition of an optional input andor output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.

(5) Increasing the number of area and questionnaire control fields

This implementation would require modification to the

dictionary as well as to the error records passed to the Report

Division Another problem would be the formatting of the extra

data values in the report headings

(6) Permit the mixing of different data item types in the CONTROL

commands This would allow the greatest flexibility in choosing

control fields In order to implement this enhancement the

dictionary and the generated COBOL program (EDITOR) would

require modification

(7) Give the user access to the values in the control fields during

the editing process Either the user would be prohibited from

changing these values (as is currently done with CONCOR

internal identifiers) to prevent the creation of new questionnaires or the summary information gathered on a

questionnaire basis would become meaningless Another

ramification of this enhancement is how to handle fields defined as alphanumeric as CONCOR internally works in binary

(8) Allow input data records with a similar format to be defined by

the same DEFINE-RECORD command statement This could be

accomplished by permitting multiple values in the RECORD-TYPE

clause of the DEFINE-RECORD command A possible problem here

is in the determination of which value is to be moved to the output

record when the record is to be written out using the OUTPUT

command

(9) Implement a COMMON-DATA concept to handle items common to all

record types This could be costly in terms of core storage

and additional conversion time if a storage area is allocated

for each identifier for each record type Only permitting one

common area for all records would require validation of the

data to ensure the values were exactly the same on each record

This would require the CONCOR program to perform many more

internal checks when reading the data file into the store area

(10) Permit the selective inclusion of data items found in the

DEFINE-RECORD command statement into the universe count used

for the determination of tolerance levels when using the COUNT-IMPUTES command statement

(11) Increase the number and types of data format referencing now

permitted External numeric data (type N) could be expanded to

handle 18 digit numbers Also signed numeric and packed

decimal data format could be added The signed numeric format

would allow both leading blanks and negative data values This

format is essential when dealing with data files generated by

FORTRAN programs

(12) Increase the number of comparison strings permitted for alphanumeric data items This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire The inclusion of such values are now done by many users in the form of comment statements This enhancement would also facilitate the use of the dictionary by tabulation software

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters The user would still be required to code two commas to receive the system default value for any parameter

(15) Provide a repetition factor in the initialization of arrays

(16) Print the data dictionary name in the titles of the dictionary documentation or provide a means by which the user may specify a title to appear in the listings
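The automatic range check proposed in point (13) can be illustrated outside of CONCOR. The following is a minimal Python sketch, not CONCOR code: the ItemDefinition class, the read_item function and the valid range of 1 to 5 are all hypothetical, and the status labels are merely borrowed from the identifier names listed in Appendix H.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ItemDefinition:
    """Hypothetical dictionary entry: a field plus an optional valid range."""
    name: str
    length: int                  # field length in bytes, e.g. 1 in N1-17
    start: int                   # 1-based starting column, e.g. 17 in N1-17
    low: Optional[int] = None    # lowest valid value (None means unchecked)
    high: Optional[int] = None   # highest valid value (None means unchecked)


def read_item(record: str, item: ItemDefinition) -> Tuple[Optional[int], str]:
    """Extract one numeric field and classify it before the user sees it."""
    begin = item.start - 1
    raw = record[begin:begin + item.length].strip()
    if raw == "":
        return None, "BLANK"
    if not (raw.isascii() and raw.isdigit()):
        return None, "NOT-NUMBER"
    value = int(raw)
    if item.low is not None and value < item.low:
        return value, "OUT-OF-RANGE"
    if item.high is not None and value > item.high:
        return value, "OUT-OF-RANGE"
    return value, "OK"
```

Because the range lives in the dictionary entry rather than in a comment statement, the same definition could in principle be read by tabulation software, which is the secondary benefit noted in point (13).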

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program by eliminating some of the system protection features, such as division by zero and out-of-bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is the question of what action is to be taken in the calculation of tolerance, in terms of this new record's inclusion in the universe of observations and number of changes.


(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implement a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command lies in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
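The lookup table approach suggested in point (4) can be sketched in general terms. The Python fragment below is only an illustration of the technique; it does not reproduce the code CONCOR generates, and the function names and the (old value, new value) pair layout are assumptions made for the example.

```python
def recode_linear(value, pairs, default=None):
    """Recode by scanning the (old, new) pairs in order: one comparison per pair."""
    for old, new in pairs:
        if old == value:
            return new
    return default


def build_lookup(pairs):
    """Build the table once; the first pair for a given old value wins,
    which matches the result of the linear scan above."""
    table = {}
    for old, new in pairs:
        table.setdefault(old, new)
    return table


def recode_lookup(value, table, default=None):
    """Recode with a single table access instead of a scan."""
    return table.get(value, default)
```

The table is built once and reused for every record, so the saving grows with the number of recode pairs and the number of records processed.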

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.


1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on IBM 360/370 series computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification of the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual

(2) The development of a CONCOR Users Guide


2. Survey or Other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of the sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of the questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has been neither finished nor tested.
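The weighting scheme mentioned in point (5) amounts to multiplying each observation by its sampling weight before totals or means are formed. The following Python sketch shows that arithmetic only; the function names are hypothetical, and how weights would be supplied to CONCOR is not addressed by this report.

```python
def weighted_total(values, weights):
    """Sum of value times sampling weight, one weight per observation."""
    if len(values) != len(weights):
        raise ValueError("each observation needs exactly one weight")
    return sum(v * w for v, w in zip(values, weights))


def weighted_mean(values, weights):
    """Weighted total divided by the sum of the weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return weighted_total(values, weights) / total_weight
```

For example, two sampled households of 2 and 4 persons with weights of 10 and 30 represent an estimated 140 persons in 40 households, or 3.5 persons per household.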


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES


Old version

1 EOF-FLAG
2 ERROR-IN-QUESTIONNAIRE
3 CURRENT-POINTER-VALUE
4 TYPE-COUNT ( )
5 RECORD-COUNT
6 QUESTIONNAIRE-COUNT
7 RECORDS-IN-STORE
8 INVALID-RECORD-TYPE-COUNT
9 INCOMPLETE-FLAG
10 CONTINUATION-FLAG
11 INVALID-RECORD-FLAG

New version

EOF-FLAG
ERRORS-IN-QUESTIONNAIRE
TYPE-COUNT ( )
IN-RECORD-COUNT
IN-QUESTIONNAIRE-COUNT
RECORDS-IN-STORE
INVALID-RECORD-TYPE-COUNT
INCOMPLETE-FLAG
CONTINUATION-FLAG
OUT-OF-RANGE
NOT-NUMBER
BLANK
(new internal counters) OUT-RECORD-COUNT, OUTPUT-QUEST-COUNT, WRITE-RECORD-COUNT

Remarks

Change in spelling of variable

Not mentioned--possible omission from manual

Not implemented

Note: Current documentation equivocates as to when some variables are reset--most are initialized with each control area break

OUT-OF-RANGE, NOT-NUMBER, BLANK: Though these were implemented as reserved identifiers, due to poor documentation of (old) CONCOR their exact values and utility were unknown

OUT-RECORD-COUNT, OUTPUT-QUEST-COUNT, WRITE-RECORD-COUNT: These apparently new counters permit access to valuable totals within control areas. Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention, e.g. IN, OUT

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Printout: CONCOR-EDITOR EXECUTION STATISTICS page. The run produced repeated DIVISION BY ZERO messages, each citing LINE NUMBER 118, followed by RUN ABORTED DUE TO DIVISION BY ZERO ERRORS. Superimposed program text, lines 112-125: registers R001 and R003 are set to zero with LET, and the LET statement on line 118 then computes R003 from R002 and R001.]

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason--ABEND without generating any messages. The most common types of cessation are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.

[Printout: CONCOR-EDITOR EXECUTION STATISTICS page reporting errors for references to nonexistent array elements. Superimposed program text, lines 173-187: two nested DO loops vary R001 and R002 from 1 by 1 until each exceeds 10; inside the loops an ALLOCATE statement reads TA1 (R001 R002), an UPDATE statement writes TA2 (R001 R002), and LIST statements print the values.]

Comments: In this example TA1, a 2-dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 x 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.

[Printout: CONCOR-EDITOR EXECUTION STATISTICS page ending with the messages JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215 and EDITOR -- NORMAL END OF JOB. Superimposed program text, lines 214-218: an IF test on IN-RECORD-COUNT whose THEN clause on line 215 issues STOP-RUN, followed by a second IF test on IN-RECORD-COUNT whose THEN clause issues a LIST message.]

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE


WARNING(DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION A blank command statement line was found in the user Dictionary Division command statement file

ACTION TAKEN Parsing continued with the next command statement line in the Dictionary Division command file

USER RESPONSE Put a comment indicator () in column 1 of the command statement line if spacing is desired if not remove the command statement line

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION A character not in the CONCOR legal character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()) was found as the first character in the string

ACTION TAKEN Parsing began again with the next string

USER RESPONSE Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION A string starting with a character not valid in the CONCOR legal character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached

ACTION TAKEN Parsing began again with the next user command statement line in the Dictionary Division command file

USER RESPONSE Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ())

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION The comment indicator character () was found embedded in a string


III PROPOSED CHANGES TO THE LANGUAGE STRUCTURE

Based upon the assumption that it is the intent of sponsoring agencies to optimize the COBOL CONCOR package -- a goal which is believed currently obtainable -- an understanding of the nature of these changes and how they would impact users is essential. Appendix F sets forth in a comparative manner the differences between the old (December 1978) and the new (December 1979) editions of CONCOR. Studying this appendix makes obvious the fact that, while the new version of the language is clearly superior to the old in nearly every aspect, the basic and overall structure of the language is essentially unchanged. Compartmentalization of aspects of the language into divisions represents a significant ideological enhancement to the language. Indeed, development of programs by divisions proved to be an extremely useful way of understanding the nature of the editing work to be performed. However, note that while the END-DIVISION command is essential to the language, the division headings DICTIONARY-DIVISION, EXECUTION-DIVISION and REPORT-DIVISION were not implemented and are therefore preceded by a period so as to be treated as comment lines in the program listing. It is inconsistent to implement END-DIVISION commands while not implementing the division headings. It is believed that this division structuring is important enough to the overall organizational structure of a CONCOR source language program that it should be implemented prior to general distribution. The section headers shown on the figures in Appendix F, however, are another matter. They are cumbersome and were generally not coded by workshop participants, and they could be deleted from this version of the language altogether with little loss in organizational understanding. The CONCOR language is sufficiently powerful to stand on its own as a distinct product and is not meant to be a COBOL imitation; its present degree of development and specialization do not warrant the structural drag of additional section identifiers. The probable intent of the original CONCOR project was to develop a package which was uncomplicated and not unwieldy to use. The question of division and section name implementation, while seemingly cosmetic, can have a real impact on the perceived ease of learning and using the language.

Figure 1 on the following page illustrates common mistakes programmers make when coding numeric and alphanumeric variables in the DATA-DIVISION. These mistakes are the result of inconsistent variable formats. For instance, in the numeric data definition statement it is permissible to specify N9-23, where N signifies numeric, 9 signifies the length of the item, and 23 specifies the starting position in the record. In NEW-DATA, however, it is possible to code an item with a maximum length of 18. While on the surface this inconsistency would seem harmless, user variables defined in NEW-DATA as N18 could be moved inadvertently to output record fields defined by a data definition statement such as N9-23; such an action would result in a data error. Under certain circumstances it would be highly desirable to output these larger values. A similar circumstance exists between the numeric and alphanumeric data coding conventions. While the maximum length of a numeric item is permitted to be 9 in the data definition statement (18 in NEW-DATA), the maximum alphanumeric variable is permitted to be only 4 characters in length. In the current systems manual it is recommended that

FIGURE 1

DICTIONARY-DIVISION

DICTIONARY-NAME DATA-CODING-EXAMPLE

INPUT-FILE

OUTPUT-FILE

AREA-CONTROL N2-2 N2-4 N3-6 N2-9 N9-23 QUESTIONNAIRE-CONTROL A4-2 A3-6 A2-9 A3-11 A3-14

RECORD-CONTROL A1-1

DEFINE-RECORD

H01-TYPE-OF-HOUSING-UNIT N1-17

H02-MATERIAL-OF-ROOF N1-19 10 9

H03-TOTAL-PERSONS-IN-UNIT N8-40 NOT-NUMERIC BLANK

H04-STATE-OF-UNIT-CODE A4-50 0 U 1 D

DEFINE-RECORD

P01-SEX 1-13 W F

NEW-DATA

N01-SAVE-TYPE-OF-HOUSING-UNIT

N02-SAVE-TYPE-OF-ROOF 1

N03-COUNT-TOTAL-IN-UNITS 10 0

N04-AGGREGATE-INCOME 18 0

END-DIVISION

Explanations

N2-4: This is an example of an external numeric input data item (N) with a length of 2 bytes starting in column 4 of the input record. The maximum length of this type of variable outside of NEW-DATA is 9. When coded in NEW-DATA, 18 is permitted.

A4-2: This is an example of an external alphanumeric input data item (A) with a length of 4 bytes starting in column 2 of the input record. This construction for alphanumeric variables is valid only in the control statements. Additionally, it can never be over 4 bytes in length. When alphanumeric data fields are defined within record types, the EDITOR program requires that the comparison strings always be specified. A maximum of 3 is permitted. The purpose of these strings is to force a recode of the data to a numeric value. If no match is found, EDITOR automatically assigns a unique negative value to the field.


alphanumeric coding be utilized in the QUESTIONNAIRE-CONTROL and RECORD-CONTROL statements, where each input data item must be of the same data type as shown in the example. When alphanumeric data variables are used in these control statements their construction is identical to that of numeric items. However, when used elsewhere in the DATA-DIVISION, alphanumeric variables are required to specify one of three possible comparison values as shown. There are a number of production instances when it never would be necessary or even desirable to recode alphanumeric data. However, as CONCOR attempts to force data into a totally numeric format upon output, there is currently no way to preserve these values if desired.

An unwieldy alternative to this situation, which may be acceptable under some circumstances, would be the expansion of the number of comparison strings from three to a more realistic number. The limitation of this compromise is that a full twenty-six comparison identifiers would be required in order to accommodate data which utilized the entire alphabet. A better solution, however, would be to make the general format of the alphanumeric variables identical to that of numeric identifiers, i.e. A9-23, and to permit alphanumeric values so defined to pass unaltered through the CONCOR system.
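The length inconsistency discussed above can be made concrete with a short Python sketch. It is an illustration only and not part of CONCOR: parse_spec and fits are hypothetical helpers, and the limits of 9, 4 and 18 are simply those quoted in this report.

```python
import re

# Maximum lengths quoted in the text for data definition statements.
MAX_LENGTH = {"N": 9, "A": 4}
# Maximum length quoted for numeric items declared in NEW-DATA.
NEW_DATA_MAX_LENGTH = 18

_SPEC = re.compile(r"^([NA])(\d+)-(\d+)$")


def parse_spec(spec):
    """Split a specification such as N9-23 into (type, length, start column)."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError(f"not a data definition: {spec!r}")
    kind = match.group(1)
    length = int(match.group(2))
    start = int(match.group(3))
    if not 1 <= length <= MAX_LENGTH[kind]:
        raise ValueError(f"{spec!r}: length must be 1 to {MAX_LENGTH[kind]}")
    if start < 1:
        raise ValueError(f"{spec!r}: starting column must be 1 or more")
    return kind, length, start


def fits(value, length):
    """True if a non-negative integer can be written in `length` digits."""
    return 0 <= value < 10 ** length
```

A value that is legal in an 18-digit NEW-DATA item, such as 1,000,000,000, fails fits(value, 9), which is the data error described above when such a value is moved to an N9-23 output field.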

Another data-naming convention which caused several errors, and which could be corrected, concerns the array data definitional statements. While arrays of two and more dimensions are handled in a superior manner by the CONCOR program, single-dimension arrays pose a problem in coding, as shown in Figure 2. It is suggested that the command imperatives be changed to permit the coding of both rows and columns in single-dimension arrays, i.e. allow a single row vector as well as a single column vector, to maintain the consistency of the array data definitional statements.

A major requirement of COBOL CONCOR file processing concerns the fact that all related data records must be physically contiguous on the input file. The implication of this requirement is that files may require preprocessing prior to actual data editing. (This preprocessing is usually a sort routine upon a selected CONTROL-AREA key.) While this type of processing merely introduces a new step in file processing, a major limitation becomes apparent when a large number of DISCRETE DATA files of the same census or survey questionnaire are to be processed. This limitation is the introduction of manual steps to save the most recent imputed values, i.e. preventing the program from starting with cold values on each batch run. If a command such as LOAD/UNLOAD ARRAYS were incorporated into the language (an enhancement not believed to be difficult to implement), manual processing between batches would be reduced to a minimum and the maximum benefits of the hot deck methodology would be realized. It is envisioned that such a command would automatically ensure the transfer of the appropriately designated hot values. Automatic processing of this nature, if done correctly, can greatly reduce the time required to clean multivolume files, for once CONCOR language statements have been compiled, linked

While it is possible at this time to save the arrays that are used in the imputation processes on a separate write-file, it is not now possible to automatically load those values back into an object program and to immediately resume processing on another volume. It is believed that such an automatic feature of the language would cut down the manual processing time significantly enough that it warrants inclusion in the package prior to its general distribution.
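The intent of a LOAD/UNLOAD ARRAYS facility can be sketched as follows. This Python fragment is a minimal illustration under stated assumptions: the function names, the JSON file format and the dictionary-of-arrays layout are all hypothetical and say nothing about how CONCOR itself would store its arrays.

```python
import json


def unload_arrays(arrays, path):
    """Save the current (hot) imputation arrays at the end of a batch run."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(arrays, handle)


def load_arrays(path, cold_values):
    """Resume from saved hot values; fall back to a copy of the cold deck
    when no earlier batch has been run."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {name: [row[:] for row in rows] for name, rows in cold_values.items()}
```

The first volume starts from the cold deck because no saved file exists; each later volume starts from whatever values the preceding volume left behind, with no manual step in between.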

FIGURE 2

A05-DIFF-BETWEEN-AGE-OF-FEMALE-BY-RELATION 2 4 4

[Array listing: a 4 x 4 table of cold deck values, with columns AGE OF HUSBAND 12-17, 18-24, 25-35, 36+ and rows RELATION HEAD, CHILD, OTHER, NONRELATIVE]

Comments: The ARRAY-DATA command statement provides the means to declare array identifiers with up to five dimensions. Current documentation is not as explicit about the rules of this command as is desirable. The parameters of the command should function as follows:

user-identifier, number of dimensions (D), number of rows (R), number of columns (C), magnitude of element (M), initial start up values

In the example, A05 is a two-dimensional array with 4 rows, 4 columns, a default magnitude of 9, and cold deck values as labeled.

A06-DIFF-BETWEEN-AGE-OF-PERSON-AND-MOTHER 1 1 4
16 18 21 23

(This coding generates the error messages below)

WARNING (DD-207) COMMAND TERMINATOR () NOT FOUND () ASSUMED PRESENT
ERROR (DD-911) DIMENSION OF USER-SPECIFIED ARRAY IS LESS THAN THE MINIMUM VALUE PERMITTED (2)
PREVIOUS DIAGNOSTIC AT LINE 563

As shown by the array variable A06, CONCOR's treatment of vectors is not consistent with the above multidimensional array scheme; i.e., A06 must be coded as follows (example of how vectors must currently be coded to be correct):

A06-DIFF-BETWEEN-AGE-OF-PERSON-AND-MOTHER 1 4 2
16 18 21 23

user-identifier, 1 dimension, number of elements in vector, magnitude of element, initial start up values

A simple modification to this command would permit the coding of both row and column vectors and make this command less error prone.
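The uniform treatment of vectors recommended in Figure 2 can be sketched in Python. This is a hypothetical declaration routine and not the ARRAY-DATA command itself: the function name and the returned layout are inventions for the example, and the reading of magnitude as a maximum number of digits is an assumption.

```python
def declare_array(name, rows, cols, magnitude=9, values=()):
    """Declare a rows x cols array; a vector is simply rows == 1 or cols == 1.

    `magnitude` is assumed here to mean the maximum number of digits per
    element. Missing initial values default to zero.
    """
    if rows < 1 or cols < 1:
        raise ValueError("rows and columns must each be at least 1")
    values = list(values) or [0] * (rows * cols)
    if len(values) != rows * cols:
        raise ValueError(f"{name}: expected {rows * cols} initial values")
    limit = 10 ** magnitude
    for value in values:
        if abs(value) >= limit:
            raise ValueError(f"{name}: {value} exceeds magnitude {magnitude}")
    grid = [values[r * cols:(r + 1) * cols] for r in range(rows)]
    return {"name": name, "shape": (rows, cols), "magnitude": magnitude, "values": grid}
```

Under such a scheme the A06 vector could be declared either as 1 row by 4 columns or as 4 rows by 1 column, using the same parameter order as the two-dimensional A05.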


and stored as an object module on the system, no other compilations should be required for processing questionnaire files of the same type. Theoretically, a single well-written CONCOR program is all that would be required to process an entire census run.

Appendix H contrasts the internal identifiers of the old and new language versions. Without such identifiers a user would have little information about the status of input as it is processed by EDITOR. As noted in the appendix, most internal pointers are reset upon each break in the CONTROL-AREA, provided a CONTROL-AREA has been defined. The limitation here is that there are obvious instances when termination of processing based on run counts would be advantageous even though a CONTROL-AREA has been specified, e.g. debugging CONCOR programs or comparing input files. Therefore another set of pointers should be implemented for this purpose and made available for programmer reference.
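The distinction between counters that reset at each control area break and counters that accumulate over the whole run can be sketched as follows. This Python class is an illustration only; the class and attribute names are hypothetical and do not correspond to existing CONCOR internal identifiers.

```python
class RecordCounters:
    """Keep one count per control area and one cumulative count for the run."""

    def __init__(self):
        self.area_count = 0        # reset at every control area break
        self.run_count = 0         # never reset during the run
        self._current_area = None

    def read(self, area_key):
        """Register one input record belonging to the given control area."""
        if area_key != self._current_area:
            self.area_count = 0
            self._current_area = area_key
        self.area_count += 1
        self.run_count += 1
```

With only the first kind of counter, a test such as "stop after 500 records" can never be satisfied if no single control area holds that many records; the cumulative counter makes such a test possible regardless of how the file is divided.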

One clearly disturbing development which needs to be pursued during in-depth testing of the system concerns the MAX-STORAGE parameter of the DEFINE-RECORD statement. As shown in the figure on the following page, when MAX-STORAGE was set equal to the maximum value, a COBOL program was generated which required 1000K of core to run. The MAX-STORAGE value of 999 is clearly not realistic under most processing circumstances. This example drives home several important points about CONCOR. The core requirements of CONCOR-generated programs can be influenced significantly by the amount or nature of programmer-specified I/O operations. In fact, it is possible to generate a program of a size most foreign country machines could not process. It is recommended that tests determine a realistic maximum value restriction for implementation to prevent problems in this area.

The final area of recommended modification concerns the newly implemented REPORT-DIVISION. The purpose of the REPORT-DIVISION is to enable a user to describe or specify certain CONCOR language statements which will generate statistical reports. These reports contain statistics generated by EDITOR as specified by the GENERATE-EDIT-STATISTICS command of the EXECUTION-DIVISION. All of the reports produced are organized according to the data fields defined by the AREA-CONTROL command of the DATA-DICTIONARY. If the AREA-CONTROL command is not defined in the DATA-DICTIONARY, then all the statistics are summarized at the total run level. If a control area field is defined, then all statistics will be summarized for each unique CONTROL AREA as encountered by the EDITOR program on the input file; statistics at the total run level will not be available. This in part relates back to previous discussions citing the need for new internal identifiers. Report listings may contain the values of entire records or entire questionnaires, depending upon the keyword used in the report generation commands. The problem centers upon the homogeneity of CONCOR printouts during a production run.

It is virtually impossible to distinguish reports on the basis of the volumes they were run against. Some means should be provided to allow users to uniquely and purposefully label the reports generated in this division. Indeed, the very name REPORT-DIVISION suggests that such a command is implicit and appropriate. Such a LABEL-REPORT or REPORT-FILE command, along with file information from the system, should not be difficult to implement.

FIGURE 3

CONCOR (CONSISTENCY AND CORRECTION) EDIT AND IMPUTATION SYSTEM

USER DICTIONARY DIVISION-SOURCE LISTING

LINE NUMBER

72 DEFINE-RECORD RECORD-NAME= HOUSING-RECORD

73 MAX-STORAGE= 999

268 DEFINE-RECORD RECORD-NAME= POPULATION-RECORD

269 MAX-STORAGE= 999

[Listing lines 74 and 270 carry the RECORD-TYPE entries for the two records.]

[Job step statistics for STEP LOAD: STEP START TIME 10:01:55, STEP END TIME 10:02:57, STEP CPU TIME 1.65 SECONDS, VIRT 1000K, REGION (REQUESTED - 1000K, USED - 1000K), I/O COUNT 211, CARDS READ 0, CARDS PUNCHED 0, LINES PRINTED 22]


Concluding Remarks of System Modifications

Version 1 of the COBOL CONCOR language was notorious for abending without the generation of diagnostic messages. Appendix I sets forth a test of some of these historical problems. In each instance the system provided messages and source program line references which enabled immediate identification of the processing difficulties. In addition, tests of the various commands revealed the general high performance character of the language in its newest version. Under these circumstances the recommended modifications should be considered as ones which will uniformly allow CONCOR to realize the potential of its design, enhance its utility and, moreover, endure as a software product.


IV THE ADEQUACY OF COBOL CONCOR DOCUMENTATION

The current documentation of CONCOR consists of a newly developed, relatively voluminous systems manual, a Diagnostic Message Guide, and some loose-leaf instructions concerning the installation of the system on IBM computers. Of these three documents, the Diagnostic Message Guide is clearly complete and adequate and needs no changes except the provision of message number tabs to enable a programmer to locate the text of a message number more quickly. An example of the superior format of this guide appears as Appendix J.

Conspicuously absent from the list of systems documents is a Users Guide as originally specified in the ISPC enhancement proposal

A thorough assessment of the Systems Manual has been undertaken and indicates that, with some reorganization and editing, it could be made a more useful and usable document, as it is well constituted informationally. In its current form the Systems Manual can best be described as a working document, a document which was not intended to teach the CONCOR language but centered upon an audience already familiar with previous CONCOR releases. Further, it is not a systems manual per se, as some of its components and appendixes could more appropriately compose a users guide. For example, excerpts from the POPSTAN publications which review the basic editing concepts and contain the POPSTAN example problem should be deleted from the systems manual and would be more contextually effective in the Users Guide. The EXECUTION-DIVISION chapter of the Systems Manual would especially benefit from general editing and reorganization. (Specifically, the contents of pages 90-92 should be moved to page 63; these pages explain the placement and purpose of the three permissible routines of the CONCOR language. Had the author been granted more time, he could have outlined additional specific modifications to bring existing documentation to a level of adequacy.) However, some guidelines stand out:

1. Installation instructions and considerations should compose a chapter of the Systems Manual rather than a separate document. (Hand-out materials covering installation were not adequate.) Further, a documentation scheme setting forth how CONCOR was installed on each computer, and any modifications required during installation, could be serially added to and become a permanent part of the installation's systems manual. If CONCOR is going to be used, and if agencies are going to actively support it as a package, installation limitations must be known and well documented. The prescribed format for this documentation should be set forth, and any modifications to CONCOR should be continuously updated on these sheets so that a history of changes to the language will be readily accessible to supporting agencies. If CONCOR is to survive as a standardized system, editing results must be capable of being replicated among installations. Such a form could contain other information as well, concerning the type of computer, amount of core, and nature of utilities supporting users. Upon installation, a copy of this form could be sent to the US agency which will ultimately be responsible for supporting the CONCOR package.

an appendix2 A complete COBOL CONCOR program should appear in

for reference

3. The development of the Users Guide should include an intensive review of the editing concepts involved in processing census data files, beyond the POPSTAN materials.

4. An explanation of the CONCOR benchmark program should appear in the Users Guide and the Systems Manual. The running of a supplied benchmark program should be a standard installation protocol used to test all operational aspects of a new installation.

5. The development of a CONCOR pocket card. This tool, which has been proven useful in teaching and assisting programmers in utilizing a programming language, lays out all commands and options on a single small card. An example of such a pocket card is the IBM Assembler Card. Such a small document is useful in clarifying the language and setting forth structural rules without continual reference to full-size manuals.


V CONCLUSION

In a situation of extreme need CONCOR could probably be utilized, as it seems capable of performing its editing functions. Its usefulness as a data cleaning tool is generally undisputed. The importance of having CONCOR exhaustively tested by an independent agency cannot be over-emphasized in light of prior failures of the system to perform adequately. Additionally, core requirements and processing speed should be precisely determined. One must be encouraged by the amount of progress which has been made in a relatively short amount of time towards the completion of the CONCOR package. While it is clearly beyond the scope of this report to estimate the cost and the amount of time required to implement the highly desirable changes to the CONCOR language and documentation outlined herein, and then exhaustively test the system, it is believed that this process could be completed within a single quarter. (The changes are of such a nature that no structural redefinitions of a systematic nature would be required.) These enhancements would make the COBOL CONCOR package more complete, consistent, and reliable in the field. ISPC has competently moved the system to the point that its completion is within reach.

Given the probable revision of the current Systems Reference Manual and the development of a suitable Users Guide document, it is likely that this version of CONCOR could be taught to experienced programmers (unfamiliar with previous versions of the language) in as little as four weeks. And while one is impressed by the potential power and overall simplicity of the COBOL CONCOR language, it is difficult to envision individuals with no real computer programming experience utilizing its full capabilities. Hence the importance of including a review of basic editing concepts in the documentation is underlined.

Further, as documentation is often critical to the perceived ease of use of computer-based systems, it will be essential to have these various manuals examined independently prior to their general use.

As previously mentioned, Appendix G presents features which the ISPC feels could have a positive impact on CONCOR. Depending upon the outcome of the testing recommended in previous chapters and the availability of funding, supporting agencies may desire to look again at an assembler (ALC) version of the language.

Overall, COBOL CONCOR remains an intermediate product yet to realize its potential, which is thought considerable. Software development of this nature is characteristically of a long-term, evolutionary design. In light of this reality, agencies should be prepared to evaluate such projects' success or failure in terms other than that of immediate utility. It is the philosophy underlying the system, not the system itself, which will ultimately be exported.

APPENDIX A

BuCen Enhancement Proposal


BUCEN IN-HOUSE REVIEW ENHANCEMENT PROPOSAL

1. Easy-to-use interrecord referencing

2. Improved output file capabilities
   A. Provide overflow protection on WRITE command
   B. Provide OUTPUT command to create corrected questionnaire file that matches IP file data dictionary

3. Improved edit statistics reported (LISTERR)
   A. Provide automatic (user-specified) area break
   B. Provide options for compilation and displaying edit statistics at various levels
   C. Provide automatic (user-specified) tolerance checking of error rates by area
   D. Automatically capture IDs of areas failing tolerance check

4. Clean up known bugs in code

5. Comprehensive testing

6. Clean up and enhance documentation
   A. Reference manual: more examples, error message guide
   B. Installation guide
   C. Systems manual
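Items 3C and 3D of the proposal (tolerance checking of error rates by area and capture of the IDs of failing areas) can be illustrated with a short sketch. It is written in Python rather than CONCOR, and the function name, the record layout and the 30 percent tolerance are assumptions made only for illustration; the proposal does not state how CONCOR computes or stores these figures.

```python
from collections import defaultdict


def areas_failing_tolerance(records, tolerance):
    """Return {area_id: error_rate} for areas whose error rate exceeds tolerance.

    records   -- iterable of (area_id, had_error) pairs, one per edited record
                 (hypothetical layout, not CONCOR's error-file format)
    tolerance -- highest acceptable share of records in error, e.g. 0.05
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for area_id, had_error in records:
        totals[area_id] += 1
        if had_error:
            errors[area_id] += 1

    failing = {}
    for area_id, count in totals.items():
        rate = errors[area_id] / count
        if rate > tolerance:
            failing[area_id] = rate  # capture the ID of the failing area
    return failing


# Area 001 has 1 error in 4 records, area 002 has 2 errors in 4 records.
sample = [("001", False), ("001", True), ("001", False), ("001", False),
          ("002", True), ("002", True), ("002", False), ("002", False)]
failed = areas_failing_tolerance(sample, tolerance=0.30)
print(failed)
```

With the assumed tolerance only area 002 is reported, so only that area would need to be re-examined or rejected.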

APPENDIX B

EVALUATIVE CRITERIA


UNITED STATES GOVERNMENT

Memorandum

TO DSPOPFPSD Robert H Haladay

DATE December 3 1979

DSPOPDEIO Liliane Floge

SUBJECT Criteria for Evaluation of BuCen COBOL CONCOR September 30th Version as presented at January 1980 Workshop

The following points should be considered when evaluating CONCOR, the COBOL editing system, during participation at the January 1980 workshop:

1. Are the Installation Manual, the System Manual, and the Users Manual ready for use? Are these manuals adequate for their purposes? Are the manuals clear and consistent? Can they be understood by subject-matter personnel as well as programmers?

2. Is the editing system capable of implementing all the features as described in the documentation?

3. Does the system appear to be [...]ely debugged and tested and ready for release to developing countries?

4. Does the system appear to be fast enough to edit a census in a reasonable amount of time?

5. What size core does the system require?

6. Comment in general on the completeness, usefulness, and efficiency of the system and on its ease of use by developing-country programmers and other census personnel.

cc Su-a-- e Olds APiA SDi OrWJulio Ortuizar luotolaCelta Systems

APPENDIX C

WORKSHOP ITINERARY


CONCOR Workshop Schedule January 7-18 1980

U S Bureau of the Census International Statistical Programs Center

Scuderi Building Room 209 4235 28th Avenue Marlow Heights Maryland

Monday January 7

9:30 am - 10:00    Welcoming Remarks
                   Overview of Workshop

10:00 - 10:30      Introduction to CONCOR
                   - Purpose and function
                   - History of development
                   - General computer requirements

10:30 - 10:45      Break

10:45 - 12:00      Editing Concepts
                   - Ways to interrogate data
                   - Ways to correct data
                   - Editing housing and population data - POPSTAN
                   - Advantages of CONCOR

12:00 - 1:15 pm    Break

1:15 - 2:00        System Description
                   - Constraints in design of CONCOR
                   - Basic subsystems of CONCOR
                   - User interactions with system
                   - Examples of outputs produced

2:00 - 2:30        User Program Organization
                   - Divisions
                   - Sections
                   - Routines
                   - Commands

2:30 - 2:45        Break

2:45 - 3:25        Command Language Description
                   - Types of statements
                   - Format
                   - Syntax

Tuesday January 8

9:30 am - 10:30    Dictionary Division Command Statements
                   - Punctuation
                   - Input data referencing
                   - Working data declarations
                   - Internal data representation and storage

10:30 - 10:45      Break

10:45 - 12:00      Dictionary-Attributes-Section
                   - Dictionary-Name
                   File-Section
                   - Input-File
                   - Output-File
                   - Write-File
                   - Error-File

12:00 - 1:15 pm    Break

1:15 pm - 2:15     Input-Record-Section
                   - Define-Record
                   Working-Data-Section
                   - New-Data
                   - Array-Data

2:15 - 2:30        Break

2:30 - 3:25        Dictionary Examples
                   - Minimum dictionary structure
                   - Maximum dictionary structure
                   - Hand out dictionary problem

Wednesday January 9

9:30 am - 10:30    Discussion of Participants Dictionary problems
                   Free work time

10:30 - 10:45      Break

10:45 - 12:00      Execution Division Command Statements
                   - Punctuation
                   - Subscripting
                   - Internal Identifiers
                   - Report-Control-Section
                     - Generate-Edit-Statistics
                     - Count-Imputes
                     - Examples

12:00 - 1:15 pm    Break

1:15 pm - 2:15     Execution Division Command Statements (continued)
                   - Routines of Edit-Specifications-Section
                     - Prolog-Routine
                     - Filter-Routine
                     - Epilog-Routine
                   - Types and functions of edit specification commands
                   - Range
                   - Assert

2:15 - 2:30        Break

2:30 - 3:25        - Pass/Fail clauses
                   - List

Thursday January 10

9:30 am - 10:30    Discussion of Problems
                   Free work time

10:30 - 10:45      Break

10:45 - 12:00      Edit Specification Command Statements (continued)
                   - Allocate
                   - Update
                   - Let

12:00 - 1:15 pm    Break

1:15 pm - 2:15     - If
                   - Until/Exit
                   - Stop

2:15 - 2:30        Break

2:30 - 3:25        - Drecode
                   - Grecode

Friday January 11

9:30 am - 10:30    Edit Specification Command Statements (continued)
                   - Output
                   - Write

10:30 - 10:45      Break

10:45 - 12:00      Report Division Command Statements
                   - Display-Control-Section
                     - Display-Edit-Statistics
                   - Tolerance-Control-Section
                     - Error-Rate-Check
                     - Reject-File
                   - Report Examples

12:00 - 1:15 pm    Break

1:15 pm - 3:25     Hand out Edit Specifications as problem
                   Free work time

Monday January 14

9:30 am - 10:30    Discuss procedures for running problems on computer

10:30 - 10:45      Break

10:45 - 12:00      Component Programs of the CONCOR system

12:00 - 1:15 pm    Break

Tuesday January 15

9:30 am - 3:25 pm  Free work time

Wednesday January 16

9:30 am - 12:00    Free work time

12:00 - 1:15 pm    Break

1:15 pm - 2:15     How to Install CONCOR on IBM 360/370 OS

2:15 - 2:30        Break

2:30 - 3:25        Free work time

Thursday January 17

9:30 am - 3:25     Free work time

1:15 pm - 2:15     Future CONCOR Design Considerations
                   - for census processing
                   - for survey processing
                   - manual correction system

2:15 - 2:30        Break

2:30 - 2:45        Evaluation Guidelines
                   - Hand out evaluation forms

2:45 - 3:25        Free work time

Friday January 18

9:30 am - 10:30    Free work time

10:30 - 10:45      Break

10:45 - 12:00      Group Discussion on CONCOR
                   - Submission of written evaluations
                   Distribution of IBM installation tapes
                   Presentation of Certificates to Participants

12:00 - 1:15 pm    Break

1:15 - 3:25        Free work time

APPENDIX D

PARTICIPANTS


CONCOR WORKSHOP January 7 - 18 1980 Marlow Heights MD

PARTICIPANTS

KENYA

James Midianga, Government Computer Centre, Central Bureau of Statistics, PO Box 30266, Nairobi, Kenya

PANAMA

Omar Rivera Rios, Data Processing Specialist, Contraloria General de la Republica de Panama, Apartado Postal 5213, Panama 5, Panama

PHILIPPINES

Guida T Capellan, OIC, Computer Programming Division, National Census and Statistics Office, Solicarel Bldg 1 Magsaysay Blvd, Sta Mesa, Manila, Philippines

THAILAND

Angsumal Sunalai, National Statistical Office, Bangkok 1, Thailand

EGYPT

Farag Sedky Mourad Ghaleb, CAPMAS (Central Agency for Public Mobilization and Statistics), Nasr City, Cairo, Egypt

SAUDI ARABIA

Shargi R Al-Shargi, National Computer Center, PO Box 2534, Riyadh, Saudi Arabia

Ahmed S Al-Ghamdi, Dept Director of National Computer Center, PO Box 2534, Riyadh, Saudi Arabia

OTHER

Robert W ONeal, USREPJECOR, APO New York NY 09038

John N Adams, Data Processing Advisor, DACCA, Department of State, Washington DC 20520, OR c/o UNDP, PO Box 224, Dacca, Bangladesh

Joe Quasney, US Census Bureau, Washington DC 20233

Howard Brunsman, 5715 N Ninth St, Arlington VA 22205

Larry Shiller and Julio Ortuzar, Delta Systems Consultants Inc, 264 Alhambra Circle, Coral Gables FL 33134

Nicholas Ourusoff, Burpee Hill Road, New London NH 03257 (United Nations)

Frederic J Grant IV, Systems Development, Georgia World Congress Institute, 580 North Omni International, Atlanta, Georgia 30303

Rafael Samper, Mary Jo Keenan, Catherine B Gleason, LACDR, Room 2246 NS, Washington DC 20523

John Marshall, AIDSERDMPSE, Rm 706C SA-12, Washington DC 20523

Susanne Bacon and Mary K Friday, ISPC Data Processing, US Census Bureau, Washington DC 20233

Leo Dougherty, ISPC World Census Staff, US Bureau of the Census, Washington DC 20233

CONCOR WORKSHOP

January 7-18 1980 Washington DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM


CONCOR WORKSHOP January 16 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1. What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries?

2. If you are familiar with previous versions of CONCOR, how does this current version compare to them?

3. How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data with which you are familiar?

4. How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software?

5. How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys?

6. If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?

7. In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8. Other comments

APPENDIX F

NEW/OLD COMMAND COMPARISONS

OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA


NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR

END-DIVISION signifies the termination of the CONCOR division organization and must always be present even though division identifiers have not been implemented

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

DECEMBER 1978 VERSION 1    DECEMBER 1979 VERSION 2    COMMENTS

ALLOCATE (ALL)    ALLOCATE (ALLOC)    Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

ASSERT (AST)    ASSERT (ASRT)    The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and MESSAGE option. Additionally, utilizing the DUMPR and DUMPO keywords provides the user with the option of using the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

(see RECODE)    DRECODE (DRCD)    Provides a method for recoding data items which have a starting value of zero and continue in ascending sequence. Replaces the previous RECODE command.

EXIT    Provides the means to terminate execution (ESCAPE) of command statements within the UNTIL command.

END    END-IF END-THEN END-ELSE    A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

ERROR    not implemented    No longer supported in language.

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 VERSION 1    DECEMBER 1979 VERSION 2    COMMENTS

FILTER TYPE ( ) WITH not implemented No longer are option statements supported in language This command once specified essentially a GOTO depending on construction

(see XRECODE) GRECODE (GRCD) This new command provides a method for the group recoding of input values

IF IF The IF statement has properties found virtually in all programming languages ie provides a means of evaluating a condition and controlling the execution of alternative logical statements Similar to pseudocode a CONCOR programmer must utilize the structured command IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version

LET LET Virtually unchanged

LIST New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire as well as specific individual identifiers

OUTPUT New command which provides the means by which records will be written to the output-file

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 VERSION 1    DECEMBER 1979 VERSION 2    COMMENTS

RANGE (RNG)    RANGE (RNG)    As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

RECODE (REC)    name changed    Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

STOP keyword    STOP keyword    Unchanged.

UNTIL DO END-DO    One of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to most other high-level programming languages. The number of iterations is under user control, as well as the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

UPDATE (UPD)    UPDATE (UPD)    Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

(continued)
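The comments on ALLOCATE and UPDATE describe an imputation mechanism built on arrays that are refreshed as records are processed. The following Python sketch shows one generic technique of that kind, a sequential hot deck. It is an assumption-laden illustration and not CONCOR code: the function name, the field names and the validity rule are hypothetical, and the report does not document how ALLOCATE and UPDATE are actually implemented.

```python
def hot_deck_impute(records, field, key_field, is_valid):
    """Sequential hot deck: replace invalid values of `field` with the most
    recent valid value seen for the same `key_field` category.

    Returns the number of imputations made; records are changed in place.
    """
    deck = {}      # donor array, indexed by category
    imputed = 0
    for rec in records:
        key = rec[key_field]
        if is_valid(rec[field]):
            deck[key] = rec[field]     # analogous to an "update" of the array
        elif key in deck:
            rec[field] = deck[key]     # analogous to an "allocate" from the array
            imputed += 1
        # with no donor available yet, the value is left unchanged
    return imputed


people = [
    {"sex": "F", "age": 34},
    {"sex": "M", "age": 41},
    {"sex": "F", "age": None},   # missing
    {"sex": "M", "age": 999},    # out of range
]
count = hot_deck_impute(
    people, "age", "sex",
    is_valid=lambda a: a is not None and 0 <= a <= 120,
)
print(count, [p["age"] for p in people])
```

Counting the imputations, as the sketch does, corresponds loosely to the edit statistics that the report says are generated for allocations.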

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 VERSION 1    DECEMBER 1979 VERSION 2    COMMENTS

WRITE (WRT) WRITE (WRT) The WRITE language statement while once utilized as the primary output command is now used to create an auxiliary or derivative file Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION (See WRITE-FILE of DICTIONARY-DIVISION)

XRECODE (XREC) name changed Old command which has become the GRECODE command in the December 1979 version


DECEMBER 1978 VERSION 1

DECEMBER 1979 VERSION 2

Not implemented    REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.
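Why presorting matters for area-break reporting can be seen in a small sketch. It is written in Python, not CONCOR, and the record layout and function name are hypothetical. A single sequential pass can only group records that are adjacent, so an unsorted file splits one area into several report groups.

```python
from itertools import groupby
from operator import itemgetter


def area_breaks(records):
    """Count records per run of consecutive identical area codes, as a
    single sequential pass with area breaks would see them."""
    return [(area, sum(1 for _ in run))
            for area, run in groupby(records, key=itemgetter("area"))]


unsorted_file = [{"area": "A"}, {"area": "B"}, {"area": "A"}]
sorted_file = sorted(unsorted_file, key=itemgetter("area"))

print(area_breaks(unsorted_file))  # area A appears as two separate groups
print(area_breaks(sorted_file))    # area A appears once, with both records
```

Sorting on the area field before the edit run, as the comment requires, is what makes each area appear exactly once in the statistics.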

There is currently no command to enable users to specify unique report headings on the listing from this division.

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST


Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods if applicable

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.


(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18-digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.


(12) Increase the number of comparison strings permitted for alphanumeric data items This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters The user would still be required to code two commas to receive the system default value for any parameter

(15) Provide a repetition factor in the initialization of arrays

(16) Print the data dictionary name in the titles of the dictionary documentation or provide a means by which the user may specify a title to appear in the listings
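The automatic range check suggested in point (13) can be pictured with a brief Python sketch. Everything in it is hypothetical: the dictionary layout, the item names and their valid ranges are assumptions for illustration and do not reflect the actual DEFINE-RECORD syntax.

```python
# Hypothetical data dictionary: item name -> (lowest valid, highest valid)
DICTIONARY = {"AGE": (0, 99), "SEX": (1, 2), "RELATIONSHIP": (1, 9)}


def out_of_range_items(record, dictionary=DICTIONARY):
    """Return the names of items that are missing, non-numeric or outside
    the valid range declared for them in the dictionary."""
    bad = []
    for name, (low, high) in dictionary.items():
        value = record.get(name)
        if not isinstance(value, int) or not (low <= value <= high):
            bad.append(name)
    return bad


record = {"AGE": 134, "SEX": 2, "RELATIONSHIP": None}
print(out_of_range_items(record))
```

Because the ranges live in the dictionary rather than in comment statements, the same declarations could be read by other software, which is the benefit to tabulation programs that point (13) mentions.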

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features, such as division by zero and out-of-bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is the question of what action is to be taken in the calculation of tolerance in terms of this new record's inclusion into the universe of observations and number of changes.


(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
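Point (4) contrasts a lookup table with the current approach. The Python sketch below illustrates the general idea under stated assumptions: the recode pairs, the default code 9 and the function names are hypothetical, and the linear version is only a stand-in for whatever code the EDITOR program generates today. Because DRECODE values start at zero and ascend, the input value can serve directly as the table index.

```python
# Hypothetical recode pairs (original value, recoded value)
RECODE_PAIRS = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2)]


def recode_linear(value, pairs=RECODE_PAIRS, default=9):
    """Linear search: compare against each pair in turn."""
    for old, new in pairs:
        if old == value:
            return new
    return default


def build_table(pairs=RECODE_PAIRS, default=9):
    """Build the table once; the original value is the index."""
    table = [default] * (max(old for old, _ in pairs) + 1)
    for old, new in pairs:
        table[old] = new
    return table


TABLE = build_table()


def recode_lookup(value, table=TABLE, default=9):
    """Table lookup: one bounds check and one indexing operation."""
    return table[value] if 0 <= value < len(table) else default


print([recode_lookup(v) for v in range(6)])
```

Both functions return the same codes; the table version does a fixed amount of work per value, while the linear version grows with the number of pairs, which is the saving in execution time that point (4) anticipates.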

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.


1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program, to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean to imply that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual

(2) The development of a CONCOR Users Guide


2. Survey or other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has not been finished nor tested.
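The weighting scheme mentioned in point (5) can be illustrated with a minimal Python sketch. The field names and weights are hypothetical, and the sketch assumes the simplest case, in which each sampled questionnaire carries one expansion weight giving the number of units it represents.

```python
def weighted_total(records, value_field, weight_field="weight"):
    """Estimate a population total by multiplying each sampled value by
    the number of units that the sampled record represents."""
    return sum(rec[value_field] * rec[weight_field] for rec in records)


households = [
    {"persons": 4, "weight": 10.0},   # represents 10 households
    {"persons": 2, "weight": 25.0},   # represents 25 households
]
print(weighted_total(households, "persons"))
```

An unweighted sum of the same two records would give 6 persons, which describes the sample only; the weighted figure is the estimate for the population from which the sample was drawn.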


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES


Old CONCOR

1 EOF-FLAG
2 ERROR-IN-QUESTIONNAIRE
3 CURRENT-POINTER-VALUE
4 TYPE-COUNT ( )
5 RECORD-COUNT
6 QUESTIONNAIRE-COUNT
7 RECORDS-IN-STORE
8 INVALID-RECORD-TYPE-COUNT
9 INCOMPLETE-FLAG
10 CONTINUATION-FLAG
11 INVALID-RECORD-FLAG

New CONCOR

EOF-FLAG
ERRORS-IN-QUESTIONNAIRE
TYPE-COUNT ( )
IN-RECORD-COUNT
IN-QUESTIONNAIRE-COUNT
RECORDS-IN-STORE
INVALID-RECORD-TYPE-COUNT
INCOMPLETE-FLAG
CONTINUATION-FLAG
OUT-OF-RANGE
NOT-NUMBER
BLANK

New internal counters: OUT-RECORD-COUNT, OUTPUT-QUEST-COUNT, WRITE-RECORD-COUNT

Key:

- Change in spelling of variable
- Not mentioned -- possible omission from manual
- Not implemented

Note: Current documentation equivocates on when some variables are reset; most are initialized with each control area break.

Though these were implemented as reserved identifiers, their exact values and utility were unknown because of the poor documentation of (old) CONCOR.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN-, OUT-).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Execution Statistics printout for the division-by-zero test. The banner and control-area counts are not legible in this copy. The first diagnostic below is printed once for each occurrence.]

DIVISION BY ZERO LINE NUMBER 118

RUN ABORTED DUE TO DIVISION BY ZERO ERRORS

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason -- ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.

Program text (legible lines only):

115 LET R001 R003 = 0

118 LET R003 = R002 / R001

122 END TEST OF OPERANDS AND MSG
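The behavior tested above, reporting the offending source line rather than abending silently, can be sketched in Python as follows. This is not CONCOR's implementation: the class names, the substitute result of zero, and the limit of six errors before the run is aborted are all assumptions chosen for illustration.

```python
class RunAborted(Exception):
    """Raised when the assumed limit of division-by-zero errors is reached."""


class DiagnosticLog:
    def __init__(self, max_zero_divisions=6):
        # The limit of 6 is an assumption; the real threshold is not documented here.
        self.max_zero_divisions = max_zero_divisions
        self.zero_divisions = 0
        self.messages = []

    def divide(self, numerator, denominator, line_number):
        """Divide, recording the source line number when the divisor is zero."""
        try:
            return numerator / denominator
        except ZeroDivisionError:
            self.zero_divisions += 1
            self.messages.append(f"DIVISION BY ZERO LINE NUMBER {line_number}")
            if self.zero_divisions >= self.max_zero_divisions:
                self.messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                raise RunAborted(line_number)
            return 0  # assumed substitute result so that processing can continue


if __name__ == "__main__":
    log = DiagnosticLog()
    try:
        for _ in range(10):
            log.divide(5, 0, line_number=118)
    except RunAborted:
        pass
    print("\n".join(log.messages))
```

The point of the design is that every exception carries the line number of the user statement that caused it, so the programmer can go directly to the faulty edit specification.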

[Execution Statistics printout for the array-coordinate test. The banner and the individual error messages are not legible in this copy.]

Comments: In this example R001, a 2-dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 x 5) TA2. Non-existent element references caused the error. The program text has been superimposed on this page for illustration purposes.

Program text (legible lines only):

175 UNTIL R001 > 10 VARY R001 FROM 1 BY 1

177 DO

178 UNTIL R002 > 10 VARY R002 FROM 1 BY 1

180 ALLOCATE [identifier illegible] = TA1 (R001 R002)

181 UPDATE TA2 (R001 R002) R003

183 LIST VALUES OF TA1 = R003 TA1 (R001 R002)

185 LIST TA2 = TA2 (R001 R002)

186 END-DO

187 END-DO
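The array-coordinate test can be mimicked with a short Python sketch: a loop over a 10 x 10 range attempts to update a 5 x 5 table, and each non-existent element reference is reported with a line number instead of halting the run. The class name and the wording of the message are hypothetical; only the table sizes and the line number 181 come from the example above.

```python
class BoundsCheckedTable:
    """A 2-dimensional table with 1-based subscripts that reports bad references."""

    def __init__(self, name, rows, cols, initial=0):
        self.name = name
        self.rows = rows
        self.cols = cols
        self.cells = [[initial] * cols for _ in range(rows)]
        self.errors = []

    def update(self, row, col, value, line_number):
        """Store a value, or record a diagnostic if the element does not exist."""
        if not (1 <= row <= self.rows and 1 <= col <= self.cols):
            # Message wording is an assumption, not CONCOR's actual text.
            self.errors.append(
                f"ARRAY {self.name}: NON-EXISTENT ELEMENT ({row},{col}) "
                f"LINE NUMBER {line_number}"
            )
            return False
        self.cells[row - 1][col - 1] = value
        return True


def copy_larger_into_smaller():
    """Drive a 5 x 5 table with subscripts that run from 1 to 10."""
    ta2 = BoundsCheckedTable("TA2", 5, 5)
    for r in range(1, 11):
        for c in range(1, 11):
            ta2.update(r, c, r * c, line_number=181)
    return ta2


if __name__ == "__main__":
    table = copy_larger_into_smaller()
    print(len(table.errors), "bad references; first:", table.errors[0])
```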

[Execution Statistics printout for the user-termination test. The banner and control-area counts are not legible in this copy.]

JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215

EDITOR -- NORMAL END OF JOB

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

Program text (legible lines only):

214 IF IN-RECORD-COUNT > 500

215 THEN STOP-RUN END-THEN

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character () and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character () and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character () and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.


FIGURE 1

DICTIONARY-DIVISION

DICTIONARY-NAME DATA-CODING-EXAMPLE

INPUT-FILE

OUTPUT-FILE

AREA-CONTROL N2-2 N2-4 N3-6 N2-9 N9-23

QUESTIONNAIRE-CONTROL A4-2 A3-6 A2-9 A3-11 A3-14

RECORD-CONTROL A1-1

DEFINE-RECORD

H01-TYPE-OF-HOUSING-UNIT N1-17

H02-MATERIAL-OF-ROOF N1-19 10 9

H03-TOTAL-PERSONS-IN-UNIT N8-40 NOT-NUMERIC BLANK

H04-STATE-OF-UNIT-CODE A4-50 0 U 1 D

DEFINE-RECORD

P01-SEX A1-13 M F

NEW-DATA

N01-SAVE-TYPE-OF-HOUSING-UNIT

N02-SAVE-TYPE-OF-ROOF 1

N03-COUNT-TOTAL-IN-UNITS 10 0

N04-AGGREGATE-INCOME 18 0

END-DIVISION

Explanations

N2-4: This is an example of an external numeric input data item (N) with a length of 2 bytes, starting in column 4 of the input record. The maximum length of this type of variable outside of NEW-DATA is 9. When coded in NEW-DATA, 18 is permitted.

A4-2: This is an example of an external alphanumeric input data item (A) with a length of 4 bytes, starting in column 2 of the input record. This construction for alphanumeric variables is valid only in the control statements. Additionally, it can never be over 4 bytes in length. When alphanumeric data fields are defined within record types, the EDITOR program requires that the comparison strings always be specified. A maximum of 3 is permitted. The purpose of these strings is to force a recode of the data to a numeric value. If no match is found, EDITOR automatically assigns a unique negative value to the field.
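The item notation explained above (type letter, length, hyphen, starting column) can be made concrete with a small Python sketch that parses a specification such as N2-4 and lifts the corresponding bytes out of a fixed-format record. The names `ItemSpec`, `parse_spec` and `extract` are hypothetical, and the length limits enforced are those quoted in the explanations (9 bytes for numeric input items, 4 bytes for alphanumeric control items); NEW-DATA items are not covered.

```python
import re
from typing import NamedTuple


class ItemSpec(NamedTuple):
    kind: str    # "N" for numeric, "A" for alphanumeric
    length: int  # field length in bytes
    start: int   # 1-based starting column in the input record


_SPEC = re.compile(r"([NA])(\d+)-(\d+)")


def parse_spec(text):
    """Parse a specification such as 'N2-4' into its three parts."""
    match = _SPEC.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid item specification: {text!r}")
    spec = ItemSpec(match.group(1), int(match.group(2)), int(match.group(3)))
    if spec.length < 1 or spec.start < 1:
        raise ValueError("length and starting column must be at least 1")
    limit = 9 if spec.kind == "N" else 4
    if spec.length > limit:
        raise ValueError(f"{spec.kind} items may not exceed {limit} bytes")
    return spec


def extract(record, spec):
    """Return the raw characters of the record that the specification covers."""
    end = spec.start - 1 + spec.length
    if len(record) < end:
        raise ValueError("record is shorter than the field requires")
    return record[spec.start - 1:end]


if __name__ == "__main__":
    spec = parse_spec("N2-4")
    print(spec, extract("0123456789", spec))
```

Here the record "0123456789" is invented sample data; columns 4 and 5 hold the two bytes selected by N2-4.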


alphanumeric coding be utilized in the QUESTIONNAIRE-CONTROL and RECORD-CONTROL statements, where each input data item must be of the same data type, as shown in the example. When alphanumeric data variables are used in these control statements, their construction is identical to that of numeric items. However, when used elsewhere in the DATA-DIVISION, alphanumeric variables are required to specify one of three possible comparison values, as shown. There are a number of production instances when it would never be necessary or even desirable to recode alphanumeric data. However, as CONCOR attempts to force data into a totally numeric format upon output, there is currently no way to preserve these values if desired.

An unwieldy alternative, which may be acceptable under some circumstances, would be to expand the number of comparison strings from three to a more realistic number. The limitation of this compromise is that a full twenty-six comparison identifiers would be required in order to accommodate data which utilized the entire alphabet. A better solution, however, would be to make the general format of the alphanumeric variables identical to that of numeric identifiers, i.e., A9-23, and to permit alphanumeric values so defined to pass unaltered through the CONCOR system.
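The forced recode criticized here can be illustrated with a brief Python sketch. It assumes, purely for the example, that a matching comparison string is recoded to its position in the list and that each distinct unmatched value receives its own negative code; the report states only that EDITOR assigns "a unique negative value", so both rules and the class name `AlphaRecoder` are hypothetical.

```python
class AlphaRecoder:
    """Recode alphanumeric values to numbers using up to three comparison strings."""

    def __init__(self, comparison_strings):
        if not 1 <= len(comparison_strings) <= 3:
            raise ValueError("between 1 and 3 comparison strings are allowed")
        # Assumption: a match is recoded to its 1-based position in the list.
        self.codes = {s: i for i, s in enumerate(comparison_strings, start=1)}
        self.unmatched = {}

    def recode(self, value):
        if value in self.codes:
            return self.codes[value]
        # Assumption: each distinct unmatched value gets its own negative code.
        if value not in self.unmatched:
            self.unmatched[value] = -(len(self.unmatched) + 1)
        return self.unmatched[value]


if __name__ == "__main__":
    sex = AlphaRecoder(["M", "F"])
    print([sex.recode(v) for v in ["M", "F", "X", "M", "Y", "X"]])
```

Whatever the exact codes, the original letters do not appear in the output file, which is the loss of information the recommendation seeks to avoid.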

Another data-naming convention which caused several errors, and which could be corrected, concerns the array data definitional statements. While arrays of two and more dimensions are handled in a superior manner by the CONCOR program, single-dimension arrays pose a problem in coding, as shown in Figure 2. It is suggested that the command imperatives be changed to permit the coding of both rows and columns in single-dimension arrays, i.e., allow a single row vector as well as a single column vector, to maintain the consistency of the array data definitional statements.
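The suggested uniform convention, under which a vector is simply an array with one row or one column, can be sketched in Python. The function `build_array` is hypothetical and models only the row, column and start-up value parameters; it is not the ARRAY-DATA command itself.

```python
def build_array(rows, cols, values):
    """Build a rows x cols table from a flat list of start-up values.

    A row vector is declared as (1, n) and a column vector as (n, 1), so
    vectors follow the same rule as arrays of two dimensions.
    """
    if rows < 1 or cols < 1:
        raise ValueError("rows and columns must both be at least 1")
    if len(values) != rows * cols:
        raise ValueError(
            f"expected {rows * cols} start-up values, got {len(values)}"
        )
    return [list(values[r * cols:(r + 1) * cols]) for r in range(rows)]


if __name__ == "__main__":
    ages = [16, 18, 21, 23]
    print(build_array(1, 4, ages))  # single row vector
    print(build_array(4, 1, ages))  # single column vector
```

With one rule for every shape, the user no longer has to remember a separate parameter order for vectors.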

A major requirement of COBOL CONCOR file processing is that all related data records must be physically contiguous on the input file. The implication of this requirement is that files may require preprocessing prior to actual data editing. (This preprocessing is usually a sort routine upon a selected CONTROL-AREA key.) While this type of processing merely introduces a new step in file processing, a major limitation becomes apparent when a large number of DISCRETE DATA files of the same census or survey questionnaire are to be processed. This limitation is the introduction of manual steps to save the most recent imputed values, i.e., preventing the program from starting with cold values each batch run. If a command such as LOAD/UNLOAD ARRAYS were incorporated into the language (an enhancement not believed to be difficult to implement), manual processing between batches would be reduced to a minimum and the maximum benefits of the hot deck methodology would be realized. It is envisioned that such a command would automatically ensure the transfer of the appropriately designated hot values. Automatic processing of this nature, if done correctly, can greatly reduce the time required to clean multivolume files, for once CONCOR language statements have been compiled, linked

While it is currently possible to save the arrays that are used in the imputation processes on a separate write-file, it is not possible to automatically load those values back into an object program and immediately resume processing on another volume. It is believed that such an automatic feature of the language would cut down manual processing time significantly enough to warrant its inclusion in the package prior to general distribution.
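The proposed LOAD/UNLOAD ARRAYS facility might behave along the lines of the following Python sketch: at the end of a batch the hot deck arrays are written to a file, and at the start of the next batch they are read back, with the cold deck values used only when no saved file exists. The function names, the JSON file format and the array name A05 used in the demonstration are assumptions, not features of CONCOR.

```python
import json
import os
import tempfile


def unload_arrays(arrays, path):
    """Write the current hot deck arrays to a file at the end of a batch."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(arrays, handle)


def load_arrays(path, cold_deck):
    """Return the saved hot values if present, otherwise a copy of the cold deck."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {name: [row[:] for row in table] for name, table in cold_deck.items()}


if __name__ == "__main__":
    cold = {"A05": [[2, 1], [3, 4]]}
    path = os.path.join(tempfile.mkdtemp(), "hot_deck.json")
    arrays = load_arrays(path, cold)   # first volume: starts from cold values
    arrays["A05"][0][0] = 7            # an imputation donor replaces a cold value
    unload_arrays(arrays, path)
    print(load_arrays(path, cold))     # second volume: resumes with hot values
```

The operator's only task between volumes is then to mount the next file; the transfer of hot values requires no manual step.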

FIGURE 2

A05-DIFF-BETWEEN-AGE-OF-FEMALE-BY-RELATION 2 4 4

[4 x 4 table of cold deck values. Columns, AGE OF HUSBAND: 12-17, 18-24, 25-35, 36+. Rows, RELATION: HEAD, CHILD, OTHER, NONRELATIVE. The cell values are not legible in this copy.]

Comments: The ARRAY-DATA command statement provides the means to declare array identifiers with up to five dimensions. Current documentation is not as explicit about the rules of this command as is desirable. The parameters of the command should function as follows:

user-identifier, number of dimensions (D), number of rows (R), number of columns (C), magnitude of element (M), initial start-up values

In the example, A05 is a two-dimensional array with 4 rows, 4 columns, a default magnitude of 9, and cold deck values as labeled.

A06-DIFF-BETWEEN-AGE-OF-PERSON-AND-MOTHER 1 1 4

16 18 21 23

(This coding generates the error messages below.)

WARNING (DD-207) COMMAND TERMINATOR () NOT FOUND; () ASSUMED PRESENT

ERROR (DD-911) DIMENSION OF USER-SPECIFIED ARRAY IS LESS THAN THE MINIMUM VALUE PERMITTED (2)

PREVIOUS DIAGNOSTIC AT LINE 563

As shown by the array variable A06, CONCOR's treatment of vectors is not consistent with the above multidimensional array scheme; i.e., A06 must be coded as follows.

(Example of how vectors must currently be coded to be correct)

A06-DIFF-BETWEEN-AGE-OF-PERSON-AND-MOTHER 1 4 2

16 18 21 23

user-identifier, 1 dimension, number of elements in vector, magnitude of element, initial start-up values

A simple modification to this command would permit the coding of both row and column vectors and make this command less error prone.

and stored as an object module on the system, no other compilations should be required for processing questionnaire files of the same type. Theoretically, a single well-written CONCOR program is all that would be required to process an entire census run.

Appendix H contrasts the internal identifiers of the old and new language versions. Without such identifiers a user would have little information about the status of input as it is processed by EDITOR. As noted in the appendix, most internal pointers are reset upon each break in the CONTROL-AREA, provided a CONTROL-AREA has been defined. The limitation here is that there are obvious instances when terminating processing on the basis of run counts would be advantageous even though a CONTROL-AREA has been specified, e.g., debugging CONCOR programs or comparing input files. Therefore another set of pointers should be implemented for this purpose and made available for programmer reference.
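The distinction drawn here between counters that reset at each control-area break and the proposed run-level counters can be sketched in Python. The class and attribute names are hypothetical; `in_record_count` merely imitates the behavior attributed to IN-RECORD-COUNT, and `cumulative_record_count` stands for the additional pointer being recommended.

```python
class RunCounters:
    """Keep one record counter per control area and one for the whole run."""

    def __init__(self):
        self.area = None
        self.in_record_count = 0          # reset at each control-area break
        self.cumulative_record_count = 0  # never reset (the proposed addition)
        self.area_totals = []

    def read(self, area_key):
        """Register one input record belonging to the given control area."""
        if self.area is not None and area_key != self.area:
            self.area_totals.append((self.area, self.in_record_count))
            self.in_record_count = 0
        self.area = area_key
        self.in_record_count += 1
        self.cumulative_record_count += 1

    def should_stop(self, limit):
        """True once more than `limit` records have been read in the run."""
        return self.cumulative_record_count > limit


if __name__ == "__main__":
    counters = RunCounters()
    for key in ["01", "01", "02", "02", "02"]:
        counters.read(key)
    print(counters.in_record_count, counters.cumulative_record_count)
```

A test run limited to the first few hundred records of a file could then be stopped on the cumulative counter, regardless of how many control areas those records span.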

One clearly disturbing development, which needs to be pursued during in-depth testing of the system, concerns the MAX-STORAGE parameter of the DEFINE-RECORD statement. As shown in the figure on the following page, when MAX-STORAGE was set equal to the maximum value, a COBOL program was generated which required 1000K of core to run. The MAX-STORAGE value of 999 is clearly not realistic under most processing circumstances. This example drives home several important points about CONCOR. The core requirements of CONCOR-generated programs can be influenced significantly by the amount or nature of programmer-specified I/O operations. In fact, it is possible to generate a program of a size most foreign country machines could not process. It is recommended that tests determine a realistic maximum value restriction for implementation to prevent problems in this area.

The final area of recommended modification concerns the newly implemented REPORT-DIVISION. The purpose of the REPORT-DIVISION is to enable a user to describe or specify certain CONCOR language statements which will generate statistical reports. These reports contain statistics generated by EDITOR as specified by the GENERATE-EDIT-STATISTICS command of the EXECUTION-DIVISION. All of the reports produced are organized according to the data fields defined by the AREA-CONTROL command of the DATA-DICTIONARY. If the AREA-CONTROL command is not defined in the DATA-DICTIONARY, then all the statistics are summarized at the total run level. If a control area field is defined, then all statistics will be summarized for each unique CONTROL AREA as encountered by the EDITOR program on the input file; statistics at the total run level will not be available. This in part relates back to previous discussions citing the need for new internal identifiers. Report listings may contain the values of entire records or entire questionnaires, depending upon the keyword used in the report generation commands. The problem centers upon the homogeneity of CONCOR printouts during a production run.

It is virtually impossible to distinguish reports on the basis of the volumes they were run against. Some means should be provided to allow users to uniquely and purposefully label the reports generated in this division. Indeed, the very name REPORT-DIVISION suggests that such a command is implicit and appropriate. Such a LABEL-REPORT or REPORT-FILE command, along with file information from the system, should not be difficult to implement.

FIGURE 3

CONCOR (CONSISTENCY AND CORRECTION) EDIT AND IMPUTATION SYSTEM

USER DICTIONARY DIVISION - SOURCE LISTING

LINE NUMBER

72 DEFINE-RECORD RECORD-NAME= HOUSING-RECORD

73 MAX-STORAGE= 999

74 RECORD-TYPE= [value illegible]

268 DEFINE-RECORD RECORD-NAME= POPULATION-RECORD

269 MAX-STORAGE= 999

270 RECORD-TYPE= [value illegible]

[Job log for the LOAD step; legible entries only:]

IEF373I STEP LOAD START 80018.1001

IEF374I STEP LOAD STOP 80018.1002 CPU 0MIN 01.65SEC SRB 0MIN 00.25SEC VIRT 1000K

STEP START TIME - 10:01:55 STEP END TIME - 10:02:57 STEP CPU TIME - 1.65 SECONDS

REGION (REQUESTED - 1000K USED - 1000K) I/O COUNT 211

CARDS READ - 0 CARDS PUNCHED - 0 LINES PRINTED - 22

Concluding Remarks of System Modifications

Version 1 of the COBOL CONCOR language was notorious for abending without generating diagnostic messages. Appendix I sets forth a test of some of these historical problems. In each instance the system provided messages and source program line references which enabled immediate identification of the processing difficulties. Tests of the various commands also revealed the generally high performance of the language in its newest version. Under these circumstances the recommended modifications should be considered as ones which will uniformly allow CONCOR to realize the potential of its design, enhance its utility and, moreover, endure as a software product.


IV THE ADEQUACY OF COBOL CONCOR DOCUMENTATION

The current documentation of CONCOR consists of a newly developed, relatively voluminous systems manual, a Diagnostic Message Guide, and some loose-leaf instructions concerning the installation of the system on IBM computers. Of these three documents, the Diagnostic Message Guide is clearly complete and adequate and needs no changes except the addition of message number tabs to enable a programmer to locate the text of a message number more quickly. An example of the superior format of this guide appears as Appendix J.

Conspicuously absent from the list of systems documents is a Users Guide, as originally specified in the ISPC enhancement proposal.

A thorough assessment of the Systems Manual has been undertaken and indicates that, with some reorganization and editing, it could be made a more useful and usable document, as it is well constituted informationally. In its current form the Systems Manual can best be described as a working document: one which was not intended to teach the CONCOR language but was aimed at an audience already familiar with previous CONCOR releases. Further, it is not a systems manual per se, as some of its components and appendixes could more appropriately compose a users guide. For example, excerpts from the POPSTAN publications which review the basic editing concepts and contain the POPSTAN example problem should be deleted from the Systems Manual and would be more contextually effective in the Users Guide. The EXECUTION-DIVISION chapter of the Systems Manual would especially benefit from general editing and reorganization. (Specifically, the contents of pages 90-92 should be moved to page 63; these pages explain the placement and purpose of the three permissible routines of the CONCOR language. Had the author been granted more time, he could have outlined additional specific modifications to bring existing documentation to a level of adequacy.) However, some guidelines stand out:

1. Installation instructions and considerations should compose a chapter of the Systems Manual rather than a separate document. (Hand-out materials covering installation were not adequate.) Further, a documentation scheme setting forth how CONCOR was installed on each computer, and any modifications required during installation, could be serially added to and become a permanent part of the installation's systems manual. If CONCOR is going to be used, and if agencies are going to actively support it as a package, installation limitations must be known and well documented. The prescribed format for this documentation should be set forth, and any modifications to CONCOR should be continuously updated on these sheets so that a history of changes to the language will be readily accessible to supporting agencies. If CONCOR is to survive as a standardized system, editing results must be capable of being replicated among installations. Such a form could contain other information as well, concerning the type of computer, amount of core, and nature of utilities supporting users. Upon installation, a copy of this form could be sent to the U.S. agency which will ultimately be responsible for supporting the CONCOR package.

2. A complete COBOL CONCOR program should appear in an appendix for reference.

3. The development of the Users Guide should include an intensive review of the editing concepts involved in processing census data files, beyond the POPSTAN materials.

4. An explanation of the CONCOR benchmark program should appear in the Users Guide and the Systems Manual. The running of a supplied benchmark program should be a standard installation protocol used to test all operational aspects of a new installation.

5. The development of a CONCOR pocket card. This tool, which has been proven useful in teaching and assisting programmers in utilizing programming languages, lays out all commands and options on a single small card. An example of such a pocket card is the IBM Assembler Card. Such a small document is useful in clarifying the language and setting forth structural rules without continual reference to full-size manuals.


V CONCLUSION

In a situation of extreme need CONCOR could probably be utilized, as it seems capable of performing its editing functions. Its usefulness as a data cleaning tool is generally undisputed. The importance of having CONCOR exhaustively tested by an independent agency cannot be overemphasized in light of prior failures of the system to perform adequately. Additionally, core requirements and processing speed should be precisely determined. One must be encouraged by the amount of progress which has been made in a relatively short time towards the completion of the CONCOR package. While it is clearly beyond the scope of this report to estimate the cost and time required to implement the highly desirable changes to the CONCOR language and documentation outlined herein, and then exhaustively test the system, it is believed that this process could be completed within a single quarter. (The changes are of such a nature that no structural redefinitions of a systematic nature would be required.) These enhancements would make the COBOL CONCOR package more complete, consistent, and reliable in the field. ISPC has competently moved the system to the point that its completion is within reach.

Given the probable revision of the current Systems Reference Manual and the development of a suitable Users Guide document, it is likely that this version of CONCOR could be taught to experienced programmers (unfamiliar with previous versions of the language) in as little as four weeks. And while one is impressed by the potential power and overall simplicity of the COBOL CONCOR language, it is difficult to envision individuals with no real computer programming experience utilizing its full capabilities. Hence the importance of including a review of basic editing concepts in the documentation is underlined.

Further, as documentation is often critical to the perceived ease of use of computer-based systems, it will be essential to have these various manuals examined independently prior to their general use.

As previously mentioned, Appendix G represents features which the ISPC feels could have a positive impact on CONCOR. Depending upon the outcome of the testing recommended in previous chapters and the availability of funding, supporting agencies may desire to look again at an assembler (ALC) version of the language.

Overall, COBOL CONCOR remains an intermediate product yet to realize its potential, which is thought considerable. Software development of this nature is characteristically of a long-term, evolutionary design. In light of this reality, agencies should be prepared to evaluate such projects' success or failure in terms other than that of immediate utility. It is the philosophy underlying the system, not the system itself, which will ultimately be exported.

APPENDIX A

BUCEN ENHANCEMENT PROPOSAL

BUCEN IN-HOUSE REVIEW ENHANCEMENT PROPOSAL

1 Easy to use interrecord referencing

2 Improved output file capabilities

A provide overflow protection on WRITE command

B provide OUTPUT command to create corrected questionnaire file that matches IP file data dictionary

3 Improved edit statistics reported (LISTERR)

A provide automatic (user-specified) area break

B provide options for compilation and displaying edit statistics at various levels

C provide automatic (user-specified) tolerance checking of error rates by area

D automatically capture IDs of areas failing tolerance check

4 Clean up known bugs in code

5 Comprehensive testing

6 Clean up and enhance documentation

A reference manual more examples error message guide

B installation guide

C systems manual
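Items 3C and 3D of the proposal, tolerance checking of error rates by area and the capture of the IDs of failing areas, amount to a simple calculation, which the following Python sketch illustrates. The function name, the input layout and the sample figures are assumptions made for the example and do not describe the LISTERR implementation.

```python
def failing_areas(area_counts, tolerance):
    """Return the (area ID, error rate) pairs whose rate exceeds the tolerance.

    area_counts maps an area ID to a pair (records_in_error, records_read).
    """
    failed = []
    for area_id, (errors, total) in sorted(area_counts.items()):
        rate = errors / total if total else 0.0
        if rate > tolerance:
            failed.append((area_id, rate))
    return failed


if __name__ == "__main__":
    # Hypothetical counts for three control areas and a 10 percent tolerance.
    counts = {"001": (5, 100), "002": (12, 100), "003": (0, 0)}
    for area_id, rate in failing_areas(counts, tolerance=0.10):
        print(f"AREA {area_id} FAILED TOLERANCE CHECK: {rate:.1%}")
```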

APPENDIX B

EVALUATIVE CRITERIA

UNITED STATES GOVERNMENT

Memorandum

TO: DS/POP/FPSD, Robert H. Haladay

DATE: December 3, 1979

FROM: DS/POP/DEIO, Liliane Floge

SUBJECT: Criteria for Evaluation of BuCen COBOL CONCOR, September 30th Version, as presented at January 1980 Workshop

The following points should be considered when evaluating CONCOR, the COBOL editing system, during participation at the January 1980 workshop.

1. Are the Installation Manual, the System Manual and the Users Manual ready for use? Are these manuals adequate for their purposes? Are the manuals clear and consistent? Can they be understood by subject-matter personnel as well as programmers?

2. Is the editing system capable of implementing all the features as described in the documentation?

3. Does the system appear to be completely debugged and tested and ready for release to developing countries?

4. Does the system appear to be fast enough to edit a census in a reasonable amount of time?

5. What size core does the system require?

6. Comment in general on completeness, usefulness and efficiency of the system, and ease of use by developing country programmers and other census personnel.

cc: S. Olds, APHA; Julio Ortuzar, Delta Systems

APPENDIX C

WORKSHOP ITINERARY

CONCOR Workshop Schedule, January 7-18, 1980

U.S. Bureau of the Census, International Statistical Programs Center

Scuderi Building, Room 209, 4235 28th Avenue, Marlow Heights, Maryland

Monday, January 7

9:30 am - 10:00 Welcoming Remarks; Overview of Workshop

10:00 - 10:30 Introduction to CONCOR
  - Purpose and function
  - History of development
  - General computer requirements

10:30 - 10:45 Break

10:45 - 12:00 Editing Concepts
  - Ways to interrogate data
  - Ways to correct data
  - Editing housing and population data - POPSTAN
  - Advantages of CONCOR

12:00 - 1:15 pm Break

1:15 - 2:00 System Description
  - Constraints in design of CONCOR
  - Basic subsystems of CONCOR
  - User interactions with system
  - Examples of outputs produced

2:00 - 2:30 User Program Organization
  - Divisions
  - Sections
  - Routines
  - Commands

2:30 - 2:45 Break

2:45 - 3:25 Command Language Description
  - Types of statements
  - Format
  - Syntax

Tuesday, January 8

9:30 am - 10:30 Dictionary Division Command Statements
  - Punctuation
  - Input data referencing
  - Working data declarations
  - Internal data representation and storage

10:30 - 10:45 Break

10:45 - 12:00 Dictionary-Attributes-Section
  - Dictionary-Name
  File-Section
  - Input-File
  - Output-File
  - Write-File
  - Error-File

12:00 - 1:15 pm Break

1:15 - 2:15 Input-Record-Section
  - Define-Record
  Working-Data-Section
  - New-Data
  - Array-Data

2:15 - 2:30 Break

2:30 - 3:25 Dictionary Examples
  - Minimum dictionary structure
  - Maximum dictionary structure
  - Hand out dictionary problem

Wednesday, January 9

9:30 am - 10:30 Discussion of Participants' Dictionary problems; Free work time

10:30 - 10:45 Break

10:45 - 12:00 Execution Division Command Statements
  - Punctuation
  - Subscripting
  - Internal Identifiers
  - Report-Control-Section
    - Generate-Edit-Statistics
    - Count-Imputes
    - Examples

12:00 - 1:15 pm Break

1:15 - 2:15 Execution Division Command Statements (continued)
  - Routines of Edit-Specifications-Section
    - Prolog-Routine
    - Filter-Routine
    - Epilog-Routine
  - Types and functions of edit specification commands
  - Range
  - Assert

2:15 - 2:30 Break

2:30 - 3:25
  - Pass/Fail clauses
  - List

Thursday, January 10

9:30 am - 10:30 Discussion of Problems; Free work time

10:30 - 10:45 Break

10:45 - 12:00 Edit Specification Command Statements (continued)
  - Allocate
  - Update
  - Let

12:00 - 1:15 pm Break

1:15 - 2:15
  - If
  - Until
  - Exit
  - Stop

2:15 - 2:30 Break

2:30 - 3:25
  - Drecode
  - Grecode

Friday, January 11

9:30 am - 10:30 Edit Specification Command Statements (continued)
  - Output
  - Write

10:30 - 10:45 Break

10:45 - 12:00 Report Division Command Statements
  - Display-Control-Section
    - Display-Edit-Statistics
  - Tolerance-Control-Section
    - Error-Rate-Check
    - Reject-File
  - Report Examples

12:00 - 1:15 pm Break

1:15 - 3:25 Hand out Edit Specifications as problem; Free work time

Monday, January 14

9:30 am - 10:30 Discuss procedures for running problems on computer

10:30 - 10:45 Break

10:45 - 12:00 Component Programs of the CONCOR system

12:00 - 1:15 pm Break

Tuesday, January 15

9:30 am - 3:25 pm Free work time

Wednesday, January 16

9:30 am - 12:00 Free work time

12:00 - 1:15 pm Break

1:15 - 2:15 How to Install CONCOR on IBM 360/370 OS

2:15 - 2:30 Break

2:30 - 3:25 Free work time

Thursday, January 17

9:30 am - 3:25 Free work time

1:15 pm - 2:15 Future CONCOR Design Considerations
  - for census processing
  - for survey processing
  - manual correction system

2:15 - 2:30 Break

2:30 - 2:45 Evaluation Guidelines
  - Hand out evaluation forms

2:45 - 3:25 Free work time

Friday, January 18

9:30 am - 10:30 Free work time

10:30 - 10:45 Break

10:45 - 12:00 Group Discussion on CONCOR
  - Submission of written evaluations
  - Distribution of IBM installation tapes
  - Presentation of Certificates to Participants

12:00 - 1:15 pm Break

1:15 - 3:25 Free work time

APPENDIX D

PARTICIPANTS


CONCOR WORKSHOP January 7 - 18 1980 Marlow Heights MD

PARTICIPANTS

KENYA

James Midianga Government Computer Centre Central Bureau of Statistics PO Box 30266 Nairobi Kenya

PANAMA

Omar Rivera Rios Data Processing Specialist Contraloria General de la Republica de Panama Apartado Postal 5213 Panama 5 Panama

PHILIPPINES

Guida T Capellan OIC Computer Programming Division National Census and Statistics Office Solicarel Bldg 1 Magsaysay Blvd Sta Mesa Manila Philippines

THAILAND

Angsumal Sunalai National Statistical Office Bangkok 1 Thailand

EGYPT

Farag Sedky Mourad Ghaleb CAPMAS (Centeral Agency for Public

Mobilization and Statistics)NASR City Cairo Egypt


SAUDI ARABIA

Shargi R Al-Shargi National Computer Center PO Box 2534 Riyadh Saudi Arabia

Ahmed S Al-Ghamdi Dept Director of National Computer Center PO Box 2534 Riyadh Saudi Arabia

OTHER

Robert W ONeal USREPJECOR APO New York NY 09038

John N Adams Data Processing Advisor DACCA Department of State Washington DC 20520

OR c/o UNDP PO Box 224 Dacca Bangladesh

Joe Quasney US Census Bureau Washington DC 20233

Howard Brunsman 5715 N Ninth St Arlington VA 22205

Larry Shiller and Julio Ortuzar Delta Systems Consultants Inc 264 Alhambra Circle Coral Gables FL 33134

Nicholas Ourusoff Burpee Hill Road New London NH 03257 (United Nations)

Frederic J Grant IV Systems Development Georgia World Congress Institute 580 North Omni International Atlanta Georgia 30303


Rafael Samper Mary Jo Keenan Catherine B Gleason LACDR Room 2246 NS Washington DC 20523

John Marshall AIDSERDMPSE Rm 706C SA-12 Washington DC 20523

Susanne Bacon and Mary K Friday ISPCData Processing US Census Bureau Washington DC 20233

Leo Dougherty ISPCWorld Census Staff US Bureau of the Census Washington DC 20233


CONCOR WORKSHOP

January 7-18 1980 Washington DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM


CONCOR WORKSHOP January 16 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1 What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries

2 If you are familiar with previous versions of CONCOR how does this current version compare to them


3 How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data of which you are familiar

4 How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software


5 How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys

6 If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?


7 In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation

8 Other comments


APPENDIX F

NEWOLD COMMAND COMPARISONS


OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA


NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR

END-DIVISION signifies the termination of the CONCOR division organization and must always be present even though division identifiers have not been implemented

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

ALLOCATE (ALL) ALLOCATE (ALLOC) Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

ASSERT (AST) ASSERT (ASRT) The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and MESSAGE option. Additionally, by utilizing the DUMPR and DUMPO keywords, it provides the user with the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

(see RECODE) DRECODE (DRCD) Provides a method for recoding data items having a starting value of zero and continuing in ascending sequence. Replaces the previous XRECODE command.

EXIT Provides the means to terminate EXECUTION (ESCAPE) of command statements within the UNTIL command

END END-IF END-THEN END-ELSE A solitary END command is no longer permitted in conditional construction. Note the similarity to COBOL pseudocode.

ERROR not implemented No longer supported in language

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

FILTER TYPE ( ) WITH not implemented No longer are option statements supported in language This command once specified essentially a GOTO depending on construction

(see XRECODE) GRECODE (GRCD) This new command provides a method for the group recoding of input values

IF IF The IF statement has properties found virtually in all programming languages ie provides a means of evaluating a condition and controlling the execution of alternative logical statements Similar to pseudocode a CONCOR programmer must utilize the structured command IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version

LET LET Virtually unchanged

LIST New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire as well as specific individual identifiers

OUTPUT New command which provides the means by which records will be written to the output-file

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

RANGE (RNG) RANGE (RNG) As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

RECODE (REC) name changed Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

STOP keyword STOP keyword Unchanged.

UNTIL DO END-DO One of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to most other high-level programming languages. The number of iterations is under user control, as well as the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

UPDATE (UPD) UPDATE (UPD) Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

WRITE (WRT) WRITE (WRT) The WRITE language statement while once utilized as the primary output command is now used to create an auxiliary or derivative file Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION (See WRITE-FILE of DICTIONARY-DIVISION)

XRECODE (XREC) name changed Old command which has become the GRECODE command in the December 1979 version


DECEMBER 1978 VERSION 1

DECEMBER 1979 VERSION 2

Not implemented REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note division and section names treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.
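The presort requirement can be illustrated outside CONCOR. The following is a minimal sketch in Python, not CONCOR code; the record layout, the field name `area`, and both function names are assumptions made only for illustration. It shows why unsorted input produces a spurious control break each time an area reappears, and how an external stable sort removes the problem.

```python
from itertools import groupby
from operator import itemgetter


def presort_by_area(records, area_key):
    """Order records so each control area is contiguous (sorted() is stable)."""
    return sorted(records, key=area_key)


def area_breaks(records, area_key):
    """Yield (area, records) each time the control-area key changes.

    Like a sequential control-break report, this only sees adjacent
    records; it never merges an area that reappears later in the file.
    """
    for area, group in groupby(records, key=area_key):
        yield area, list(group)


# Hypothetical input: two areas whose records are interleaved on the file.
records = [
    {"area": "02", "id": 1},
    {"area": "01", "id": 2},
    {"area": "02", "id": 3},
    {"area": "01", "id": 4},
]
area_of = itemgetter("area")

# Without the presort every change of key looks like a new control area.
unsorted_breaks = [area for area, _ in area_breaks(records, area_of)]

# After the presort each area is reported exactly once.
sorted_groups = list(area_breaks(presort_by_area(records, area_of), area_of))
sorted_breaks = [area for area, _ in sorted_groups]
```

Because the sort is stable, records keep their original relative order within each area, which matters when the records of one questionnaire must stay together.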

There is currently no command to enable users to specify unique report headings on the listing from this division.

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST


Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives at times are mutually exclusive. Some of the possible modifications and considerations are outlined below.

1 Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods if applicable

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.


(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value to move to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18 digit numbers. Also signed numeric and packed decimal data format could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.


(12) Increase the number of comparison strings permitted for alphanumeric data items This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire The inclusion of such values are now done by many users in the form of comment statements This enhancement would also facilitate the use of the dictionary by tabulation software

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters The user would still be required to code two commas to receive the system default value for any parameter

(15) Provide a repetition factor in the initialization of arrays

(16) Print the data dictionary name in the titles of the dictionary documentation or provide a means by which the user may specify a title to appear in the listings
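The automatic range check proposed in item (13) can be pictured as follows. This is a minimal Python sketch, not CONCOR syntax; the identifiers AGE and SEX, their limits, and the function name are hypothetical. It assumes the dictionary carries a low and high bound for each user-identifier and that a record is screened against those bounds before any user edit statements run.

```python
# Hypothetical dictionary entries: user-identifier -> (lowest, highest) valid value.
VALID_RANGES = {"AGE": (0, 99), "SEX": (1, 2)}


def out_of_range_items(record, valid_ranges):
    """Return the identifiers whose values are missing or outside the declared range."""
    failures = []
    for name, (low, high) in valid_ranges.items():
        value = record.get(name)
        if value is None or not (low <= value <= high):
            failures.append(name)
    return failures


# Screening one hypothetical person record before the user edits run.
flagged = out_of_range_items({"AGE": 130, "SEX": 1}, VALID_RANGES)
```

Keeping the limits in the dictionary rather than in comment statements is also what would let tabulation software reuse them, as the item notes.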

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features such as division by zero and out of bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is what action is to be taken in the calculation of tolerance in terms of this new record's inclusion into the universe of observations and number of changes.


(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented) but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
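The lookup table approach suggested for DRECODE in item (4) can be sketched as follows. This is Python for illustration only, not generated EDITOR code. It assumes, without confirmation from the CONCOR documentation, that the present code tests old/new value pairs one at a time; the pair list and all function names are hypothetical. Because DRECODE input values start at zero and ascend, the old value can serve directly as a table index.

```python
def recode_linear(value, pairs):
    """Test each (old, new) pair in turn; cost grows with the number of pairs."""
    for old, new in pairs:
        if old == value:
            return new
    return None


def build_table(pairs):
    """Build a table indexed by the old value, which starts at zero and ascends."""
    table = [None] * (max(old for old, _ in pairs) + 1)
    for old, new in pairs:
        table[old] = new
    return table


def recode_lookup(value, table):
    """Fetch the recoded value with one indexed access; None if out of bounds."""
    if 0 <= value < len(table):
        return table[value]
    return None


# Hypothetical recode: old values 0-3 map to new codes 1, 1, 2, 9.
pairs = [(0, 1), (1, 1), (2, 2), (3, 9)]
table = build_table(pairs)
```

The table is built once, so the saving appears only when many records are recoded, which is the census case.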

1.1.3 Report Division

(1) Provide a means by which the user may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.


1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available information needed most frequently. Also to examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean to imply that the editing process itself should be made interactive, but rather only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual

(2) The development of a CONCOR Users Guide


2 Survey or other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program but it has not been finished nor tested.


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES


1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT OUTPUT-QUEST-COUNT WRITE-RECORD-COUNT

Change in spelling of variable

Not mentioned--possible omission from manual

Not implemented

Note: Current documentation equivocates on when some variables are reset--most are initialized with each control area break

Though implemented as reserved identifiers, due to poor documentation of (old) CONCOR exact values and utility were unknown

These apparently new counters permit access to valuable totals within control areas

Note: These are not cumulative counters. Also OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN-, OUT-)

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Execution statistics printout (CONCOR-EDITOR EXECUTION STATISTICS page) with the test program listing superimposed. The column headings, counts, and most of the listing are not legible in this copy. The legible diagnostics read:]

DIVISION BY ZERO LINE NUMBER 118 (repeated once per occurrence)

RUN ABORTED DUE TO DIVISION BY ZERO ERRORS

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason, that is, ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors, and user specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.
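The behavior observed in this test, a diagnostic naming the source line for each division by zero followed by an abort once too many have occurred, can be mimicked in a few lines. This is a minimal Python sketch, not the EDITOR implementation; the class, the error limit of six, and the choice to return zero after a failed division are all assumptions made for illustration. Only the two message texts are taken from the printout.

```python
class RunAborted(Exception):
    """Raised when the (assumed) division-by-zero limit is reached."""


class Editor:
    def __init__(self, max_zero_divides=6):  # the limit of six is an assumption
        self.max_zero_divides = max_zero_divides
        self.zero_divides = 0
        self.messages = []

    def divide(self, numerator, denominator, line_number):
        """Integer-divide, reporting the source line instead of failing silently."""
        if denominator == 0:
            self.zero_divides += 1
            self.messages.append(f"DIVISION BY ZERO LINE NUMBER {line_number}")
            if self.zero_divides >= self.max_zero_divides:
                self.messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                raise RunAborted(line_number)
            return 0  # assumed fallback so that processing can continue
        return numerator // denominator
```

The point of the design is that every stop leaves a message trail, which is precisely what the old package was criticized for lacking.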

[Second execution statistics printout with the test program listing superimposed. The diagnostics and most of the page are not legible in this copy. The legible program lines read:]

175 UNTIL R001 > 10 VARY R001 FROM 1 BY [illegible]

177 DO

178 UNTIL R002 [illegible] 10 VARY [illegible] FROM 1 BY 1

180 ALLOCATE [illegible] = TA1 (R001 R002)

181 UPDATE TA2 (R001 R002) R003

182-185 LIST statements (text illegible)

186 END-DO

187 END-DO

Comments: In this example R001, a 2 dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 X 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.

[Third execution statistics printout with the test program listing superimposed. The column headings and counts are not legible in this copy. The legible diagnostics read:]

JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215

EDITOR -- NORMAL END OF JOB

[Legible program lines:]

214 IF IN-RECORD-COUNT > [illegible]

215 THEN STOP-RUN END-THEN

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING(DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION A blank command statement line was found in the user Dictionary Division command statement file

ACTION TAKEN Parsing continued with the next command statement line in the Dictionary Division command file

USER RESPONSE Put a comment indicator () in column 1 of the command statement line if spacing is desired if not remove the command statement line

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION A character not in the CONCOR legal character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()) was found as the first character in the string

ACTION TAKEN Parsing began again with the next string

USER RESPONSE Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION A string starting with a character not valid in the CONCOR legal character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached

ACTION TAKEN Parsing began again with the next user command statement line in the Dictionary Division command file

USER RESPONSE Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ())

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION The comment indicator character () was found embedded in a string


alphanumeric coding be utilized in the QUESTIONNAIRE-CONTROL and RECORD-CONTROL statements, where each input data item must be of the same data type as shown in the example. When alphanumeric data variables are used in these control statements, their construction is identical to that of numeric items. However, when used elsewhere in the DATA-DIVISION, alphanumeric variables are required to specify one of three possible comparison values as shown. There are a number of production instances when it would never be necessary or even desirable to recode alphanumeric data. However, as CONCOR attempts to force data into a totally numeric format upon output, there is currently no way to preserve these values if desired.

An unwieldy alternative to this situation, which may be acceptable under some circumstances, would be the expansion of the number of comparison strings from three to a more realistic number. The limitation of this compromise is that a full twenty-six comparison identifiers would be required in order to accommodate data which utilized the entire alphabet. A better solution, however, would be to make the general format of the alphanumeric variables identical to that of numeric identifiers, i.e. A9-23, and to permit alphanumeric values so defined to pass unaltered through the CONCOR system.

Another data-naming convention which caused several errors, and which could be corrected, concerns the array data definitional statements. While arrays of two and more dimensions are handled in a superior manner by the CONCOR program, single-dimension arrays pose a problem in coding, as shown in Figure 2. It is suggested that the command imperatives be changed to permit the coding of both rows and columns in single dimension arrays, i.e. allow a single row vector as well as a single column vector, to maintain the consistency of the array data definitional statements.

A major requirement of COBOL CONCOR file processing concerns the fact that all related data records must be physically contiguous on the input file. The implication of this requirement is that files may require preprocessing prior to actual data editing. (This preprocessing is usually a sort routine upon a selected CONTROL-AREA key.) While this type of processing merely introduces a new step in file processing, a major limitation becomes apparent when a large number of DISCRETE DATA files of the same census or survey questionnaire are to be processed. This limitation is the introduction of manual steps to save the most recent imputed values, i.e. preventing the program from starting with cold values each batch run. If a command such as LOAD/UNLOAD ARRAYS was incorporated into the language (an enhancement not believed to be difficult to implement), manual processing would be reduced to a minimum between batches and the maximum benefits of the hot deck methodology would be realized. It is envisioned that such a command would automatically insure the transfer of the appropriately designated hot values. Automatic processing of this nature, if done correctly, can greatly reduce the time required to clean multivolume files, for once CONCOR language statements have been compiled, linked

While it is possible at this time to save the arrays that are used in the imputation processes on a separate write-file, it is not now possible to automatically load those values back to an object program and to immediately resume processing on another volume. It is believed that such an automatic feature of the language would cut down the manual processing time significantly enough that it warrants inclusion into the package prior to its general distribution.
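The suggested LOAD/UNLOAD ARRAYS facility amounts to carrying the hot deck from one batch to the next. The following is a minimal sketch in Python rather than CONCOR; the function names, the JSON file format, and the single-cell deck keyed by relation are all assumptions chosen only to make the idea concrete. A valid value refreshes its cell, a missing value borrows from it, and the deck is written out at the end of one volume and read back at the start of the next.

```python
import json


def impute(value, deck, cell):
    """Hot deck step: a reported value updates the cell, a missing one borrows it."""
    if value is None:
        return deck[cell]
    deck[cell] = value
    return value


def unload_arrays(deck, path):
    """Save the current (hot) values at the end of a batch."""
    with open(path, "w") as handle:
        json.dump(deck, handle)


def load_arrays(path):
    """Restore the saved values so the next batch does not start cold."""
    with open(path) as handle:
        return json.load(handle)
```

With such a pair of operations, the cold deck values coded in the dictionary would be needed only for the first volume of a run.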

FIGURE 2

A05-DIFF-BETWEEN-AGE-OF-FEMALE-BY-RELATION 2 4 4

AGE OF HUSBAND / RELATION

12-17 18-24 25-35 36+

2 1 3 4 HEAD
2 -1 3 CHILD
1 31 -2 -4 OTHER
1 2 2 2 NONRELATIVE

Comments: The ARRAY-DATA command statement provides the means to declare array identifiers with up to five dimensions. Current documentation is not as explicit about the rules of this command as is desirable. The parameters of the command should function as follows:

user-identifier, number of dimensions (D), number of rows (R), number of columns (C), magnitude of element (M), initial start-up values

In the example, A05 is a two-dimensional array with 4 rows, 4 columns, a default magnitude of 9, and cold deck values as labeled.

A06-DIFF-BETWEEN-AGE-OF-PERSON-AND-MOTHER 1 1 4

16 18 21 23

(This coding generates the error message below.)

587 A06-DIFF-BETWEEN-AGE-OF-PERSON-AND-MOTHER 1 1 4

WARNING (DD-207) COMMAND TERMINATOR NOT FOUND, ASSUMED PRESENT
ERROR (DD-9lI) DIMENSION OF USER-SPECIFIED ARRAY IS LESS THAN THE MINIMUM VALUE PERMITTED (2)
PREVIOUS DIAGNOSTIC AT LINE 563

As shown by the array variable A06, CONCOR's treatment of vectors is not consistent with the above multidimensional array scheme, i.e., A06 must be coded as follows (example of how vectors must currently be coded to be correct):

A06-DIFF-BETWEEN-AGE-OF-PERSON-AND-MOTHER 1 4 2

16 18 21 23

user-identifier, 1 dimension, number of elements in vector, magnitude of element, initial start-up values

A simple modification to this command would permit the coding of both row and column vectors and make this command less error prone.
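One way to picture the suggested change is a declaration that takes a row count and a column count for every array, so that a vector is simply the case where one of the two equals 1. The Python sketch below is illustrative only: `declare_array` and its arguments are hypothetical and are not CONCOR syntax, and the magnitude parameter is left out.

```python
def declare_array(rows, cols, values):
    """Lay out the initial values row by row in a rows x cols array."""
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must each be at least 1")
    if len(values) != rows * cols:
        raise ValueError(
            "expected %d initial values, got %d" % (rows * cols, len(values))
        )
    return [list(values[r * cols:(r + 1) * cols]) for r in range(rows)]


# The same four start-up values coded as a row vector and as a column vector.
row_vector = declare_array(1, 4, [16, 18, 21, 23])
column_vector = declare_array(4, 1, [16, 18, 21, 23])
```

Under this convention a programmer codes rows and columns the same way for every array, which is the consistency the figure argues for.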

and stored as an object module on the system, no other compilations should be required for processing questionnaire files of the same type. Theoretically, a single well-written CONCOR program is all that would be required to process an entire census run.

Appendix H contrasts the internal identifiers of the old and new language versions. Without such identifiers a user would have little information about the status of input as it is processed by EDITOR. As noted in the appendix, most internal pointers are reset upon each break in the CONTROL-AREA, provided a CONTROL-AREA has been defined. The limitation here is that there are obvious instances when terminating processing on the basis of run counts would be advantageous even though a CONTROL-AREA has been specified, e.g., when debugging CONCOR programs or comparing input files. Therefore, another set of pointers should be implemented for this purpose and made available for programmer reference.
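The distinction between the two sets of pointers can be sketched as follows. This is an illustrative Python fragment, not the EDITOR logic itself; the function name and the tuple layout are assumptions. One counter is reset at every control-area break, as the existing internal identifiers are, while the second counts records for the whole run and is never reset.

```python
def count_records(area_keys):
    """Return (area, count_within_area, count_within_run) for each record in order."""
    counts = []
    run_count = 0
    area_count = 0
    current_area = None
    for area in area_keys:
        if area != current_area:
            current_area = area
            area_count = 0        # reset at each control-area break
        area_count += 1
        run_count += 1            # run-level counter, never reset
        counts.append((area, area_count, run_count))
    return counts


def first_n_of_run(area_keys, limit):
    """Keep only the first `limit` records of the run, regardless of area breaks."""
    return [c for c in count_records(area_keys) if c[2] <= limit]
```

A run-level counter of this kind is what would let a programmer stop a debugging run after a fixed number of records even when a control area has been defined.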

One clearly disturbing development which needs to be pursued during in-depth testing of the system concerns the MAX-STORAGE parameter of the DEFINE-RECORD statement. As shown in Figure 3, when MAX-STORAGE was set equal to the maximum value, a COBOL program was generated which required 1000K of core to run. The MAX-STORAGE value of 999 is clearly not realistic under most processing circumstances. This example drives home several important points about CONCOR. The core requirements of CONCOR-generated programs can be influenced significantly by the amount or nature of programmer-specified I/O operations. In fact, it is possible to generate a program of a size most foreign country machines could not process. It is recommended that tests determine a realistic maximum-value restriction for implementation to prevent problems in this area.

The final area of recommended modification concerns the newly implemented REPORT-DIVISION. The purpose of the REPORT-DIVISION is to enable a user to specify certain CONCOR language statements which will generate statistical reports. These reports contain statistics generated by EDITOR as specified by the GENERATE-EDIT-STATISTICS command of the EXECUTION-DIVISION. All of the reports produced are organized according to the data fields defined by the AREA-CONTROL command of the DATA-DICTIONARY. If the AREA-CONTROL command is not defined in the DATA-DICTIONARY, then all the statistics are summarized at the total run level. If a control area field is defined, then all statistics will be summarized for each unique CONTROL AREA as encountered by the EDITOR program on the input file; statistics at the total run level will not be available. This in part relates back to previous discussions citing the need for new internal identifiers. Report listings may contain the values of entire records or entire questionnaires, depending upon the keyword used in the report generation commands. The problem centers upon the homogeneity of CONCOR printouts during a production run.

It is virtually impossible to distinguish reports on the basis of the volumes they were run against. Some means should be provided to allow users to uniquely and purposefully label the reports generated in this division. Indeed, the very name REPORT-DIVISION suggests that such a command is implicit and appropriate. Such a LABEL-REPORT or REPORT-FILE command, along with file information from the system, should not be difficult to implement.
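The two reporting gaps noted above, the absence of a user-supplied label and the loss of run-level totals once a control area is defined, can be pictured with a small sketch. The Python below is hypothetical: `build_report`, its arguments, and the line layout are assumptions for illustration and do not reproduce any CONCOR report format.

```python
from collections import Counter


def build_report(label, errors):
    """Summarize errors by control area and for the total run under a user label.

    `errors` is a list of (area, error_code) pairs.
    """
    by_area = Counter(area for area, _code in errors)
    lines = ["REPORT: %s" % label]
    for area in sorted(by_area):
        lines.append("AREA %s: %d errors" % (area, by_area[area]))
    lines.append("TOTAL RUN: %d errors" % sum(by_area.values()))
    return lines
```

With a label such as a volume name carried on every listing, printouts from different volumes of the same production run could be told apart at a glance.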

FIGURE 3

CONCOR (CONSISTENCY AND CORRECTION) EDIT AND IMPUTATION SYSTEM

USER DICTIONARY DIVISION - SOURCE LISTING

LINE NUMBER

72 DEFINE-RECORD RECORD-NAME= HOUSING-RECORD
73 MAX-STORAGE= 999
74 RECORD-TYPE= (value illegible) NOTE A/N LITERAL

268 DEFINE-RECORD RECORD-NAME= POPULATION-RECORD
269 MAX-STORAGE= 999
270 RECORD-TYPE= (value illegible)

IEF373I STEP /LOAD/ START 80018.1001
IEF374I STEP /LOAD/ STOP 80018.1002 CPU 0MIN 01.65SEC SRB 0MIN 00.25SEC VIRT 1000K SYS 3

STEP START TIME - 10:01:55 STEP END TIME - 10:02:57 STEP CPU TIME - 1.65 SECONDS STEP COMPLETION CODE - C 0

TAPES - 0 DISKS - 0 REGION (REQUESTED - 1000K USED - 1000K) I/O COUNT 211

CARDS READ - 0 CARDS PUNCHED - 0 LINES PRINTED - 22

STEPNAME - LOAD

Concluding Remarks of System Modifications

Version 1 of the COBOL CONCOR language was notorious for abending without the generation of diagnostic messages. Appendix I sets forth a test of some of these historical problems. In each instance the system provided messages and source program line references which enabled immediate identification of the processing difficulties. Further, tests of the various commands revealed the generally high performance character of the language in its newest version. Under these circumstances, the recommended modifications should be considered as ones which will uniformly allow CONCOR to realize the potential of its design, enhance its utility, and, moreover, endure as a software product.


IV THE ADEQUACY OF COBOL CONCOR DOCUMENTATION

The current documentation of CONCOR consists of a newly developed, relatively voluminous systems manual, a Diagnostic Message Guide, and some loose-leaf instructions concerning the installation of the system on IBM computers. Of these three documents, the Diagnostic Message Guide is clearly complete and adequate and needs no changes except the addition of message number tabs to enable a programmer to locate the text of a message number more quickly. An example of the superior format of this guide appears as Appendix J.

Conspicuously absent from the list of systems documents is a User's Guide, as originally specified in the ISPC enhancement proposal.

A thorough assessment of the Systems Manual has been undertaken and indicates that, with some reorganization and editing, it could be made a more useful and usable document, as it is well constituted informationally. In its current form the Systems Manual can best be described as a working document, one which was not intended to teach the CONCOR language but was aimed at an audience already familiar with previous CONCOR releases. Further, it is not a systems manual per se, as some of its components and appendixes could more appropriately compose a user's guide. For example, excerpts from the POPSTAN publications which review the basic editing concepts and contain the POPSTAN example problem should be deleted from the Systems Manual and would be more contextually effective in the User's Guide. The EXECUTION-DIVISION chapter of the Systems Manual would especially benefit from general editing and reorganization. (Specifically, the contents of pages 90-92 should be moved to page 63; these pages explain the placement and purpose of the three permissible routines of the CONCOR language.) Had the author been granted more time, he could have outlined additional specific modifications to bring existing documentation to a level of adequacy. However, some guidelines stand out:

1. Installation instructions and considerations should compose a chapter of the Systems Manual rather than a separate document. (Hand-out materials covering installation were not adequate.) Further, a documentation scheme setting forth how CONCOR was installed on each computer, and any modifications required during installation, could be serially added to and become a permanent part of the installation's systems manual. If CONCOR is going to be used, and if agencies are going to actively support it as a package, installation limitations must be known and well documented. The prescribed format for this documentation should be set forth, and any modifications to CONCOR should be continuously updated on these sheets so that a history of changes to the language will be readily accessible to supporting agencies. If CONCOR is to survive as a standardized system, editing results must be capable of being replicated among installations. Such a form could contain other information as well, concerning the type of computer, amount of core, and nature of utilities supporting users. Upon installation, a copy of this form could be sent to the US agency which will ultimately be responsible for supporting the CONCOR package.

2. A complete COBOL CONCOR program should appear in an appendix for reference.

3. The development of the User's Guide should include an intensive review of the editing concepts involved in processing census data files, beyond the POPSTAN materials.

4. An explanation of the CONCOR benchmark program should appear in the User's Guide and the Systems Manual. The running of a supplied benchmark program should be a standard installation protocol used to test all operational aspects of a new installation.

5. The development of a CONCOR pocket card. This tool, which has been proven useful in teaching and assisting programmers in utilizing programming languages, lays out all commands and options on a single small card. An example of such a pocket card is the IBM Assembler Card. Such a small document is useful in clarifying the language and setting forth structural rules without continual reference to full-size manuals.


V CONCLUSION

In a situation of extreme need CONCOR could probably be utilized, as it seems capable of performing its editing functions. Its usefulness as a data cleaning tool is generally undisputed. The importance of having CONCOR exhaustively tested by an independent agency cannot be over-emphasized in light of prior failures of the system to perform adequately. Additionally, core requirements and processing speed should be precisely determined. One must be encouraged by the amount of progress which has been made in a relatively short amount of time towards the completion of the CONCOR package. While it is clearly beyond the scope of this report to estimate the cost and the amount of time required to implement the highly desirable changes to the CONCOR language and documentation outlined herein, and then to exhaustively test the system, it is believed that this process could be completed within a single quarter. (The changes are of such a nature that no structural redefinitions of a systematic nature would be required.) These enhancements would make the COBOL CONCOR package more complete, consistent, and reliable in the field. ISPC has competently moved the system to the point that its completion is within reach.

Given the probable revision of the current Systems Reference Manual and the development of a suitable User's Guide document, it is likely that this version of CONCOR could be taught to experienced programmers (unfamiliar with previous versions of the language) in as little as four weeks. And while one is impressed by the potential power and overall simplicity of the COBOL CONCOR language, it is difficult to envision individuals with no real computer programming experience utilizing its full capabilities. Hence the importance of including a review of basic editing concepts in the documentation is underlined.

Further, as documentation is often critical to the perceived ease of use of computer-based systems, it will be essential to have these various manuals examined independently prior to their general use.

As previously mentioned, Appendix G represents features which the ISPC feels could have a positive impact on CONCOR. Depending upon the outcome of the testing recommended in previous chapters and the availability of funding, supporting agencies may desire to look again at an assembler (ALC) version of the language.

Overall, COBOL CONCOR remains an intermediate product yet to realize its potential, which is thought considerable. Software development of this nature is characteristically of a long-term, evolutionary design. In light of this reality, agencies should be prepared to evaluate such projects' success or failure in terms other than that of immediate utility. It is the philosophy underlying the system, not the system itself, which will ultimately be exported.

APPENDIX A

BuCen Enhancement Proposal


BUCEN IN-HOUSE REVIEW ENHANCEMENT PROPOSAL

1 Easy to use interrecord referencing

2 Improved output file capabilities

A provide overflow protection on WRITE command

B provide OUTPUT command to create corrected questionnaire file that matches IP file data dictionary

3 Improved edit statistics reported (LISTERR)

A provide automatic (user-specified) area break

B provide options for compilation and displaying edit statistics at various levels

C provide automatic (user-specified) tolerance checking of error rates by area

D automatically capture IDs of areas failing tolerance check

4 Clean up known bugs in code

5 Comprehensive testing

6 Clean up and enhance documentation

A reference manual more examples error message guide

B installation guide

C systems manual
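Items 3C and 3D of the proposal, tolerance checking of error rates by area and capture of the IDs of failing areas, amount to a simple comparison per area. The Python sketch below is only an illustration of that idea; the function name, the input layout, and the rule that an area fails when its rate exceeds the tolerance are assumptions, not a description of the CONCOR implementation.

```python
def failing_areas(counts, tolerance):
    """Return the sorted IDs of areas whose error rate exceeds the tolerance.

    `counts` maps an area ID to a pair (errors, records_processed).
    """
    failed = []
    for area, (errors, records) in counts.items():
        rate = errors / records if records else 0.0
        if rate > tolerance:
            failed.append(area)
    return sorted(failed)
```

For example, with a tolerance of 0.10, an area with 20 errors in 100 records would be captured, while an area with 5 errors in 100 records would pass.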

APPENDIX B

EVALUATIVE CRITERIA


UNITED STATES GOVERNMENT

Memorandum

TO: DS/POP/FPSD, Robert H. Haladay

DATE: December 3, 1979

FROM: DS/POP/DEIO, Liliane Floge

SUBJECT: Criteria for Evaluation of BuCen COBOL CONCOR, September 30th Version, as presented at January 1980 Workshop

The following points should be considered when evaluating CONCOR, the COBOL editing system, during participation at the January 1980 workshop:

1. Are the Installation Manual, the System Manual, and the Users Manual ready for use? Are these manuals adequate for their purposes? Are the manuals clear and consistent? Can they be understood by subject-matter personnel as well as programmers?

2. Is the editing system capable of implementing all the features as described in the documentation?

3. Does the system appear to be debugged and tested and ready for release to developing countries?

4. Does the system appear to be fast enough to edit a census in a reasonable amount of time?

5. What size core does the system require?

6. Comment in general on completeness, usefulness, and efficiency of the system, and ease of use by developing country programmers and other census personnel.

cc: Suzanne Olds, APHA; Julio Ortuzar, Delta Systems

APPENDIX C

WORKSHOP ITINERARY


CONCOR Workshop Schedule, January 7-18, 1980

U.S. Bureau of the Census, International Statistical Programs Center

Scuderi Building, Room 209, 4235 28th Avenue, Marlow Heights, Maryland

Monday, January 7

9:30 am - 10:00 Welcoming Remarks; Overview of Workshop

10:00 - 10:30 Introduction to CONCOR
- Purpose and function
- History of development
- General computer requirements

10:30 - 10:45 Break

10:45 - 12:00 Editing Concepts
- Ways to interrogate data
- Ways to correct data
- Editing housing and population data - POPSTAN
- Advantages of CONCOR

12:00 - 1:15 pm Break

1:15 - 2:00 System Description
- Constraints in design of CONCOR
- Basic subsystems of CONCOR
- User interactions with system
- Examples of outputs produced

2:00 - 2:30 User Program Organization
- Divisions
- Sections
- Routines
- Commands

2:30 - 2:45 Break

2:45 - 3:25 Command Language Description
- Types of statements
- Format
- Syntax

Tuesday, January 8

9:30 am - 10:30 Dictionary Division Command Statements
- Punctuation
- Input data referencing
- Working data declarations
- Internal data representation and storage

10:30 - 10:45 Break

10:45 - 12:00 Dictionary-Attributes-Section
- Dictionary-Name
File-Section
- Input-File
- Output-File
- Write-File
- Error-File

12:00 - 1:15 pm Break

1:15 pm - 2:15 Input-Record-Section
- Define-Record
Working-Data-Section
- New-Data
- Array-Data

2:15 - 2:30 Break

2:30 - 3:25 Dictionary Examples
- Minimum dictionary structure
- Maximum dictionary structure
- Hand out dictionary problem

Wednesday, January 9

9:30 am - 10:30 Discussion of Participants' Dictionary problems; Free work time

10:30 - 10:45 Break

10:45 - 12:00 Execution Division Command Statements
- Punctuation
- Subscripting
- Internal Identifiers
- Report-Control-Section
  - Generate-Edit-Statistics
  - Count-Imputes
  - Examples

12:00 - 1:15 pm Break

1:15 pm - 2:15 Execution Division Command Statements (continued)
- Routines of Edit-Specifications-Section
  - Prolog-Routine
  - Filter-Routine
  - Epilog-Routine
- Types and functions of edit specification commands
- Range
- Assert

2:15 - 2:30 Break

2:30 - 3:25
- Pass/Fail clauses
- List

Thursday, January 10

9:30 am - 10:30 Discussion of Problems; Free work time

10:30 - 10:45 Break

10:45 - 12:00 Edit Specification Command Statements (continued)
- Allocate
- Update
- Let

12:00 - 1:15 pm Break

1:15 pm - 2:15
- If
- Until/Exit
- Stop

2:15 - 2:30 Break

2:30 - 3:25
- Drecode
- Grecode

Friday, January 11

9:30 am - 10:30 Edit Specification Command Statements (continued)
- Output
- Write

10:30 - 10:45 Break

10:45 - 12:00 Report Division Command Statements
- Display-Control-Section
  - Display-Edit-Statistics
- Tolerance-Control-Section
  - Error-Rate-Check
  - Reject-File
- Report Examples

12:00 - 1:15 pm Break

1:15 pm - 3:25 Hand out Edit Specifications as problem; Free work time

Monday, January 14

9:30 am - 10:30 Discuss procedures for running problems on computer

10:30 - 10:45 Break

10:45 - 12:00 Component Programs of the CONCOR system

12:00 - 1:15 pm Break

Tuesday, January 15

9:30 am - 3:25 pm Free work time

Wednesday, January 16

9:30 am - 12:00 Free work time

12:00 - 1:15 pm Break

1:15 pm - 2:15 How to Install CONCOR on IBM 360/370 OS

2:15 - 2:30 Break

2:30 - 3:25 Free work time

Thursday, January 17

9:30 am - 3:25 Free work time

1:15 pm - 2:15 Future CONCOR Design Considerations
- for census processing
- for survey processing
- manual correction system

2:15 - 2:30 Break

2:30 - 2:45 Evaluation Guidelines
- Hand out evaluation forms

2:45 - 3:25 Free work time

Friday, January 18

9:30 am - 10:30 Free work time

10:30 - 10:45 Break

10:45 - 12:00 Group Discussion on CONCOR
- Submission of written evaluations
- Distribution of IBM installation tapes
- Presentation of Certificates to Participants

12:00 - 1:15 pm Break

1:15 - 3:25 Free work time

APPENDIX D

PARTICIPANTS


CONCOR WORKSHOP January 7 - 18 1980 Marlow Heights MD

PARTICIPANTS

KENYA

James Midianga Government Computer Centre Central Bureau of Statistics PO Box 30266 Nairobi Kenya

PANAMA

Omar Rivera Rios Data Processing Specialist Contraloria General de la Republica de Panama Apartado Postal 5213 Panama 5 Panama

PHILIPPINES

Guida T Capellan OIC Computer Programming Division National Census and Statistics Office Solicarel Bldg 1 Magsaysay Blvd Sta Mesa Manila Philippines

THAILAND

Angsumal Sunalai National Statistical Office Bangkok 1 Thailand

EGYPT

Farag Sedky Mourad Ghaleb CAPMAS (Central Agency for Public Mobilization and Statistics) Nasr City Cairo Egypt

SAUDI ARABIA

Shargi R Al-Shargi National Computer Center PO Box 2534 Riyadh Saudi Arabia

Ahmed S Al-Ghamdi Dept Director of National Computer Center PO Box 2534 Riyadh Saudi Arabia

OTHER

Robert W ONeal USREPJECOR APO New York NY 09038

John N Adams Data Processing Advisor DACCA Department of State Washington DC 20520

OR c/o UNDP PO Box 224 Dacca Bangladesh

Joe Quasney US Census Bureau Washington DC 20233

Howard Brunsman 5715 N Ninth St Arlington VA 22205

Larry Shiller and Julio Ortuzar Delta Systems Consultants Inc 264 Alhambra Circle Coral Gables FL 33134

Nicholas Ourusoff Burpee Hill Road New London NH 03257 (United Nations)

Frederic J Grant IV Systems Development Georgia World Congress Institute 580 North Omni International Atlanta Georgia 30303


Rafael Samper Mary Jo Keenan Catherine B Gleason LACDR Room 2246 NS Washington DC 20523

John Marshall AIDSERDMPSE Rm 706C SA-12 Washington DC 20523

Susanne Bacon and Mary K Friday ISPCData Processing US Census Bureau Washington DC 20233

Leo Dougherty ISPCWorld Census Staff US Bureau of the Census Washington DC 20233


CONCOR WORKSHOP

January 7-18 1980 Washington DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM


CONCOR WORKSHOP January 16 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1 What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries

2 If you are familiar with previous versions of CONCOR how does this current version compare to them


3 How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data of which you are familiar

4 How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software


5 How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys

6 If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?


7 In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8 Other comments


APPENDIX F

NEWOLD COMMAND COMPARISONS


OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA


NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR.

END-DIVISION signifies the termination of the CONCOR division organization and must always be present, even though division identifiers have not been implemented.

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

ALLOCATE (ALL) ALLOCATE (ALLOC) Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

ASSERT (AST) ASSERT (ASRT) The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and MESSAGE option. Additionally, utilizing the DUMPR and DUMPO keywords provides the user with the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

(see RECODE) DRECODE (DRCD) Provides a method for recoding data items which have a starting value of zero and continue in ascending sequence. Replaces the previous RECODE command.

EXIT Provides the means to terminate EXECUTION (ESCAPE) of command statements within the UNTIL command

END END-IF END-THEN END-ELSE A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

ERROR not implemented No longer supported in language

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

FILTER TYPE ( ) WITH not implemented Option statements are no longer supported in the language. This command once specified essentially a GOTO, depending on construction.

(see XRECODE) GRECODE (GRCD) This new command provides a method for the group recoding of input values.

IF IF The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. Similar to pseudocode, a CONCOR programmer must utilize the structured IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version.

LET LET Virtually unchanged.

LIST New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire, as well as specific individual identifiers.

OUTPUT New command which provides the means by which records will be written to the output-file.

(continued)


CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

RANGE (RNG) RANGE (RNG) As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through the DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

RECODE (REC) name changed Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

STOP keyword STOP keyword Unchanged.

UNTIL DO END-DO One of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to that of most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

UPDATE (UPD) UPDATE (UPD) Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

WRITE (WRT) WRITE (WRT) The WRITE language statement while once utilized as the primary output command is now used to create an auxiliary or derivative file Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION (See WRITE-FILE of DICTIONARY-DIVISION)

XRECODE (XREC) name changed Old command which has become the GRECODE command in the December 1979 version
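The difference between the two recode commands described in this table can be shown with a brief sketch. The Python below is an analogy only, not CONCOR syntax; the function names, the tuple layout for groups, and the sample codes are assumptions. A direct recode looks up a replacement by position for values that start at zero and ascend, while a group recode maps a range of input values to a single code.

```python
def direct_recode(value, table):
    """Recode a value that starts at zero and ascends by indexing into a table."""
    return table[value]


def group_recode(value, groups):
    """Recode a value by the first (low, high, code) group that contains it."""
    for low, high, code in groups:
        if low <= value <= high:
            return code
    return None  # value falls outside every group


# Hypothetical age groups matching the column headings used in Figure 2.
AGE_GROUPS = [(12, 17, 1), (18, 24, 2), (25, 35, 3), (36, 999, 4)]
```

Here `group_recode(20, AGE_GROUPS)` selects the 18-24 group, whereas `direct_recode` simply uses the input value as a subscript.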


DECEMBER 1978 VERSION 1

DECEMBER 1979 VERSION 2

Not implemented REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging uncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
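The presorting requirement noted above can be checked before a file is submitted for editing. The following Python sketch is an assumption-laden illustration (the function names and the use of an in-memory list are hypothetical, and a production file would need an external sort utility): it tests whether all records sharing a control-area key are contiguous and, if they are not, sorts them on that key.

```python
def is_contiguous(keys):
    """Return True if every key's records appear in one unbroken block."""
    seen = set()
    previous = object()  # sentinel that equals no real key
    for key in keys:
        if key != previous:
            if key in seen:
                return False  # this key already ended an earlier block
            seen.add(key)
            previous = key
    return True


def make_contiguous(records, key):
    """Stable sort on the control-area key, preserving record order within an area."""
    return sorted(records, key=key)
```

Because Python's sort is stable, records keep their original order within each area, which matters when the records of one questionnaire must stay together.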

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST


Appendix G

Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods, if applicable.

(2) Addition of optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.


(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Permitting only one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18 digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.


(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays

(16) Print the data dictionary name in the titles of the dictionary documentation or provide a means by which the user may specify a title to appear in the listings
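
The automatic range check suggested in item (13) can be illustrated with a short sketch. It is written in Python rather than in the CONCOR language, because this report does not define a syntax for declaring valid ranges. The identifier names and ranges shown are hypothetical and serve only to show the idea of checking every declared item before the user's edit code sees the record.

```python
# Hypothetical dictionary entries: identifier -> (lowest valid value, highest valid value).
# The names and ranges are illustrative only; they are not taken from a CONCOR dictionary.
VALID_RANGES = {
    "AGE": (0, 98),
    "SEX": (1, 2),
    "RELATIONSHIP": (1, 6),
}


def range_check(record, valid_ranges=VALID_RANGES):
    """Return the identifiers whose values fall outside their declared range.

    Identifiers missing from the record, or holding non-integer values, are
    reported as well, since the user's edit code could not rely on them.
    """
    failures = []
    for name, (low, high) in valid_ranges.items():
        value = record.get(name)
        if not isinstance(value, int) or not (low <= value <= high):
            failures.append(name)
    return failures
```

A record that passes returns an empty list, so the check could run once per questionnaire before any user-written edit rule is applied.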

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features, such as division by zero and out of bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is the question of what action is to be taken in the calculation of tolerance in terms of this new record's inclusion into the universe of observations and number of changes.


(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
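
The lookup table approach proposed in item (4) can be sketched as follows. The example is in Python, not in generated COBOL, and the recode specification (age groups) is hypothetical. The sequential scan is included only for comparison; this report does not state how the current generated code performs the recode.

```python
# Hypothetical recode specification: inclusive source ranges mapped to a code.
AGE_GROUPS = [((0, 14), 1), ((15, 49), 2), ((50, 98), 3)]


def recode_linear(value, spec, default=0):
    """Scan the ranges one by one (a sequential approach, shown for comparison)."""
    for (low, high), code in spec:
        if low <= value <= high:
            return code
    return default


def build_lookup_table(spec, max_value, default=0):
    """Expand the ranges once into a table indexed directly by the source value.

    Assumes source values are non-negative integers no larger than max_value.
    """
    table = [default] * (max_value + 1)
    for (low, high), code in spec:
        for v in range(max(low, 0), min(high, max_value) + 1):
            table[v] = code
    return table


def recode_lookup(value, table, default=0):
    """Recode with a single indexed access; values outside the table get the default."""
    if 0 <= value < len(table):
        return table[value]
    return default
```

The table is built once, before any record is read, so each recode during editing costs one indexed access instead of a scan over all ranges. The price is the storage for the table, which is the same kind of core trade-off discussed elsewhere in this report.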

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.


1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on IBM 360/370 series computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean that the editing process itself should be made interactive; rather, only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.


2. Survey or Other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of the sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has been neither finished nor tested.
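
The check-in procedure described in item (2) amounts to comparing the master list of selected cases with the questionnaire identifiers actually found on the input file. A minimal sketch in Python follows. The function name, the identifier format, and the three-way result are assumptions made for illustration and are not part of any CONCOR design.

```python
def check_in(expected_ids, received_ids):
    """Compare questionnaire identifiers on the master list with those on the input file.

    Returns three sorted lists: cases expected but missing, cases received but
    not on the master list, and identifiers received more than once.
    """
    expected = set(expected_ids)
    seen = set()
    duplicates = set()
    for qid in received_ids:
        if qid in seen:
            duplicates.add(qid)
        seen.add(qid)
    missing = sorted(expected - seen)
    unexpected = sorted(seen - expected)
    return missing, unexpected, sorted(duplicates)
```

For a master list of 001, 002, 003 and an input file containing 001, 003, 003, 007, the sketch reports 002 as missing, 007 as unexpected, and 003 as duplicated.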


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES


Internal variables of the previous CONCOR version:

1. EOF-FLAG
2. ERROR-IN-QUESTIONNAIRE
3. CURRENT-POINTER-VALUE
4. TYPE-COUNT ( )
5. RECORD-COUNT
6. QUESTIONNAIRE-COUNT
7. RECORDS-IN-STORE
8. INVALID-RECORD-TYPE-COUNT
9. INCOMPLETE-FLAG
10. CONTINUATION-FLAG
11. INVALID-RECORD-FLAG

Internal variables of the new COBOL CONCOR version:

EOF-FLAG
ERRORS-IN-QUESTIONNAIRE
TYPE-COUNT ( )
IN-RECORD-COUNT
IN-QUESTIONNAIRE-COUNT
RECORDS-IN-STORE
INVALID-RECORD-TYPE-COUNT
INCOMPLETE-FLAG
CONTINUATION-FLAG
OUT-OF-RANGE
NOT-NUMBER
BLANK
OUT-RECORD-COUNT, OUTPUT-QUEST-COUNT, WRITE-RECORD-COUNT (new internal counters)

Comments:

Change in spelling of variable.

Not mentioned; possible omission from manual.

Not implemented.

Note: Current documentation equivocates on when some variables are reset; most are initialized with each control area break.

Though implemented as reserved identifiers, their exact values and utility were unknown due to poor documentation of the (old) CONCOR.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g. IN, OUT).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Printout: CONCOR-EDITOR EXECUTION STATISTICS page, with the test program text superimposed. The page repeats the message DIVISION BY ZERO, LINE NUMBER 118, and ends with RUN ABORTED DUE TO DIVISION BY ZERO ERRORS. The superimposed source lines 112-125 set registers R001 and R003 to zero, list their values, and at line 118 compute R003 from R002 and R001.]

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason (ABEND) without generating any messages. The most common types of cessation are division by zero, array coordinate errors, and user specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.
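
The behavior observed in this test (a diagnostic naming the source line for each division by zero, followed by an abort once too many have occurred) can be sketched in Python. This is an illustration of the protection idea only, not the code that GENED generates. The error limit, the choice to return zero after a failed division, and the use of integer division are all assumptions.

```python
class RunAborted(Exception):
    """Raised when the number of division-by-zero errors exceeds the limit."""


class ProtectedDivider:
    def __init__(self, error_limit=5):
        # error_limit is a hypothetical setting, not a documented CONCOR default.
        self.error_limit = error_limit
        self.messages = []

    def divide(self, numerator, denominator, line_number):
        """Return the integer quotient, or 0 with a logged diagnostic on division by zero."""
        if denominator == 0:
            self.messages.append(f"DIVISION BY ZERO  LINE NUMBER {line_number}")
            if len(self.messages) > self.error_limit:
                self.messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                raise RunAborted(self.messages[-1])
            return 0
        return numerator // denominator
```

Because every diagnostic carries the source line number, the programmer can go directly to the offending statement, which is the property the test above was written to confirm.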

[Printout: CONCOR-EDITOR EXECUTION STATISTICS page reporting array reference errors, with the test program text superimposed. The superimposed source lines 173-187 contain two nested DO loops (UNTIL R001 > 10 VARY R001 FROM 1 BY 1, and a similar loop on R002) that ALLOCATE from TA1 (R001 R002), UPDATE TA2 (R001 R002) with R003, and LIST the values.]

Comments: In this example a 2 dimensional array with 10 rows and 10 columns is attempting to update the smaller (5 X 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.

[Printout: CONCOR-EDITOR EXECUTION STATISTICS page, with the test program text superimposed. The page reports JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215, followed by EDITOR -- NORMAL END OF JOB. The superimposed source lines 214-215 test IN-RECORD-COUNT and issue STOP-RUN.]

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.


FIGURE 2

A05-DIFF-BETWEEN-AGE-OF-FEMALE-BY-RELATION 2 4 4

[Cold deck matrix: the columns are AGE OF HUSBAND 12-17, 18-24, 25-35, 36+; the rows are RELATION HEAD, CHILD, OTHER, NONRELATIVE.]

Comments: The ARRAY-DATA command statement provides the means to declare array identifiers with up to five dimensions. Current documentation is not as explicit about the rules of this command as is desirable. The parameters of the command should function as follows:

user-identifier, number of dimensions (D), number of rows (R), number of columns (C), magnitude of element (M), initial start up values

In the example, A05 is a two dimensional array with 4 rows, 4 columns, a default magnitude of 9, and cold deck values as labeled.

A06-DIFF-BETWEEN-AGE-OF-PERSON-AND-MOTHER 1 1 4
16 18 21 23

(This coding generates the error messages below.)

WARNING (DD-207) COMMAND TERMINATOR ( ) NOT FOUND, ( ) ASSUMED PRESENT
ERROR (DD-911) DIMENSION OF USER-SPECIFIED ARRAY IS LESS THAN THE MINIMUM VALUE PERMITTED (2)
PREVIOUS DIAGNOSTIC AT LINE 563

As shown by the array variable A06, CONCOR's treatment of vectors is not consistent with the above multidimensional array scheme; i.e., A06 must be coded as follows.

(Example of how vectors must currently be coded to be correct)

A06-DIFF-BETWEEN-AGE-OF-PERSON-AND-MOTHER 1 4 2
16 18 21 23

user-identifier, 1 dimension, number of elements in vector, magnitude of element, initial start up values

A simple modification to this command would permit the coding of both row and column vectors and make this command less error prone.
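
The uniform treatment of vectors and multidimensional arrays recommended in Figure 2 can be sketched as a single validation rule: one extent per declared dimension, and as many start-up values as the extents imply. The Python sketch below is hypothetical. It does not reproduce the ARRAY-DATA syntax or the DD-series diagnostics, and only the limit of five dimensions is taken from the text.

```python
from math import prod


def check_array_declaration(name, dimensions, extents, initial_values, max_dimensions=5):
    """Validate an array declaration under one uniform rule for vectors and matrices.

    dimensions      declared number of dimensions (1 = vector, 2 = rows x columns, ...)
    extents         one size per dimension, e.g. [4] or [4, 4]
    initial_values  flat list of start-up values, row by row
    Returns a list of diagnostic strings; an empty list means the declaration is accepted.
    """
    problems = []
    if not 1 <= dimensions <= max_dimensions:
        problems.append(f"{name}: number of dimensions must be 1 to {max_dimensions}")
    if len(extents) != dimensions:
        problems.append(
            f"{name}: {dimensions} dimension(s) declared but {len(extents)} extent(s) given"
        )
    elif any(size < 1 for size in extents):
        problems.append(f"{name}: every extent must be at least 1")
    elif len(initial_values) != prod(extents):
        problems.append(
            f"{name}: expected {prod(extents)} initial values, found {len(initial_values)}"
        )
    return problems
```

Under this rule a four-element vector is declared with one dimension and one extent, and a 4 by 4 matrix with two dimensions and two extents, so the programmer does not have to remember a special case for vectors.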

and stored as an object module on the system, no other compilations should be required for questionnaire processing files of the same type. Theoretically, a single well-written CONCOR program is all that would be required to process an entire census run.

Appendix H contrasts the internal identifiers of the old and new language versions. Without such identifiers a user would have little information about the status of input as it is processed by EDITOR. As noted in the appendix, most internal pointers are reset upon each break in the CONTROL-AREA, provided a CONTROL-AREA has been defined. The limitation here is that there are obvious instances when terminating processing on the basis of run counts would be advantageous although a CONTROL-AREA has been specified, e.g., debugging CONCOR programs or comparing input files. Therefore another set of pointers should be implemented for this purpose and made available for programmer reference.
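
The second set of pointers recommended above can be pictured as a pair of counter sets: one reset at each control-area break and one accumulated over the whole run. The Python sketch below is illustrative only. The counter names are borrowed from Appendix H, but the class, its method, and the way a control-area break is detected are assumptions.

```python
class RecordCounters:
    """Keeps one set of counts per control area and one cumulative set for the run."""

    NAMES = ("IN-RECORD-COUNT", "IN-QUESTIONNAIRE-COUNT")

    def __init__(self):
        self.area = dict.fromkeys(self.NAMES, 0)
        self.run = dict.fromkeys(self.NAMES, 0)
        self.current_area = None

    def read_record(self, control_area, starts_questionnaire):
        """Count one input record, resetting the area counters on a control-area break."""
        if control_area != self.current_area:
            self.area = dict.fromkeys(self.NAMES, 0)
            self.current_area = control_area
        for counters in (self.area, self.run):
            counters["IN-RECORD-COUNT"] += 1
            if starts_questionnaire:
                counters["IN-QUESTIONNAIRE-COUNT"] += 1
```

With both sets available, a test such as stopping after a fixed number of input records could be written against the run totals even when a CONTROL-AREA is defined.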

One clearly disturbing development which needs to be pursued during in-depth testing of the system concerns the MAX-STORAGE parameter of the DEFINE-RECORD statement. As shown in Figure 3, when MAX-STORAGE was set equal to the maximum value, a COBOL program was generated which required 1000K of core to run. The MAX-STORAGE value of 999 is clearly not realistic under most processing circumstances. This example drives home several important points about CONCOR. The core requirements of CONCOR generated programs can be influenced significantly by the amount or nature of programmer specified I/O operations. In fact, it is possible to generate a program of a size most foreign country machines could not process. It is recommended that tests determine a realistic maximum value restriction for implementation to prevent problems in this area.

The final area of recommended modification concerns the newly implemented REPORT-DIVISION. The purpose of the REPORT-DIVISION is to enable a user to describe or specify certain CONCOR language statements which will generate statistical reports. These reports contain statistics generated by EDITOR as specified by the GENERATE-EDIT-STATISTICS command of the EXECUTION-DIVISION. All of the reports produced are organized according to the data fields defined by the AREA-CONTROL command of the DATA-DICTIONARY. If the AREA-CONTROL command is not defined in the DATA-DICTIONARY, then all the statistics are summarized at the total run level. If a control area field is defined, then all statistics will be summarized for each unique CONTROL AREA as encountered by the EDITOR program on the input file. Statistics at the total run level will not be available. This in part relates back to previous discussions citing the need for new internal identifiers. Report listings may contain the values of entire records or entire questionnaires, depending upon the keyword used in the report generation commands. The problem centers upon the homogeneity of CONCOR printouts during a production run.

It is virtually impossible to distinguish reports on the basis of the volumes they were run against. Some means should be provided to allow users to uniquely and purposefully label the reports generated in this division. Indeed, the whole name REPORT-DIVISION suggests that such a command is implicit and appropriate. Such a LABEL-REPORT or REPORT-FILE command, along with file information from the system, should not be difficult to implement.

FIGURE 3

[Listing: CONCOR (Consistency and Correction) Edit and Imputation System, User Dictionary Division source listing. Lines 72-74 declare DEFINE-RECORD RECORD-NAME=HOUSING-RECORD with MAX-STORAGE=999; lines 268-270 declare DEFINE-RECORD RECORD-NAME=POPULATION-RECORD with MAX-STORAGE=999.]

[Job log for step LOAD: REGION requested 1000K, used 1000K; VIRT 1000K; step CPU time 1.65 seconds; step completion code 0; I/O count 211; lines printed 22.]

Concluding Remarks of System Modifications

Version 1 of the COBOL CONCOR language was notorious for abending without the generation of diagnostic messages. Appendix I sets forth a test of some of these historical problems. In each instance the system provided messages and source program line references which enabled immediate identification of the processing difficulties. However, tests of the various commands revealed the general high performance character of the language in its newest version. Under these circumstances the recommended modifications should be considered as ones which will uniformly allow CONCOR to realize the potential of its design, enhance its utility, and, moreover, endure as a software product.


IV THE ADEQUACY OF COBOL CONCOR DOCUMENTATION

The current documentation of CONCOR consists of a newly developed, relatively voluminous systems manual, a Diagnostic Message Guide, and some loose-leaf instructions concerning the installation of the system on IBM computers. Of these three documents, the Diagnostic Message Guide is clearly complete and adequate and could bear no changes except to provide message number tabs to enable a programmer to more quickly locate the text of a message number. An example of the superior format of this guide appears as Appendix J.

Conspicuously absent from the list of systems documents is a Users Guide as originally specified in the ISPC enhancement proposal

A thorough assessment of the Systems Manual has been undertaken and indicates that, with some reorganization and editing, it could be made a more useful and usable document, as it is well constituted informationally. In its current form the Systems Manual can best be described as a working document, a document which was not intended to teach the CONCOR language but centered upon an audience already familiar with previous CONCOR releases. Further, it is not a systems manual per se, as some of its components and appendixes could more appropriately compose a user's guide. For example, excerpts from the POPSTAN publications which review the basic editing concepts and contain the POPSTAN example problem should be deleted from the systems manual and would be more contextually effective in the User's Guide. The EXECUTION-DIVISION chapter of the Systems Manual would especially benefit from general editing and reorganization. (Specifically, the contents of pages 90-92 should be moved to page 63; these pages explain the placement and purpose of the three permissible routines of the CONCOR language. Had the author been granted more time, he could have outlined additional specific modifications to bring existing documentation to a level of adequacy.) However, some guidelines stand out:

1. Installation instructions and considerations should compose a chapter of the Systems Manual rather than a separate document. (Hand-out materials covering installation were not adequate.) Further, a documentation scheme setting forth how CONCOR was installed on each computer, and any modifications required during installation, could be serially added to and become a permanent part of the installation's systems manual. If CONCOR is going to be used, and if agencies are going to actively support it as a package, installation limitations must be known and well-documented. The prescribed format for this documentation should be set forth, and any modifications to CONCOR should be continuously updated on these sheets so that a history of changes to the language will be readily accessible to supporting agencies. If CONCOR is to survive as a standardized system, editing results must be capable of being replicated among installations. Such a form could contain other information as well, concerning the type of computer, amount of core, and nature of utilities supporting users. Upon installation, a copy of this form could be sent to the US agency which will ultimately be responsible for supporting the CONCOR package.

2. A complete COBOL CONCOR program should appear in an appendix for reference.

3. The development of the User's Guide should include an intensive review of the editing concepts involved in processing census data files, beyond the POPSTAN materials.

4. An explanation of the CONCOR benchmark program should appear in the User's Guide and the Systems Manual. The running of a supplied benchmark program should be a standard installation protocol used to test all operational aspects of a new installation.

5. The development of a CONCOR pocket card. This tool, which has been proven useful in teaching and assisting programmers in utilizing programming languages, lays out all command options on a single small card. An example of such a pocket card is the IBM Assembler Card. Such a small document is useful in clarifying the language and setting forth structural rules without continual reference to full-size manuals.


V CONCLUSION

In a situation of extreme need CONCOR could probably be utilized, as it seems capable of performing its editing functions. Its usefulness as a data cleaning tool is generally undisputed. The importance of having CONCOR exhaustively tested by an independent agency cannot be over-emphasized in light of prior failures of the system to perform adequately. Additionally, core requirements and processing speed should be precisely determined. One must be encouraged by the amount of progress which has been made in a relatively short amount of time towards the completion of the CONCOR package. While it is clearly beyond the scope of this report to estimate the cost and the amount of time required to implement the highly desirable changes to the CONCOR language and documentation outlined herein, and then exhaustively test the system, it is believed that this process could be completed within a single quarter. (The changes are of such a nature that no structural redefinitions of a systematic nature would be required.) These enhancements would make the COBOL CONCOR package more complete, consistent, and reliable in the field. ISPC has competently moved the system to the point that its completion is within reach.

Given the probable revision of the current Systems Reference Manual and the development of a suitable User's Guide document, it is likely that this version of CONCOR could be taught to experienced programmers (unfamiliar with previous versions of the language) in as little as four weeks. And while one is impressed by the potential power and overall simplicity of the COBOL CONCOR language, it is difficult to envision individuals with no real computer programming experience utilizing its full capabilities. Hence the importance of including a review of basic editing concepts in the documentation is underlined.

Further, as documentation is often critical to the perceived ease of use of computer-based systems, it will be essential to have these various manuals examined independently prior to their general use.

As previously mentioned, Appendix G represents features which the ISPC feels could have a positive impact on CONCOR. Depending upon the outcome of the testing recommended in previous chapters and the availability of funding, supporting agencies may desire to look again at an assembler (ALC) version of the language.

Overall, COBOL CONCOR remains an intermediate product yet to realize its potential, which is thought considerable. Software development of this nature is characteristically of a long-term evolutionary design. In light of this reality, agencies should be prepared to evaluate such projects' success or failure in terms other than that of immediate utility. It is the philosophy underlying the system, not the system itself, which will ultimately be exported.

APPENDIX A

BuCen Enhancement Proposal


BUCEN IN-HOUSE REVIEW ENHANCEMENT PROPOSAL

1. Easy to use interrecord referencing

2. Improved output file capabilities

A. provide overflow protection on WRITE command

B. provide OUTPUT command to create corrected questionnaire file that matches IP file data dictionary

3. Improved edit statistics reported (LISTERR)

A. provide automatic (user-specified) area break

B. provide options for compilation and displaying edit statistics at various levels

C. provide automatic (user-specified) tolerance checking of error rates by area

D. automatically capture IDs of areas failing tolerance check

4. Clean up known bugs in code

5. Comprehensive testing

6. Clean up and enhance documentation

A. reference manual: more examples, error message guide

B. installation guide

C. systems manual

APPENDIX B

EVALUATIVE CRITERIA


UNITED STATES GOVERNMENT

TO: DS/POP/FPSD, Robert H. Haladay

DATE: December 3, 1979

DS/POP/DEIO, Liliane Floge

SUBJECT: Criteria for Evaluation of BuCen COBOL CONCOR September 30th Version as presented at January 1980 Workshop

The following points should be considered when evaluating CONCOR, the COBOL editing system, during participation at the January 1980 workshop:

1. Are the Installation Manual, the System Manual, and the Users Manual ready for use? Are these manuals adequate for their purposes? Are the manuals clear and consistent? Can they be understood by subject-matter personnel as well as programmers?

2. Is the editing system capable of implementing all the features as described in the documentation?

3. Does the system appear to be ...ely debugged and tested and ready for release to developing countries?

4. Does the system appear to be fast enough to edit a census in a reasonable amount of time?

5. What size core does the system require?

6. Comment in general on completeness, usefulness, efficiency, and ease of use of the system by developing country programmers and other census personnel.

cc Su-a-- e Olds APiA SDi OrWJulio Ortuizar luotolaCelta Systems

APPENDIX C

WORKSHOP ITINERARY

CONCOR Workshop Schedule January 7-18 1980

U S Bureau of the Census International Statistical Programs Center

Scuderi Building Room 209 4235 28th Avenue Marlow Heights Maryland

Monday January 7

9:30 am - 10:00 Welcoming Remarks; Overview of Workshop

10:00 - 10:30 Introduction to CONCOR
- Purpose and function
- History of development
- General computer requirements

10:30 - 10:45 Break

10:45 - 12:00 Editing Concepts
- Ways to interrogate data
- Ways to correct data
- Editing housing and population data
- POPSTAN
- Advantages of CONCOR

12:00 - 1:15 pm Break

1:15 - 2:00 System Description
- Constraints in design of CONCOR
- Basic subsystems of CONCOR
- User interactions with system
- Examples of outputs produced

2:00 - 2:30 User Program Organization
- Divisions
- Sections
- Routines
- Commands

2:30 - 2:45 Break

2:45 - 3:25 Command Language Description
- Types of statements
- Format
- Syntax

Tuesday January 8

9:30 am - 10:30 Dictionary Division Command Statements
- Punctuation
- Input data referencing
- Working data declarations
- Internal data representation and storage

10:30 - 10:45 Break

10:45 - 12:00
- Dictionary-Attributes-Section
  - Dictionary-Name
- File-Section
  - Input-File
  - Output-File
  - Write-File
  - Error-File

12:00 - 1:15 pm Break

1:15 pm - 2:15
- Input-Record-Section
  - Define-Record
- Working-Data-Section
  - New-Data
  - Array-Data

2:15 - 2:30 Break

2:30 - 3:25 Dictionary Examples
- Minimum dictionary structure
- Maximum dictionary structure
- Hand out dictionary problem

Wednesday January 9

9:30 am - 10:30 Discussion of Participants' Dictionary problems; Free work time

10:30 - 10:45 Break

10:45 - 12:00 Execution Division Command Statements
- Punctuation
- Subscripting
- Internal Identifiers
- Report-Control-Section
  - Generate-Edit-Statistics
  - Count-Imputes
  - Examples

12:00 - 1:15 pm Break

1:15 pm - 2:15 Execution Division Command Statements (continued)
- Routines of Edit-Specifications-Section
  - Prolog-Routine
  - Filter-Routine
  - Epilog-Routine
- Types and functions of edit specification commands
- Range
- Assert

2:15 - 2:30 Break

2:30 - 3:25
- Pass/Fail clauses
- List

Thursday January 10

9:30 am - 10:30 Discussion of Problems; Free work time

10:30 - 10:45 Break

10:45 - 12:00 Edit Specification Command Statements (continued)
- Allocate
- Update
- Let

12:00 - 1:15 pm Break

1:15 pm - 2:15
- If
- Until/Exit
- Stop

2:15 - 2:30 Break

2:30 - 3:25
- Drecode
- Grecode

Friday January 11

9:30 am - 10:30 Edit Specification Command Statements (continued)
- Output
- Write

10:30 - 10:45 Break

10:45 - 12:00 Report Division Command Statements
- Display-Control-Section
  - Display-Edit-Statistics
- Tolerance-Control-Section
  - Error-Rate-Check
  - Reject-File
- Report Examples

12:00 - 1:15 pm Break

1:15 pm - 3:25 Hand out Edit Specifications as problem; Free work time

Monday January 14

9:30 am - 10:30 Discuss procedures for running problems on computer

10:30 - 10:45 Break

10:45 - 12:00 Component Programs of the CONCOR system

12:00 - 1:15 pm Break

Tuesday January 15

9:30 am - 3:25 pm Free work time

Wednesday January 16

9:30 am - 12:00 Free work time

12:00 - 1:15 pm Break

1:15 pm - 2:15 How to Install CONCOR on IBM 360/370 OS

2:15 - 2:30 Break

2:30 - 3:25 Free work time

Thursday January 17

9:30 am - 3:25 Free work time

1:15 pm - 2:15 Future CONCOR Design Considerations
- for census processing
- for survey processing
- manual correction system

2:15 - 2:30 Break

2:30 - 2:45 Evaluation Guidelines
- Hand out evaluation forms

2:45 - 3:25 Free work time

Friday January 18

9:30 am - 10:30 Free work time

10:30 - 10:45 Break

10:45 - 12:00 Group Discussion on CONCOR
- Submission of written evaluations
- Distribution of IBM installation tapes
- Presentation of Certificates to Participants

12:00 - 1:15 pm Break

1:15 - 3:25 Free work time

APPENDIX D

PARTICIPANTS

CONCOR WORKSHOP January 7 - 18 1980 Marlow Heights MD

PARTICIPANTS

KENYA

James Midianga Government Computer Centre Central Bureau of Statistics PO Box 30266 Nairobi Kenya

PANAMA

Omar Rivera Rios Data Processing Specialist Contraloria General de la Republica de Panama Apartado Postal 5213 Panama 5 Panama

PHILIPPINES

Guida T Capellan OIC Computer Programming Division National Census and Statistics Office Solicarel Bldg 1 Magsaysay Blvd Sta Mesa Manila Philippines

THAILAND

Angsumal Sunalai National Statistical Office Bangkok 1 Thailand

EGYPT

Farag Sedky Mourad Ghaleb CAPMAS (Central Agency for Public Mobilization and Statistics) Nasr City Cairo Egypt

SAUDI ARABIA

Shargi R Al-Shargi National Computer Center PO Box 2534 Riyadh Saudi Arabia

Ahmed S Al-Ghamdi Dept Director of National Computer Center PO Box 2534 Riyadh Saudi Arabia

OTHER

Robert W ONeal USREPJECOR APO New York NY 09038

John N Adams Data Processing Advisor DACCA Department of State Washington DC 20520

OR c/o UNDP PO Box 224 Dacca Bangladesh

Joe Quasney US Census Bureau Washington DC 20233

Howard Brunsman 5715 N Ninth St Arlington VA 22205

Larry Shiller and Julio Ortuzar Delta Systems Consultants Inc 264 Alhambra Circle Coral Gables FL 33134

Nicholas Ourusoff Burpee Hill Road New London NH 03257 (United Nations)

Frederic J Grant IV Systems Development Georgia World Congress Institute 580 North Omni International Atlanta Georgia 30303

Rafael Samper Mary Jo Keenan Catherine B Gleason LACDR Room 2246 NS Washington DC 20523

John Marshall AIDSERDMPSE Rm 706C SA-12 Washington DC 20523

Susanne Bacon and Mary K Friday ISPCData Processing US Census Bureau Washington DC 20233

Leo Dougherty ISPCWorld Census Staff US Bureau of the Census Washington DC 20233

CONCOR WORKSHOP

January 7-18 1980 Washington DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM

CONCOR WORKSHOP January 16 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1 What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries

2 If you are familiar with previous versions of CONCOR how does this current version compare to them

3 How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data of which you are familiar

4 How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software

5 How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys

6 If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?

7 In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8 Other comments

APPENDIX F

NEWOLD COMMAND COMPARISONS

OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA

NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly it should be noted that, while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR

END-DIVISION signifies the termination of the CONCOR division organization and must always be present even though division identifiers have not been implemented

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

ALLOCATE (ALL) ALLOCATE (ALLOC) Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

ASSERT (AST) ASSERT (ASRT) The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and MESSAGE option. Additionally, by utilizing the DUMPR DUMPO keywords, it provides the user with the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

(see RECODE) DRECODE (DRCD) Provides a method for recoding data items whose values start at zero and continue in ascending sequence. Replaces the previous RECODE command.

EXIT Provides the means to terminate EXECUTION (ESCAPE) of command statements within the UNTIL command

END END-IF END-THEN END-ELSE A solitary END command is no longer permitted in conditional construction. Note the similarity to COBOL pseudocode.

ERROR not implemented No longer supported in language

FILTER TYPE ( ) WITH not implemented No longer are option statements supported in language This command once specified essentially a GOTO depending on construction

(see XRECODE) GRECODE (GRCD) This new command provides a method for the group recoding of input values

IF IF The IF statement has properties found in virtually all programming languages, i.e. it provides a means of evaluating a condition and controlling the execution of alternative logical statements. Similar to pseudocode, a CONCOR programmer must utilize the structured IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version.

LET LET Virtually unchanged

LIST New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire as well as specific individual identifiers

OUTPUT New command which provides the means by which records will be written to the output-file

RANGE (RNG) RANGE (RNG) As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

RECODE (REC) name changed Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

STOP keyword STOP keyword Unchanged.

UNTIL DO END-DO DO-END-DO: one of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to most other high-level programming languages. The number of iterations is under user control, as well as the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

UPDATE (UPD) UPDATE (UPD) Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

WRITE (WRT) WRITE (WRT) The WRITE language statement while once utilized as the primary output command is now used to create an auxiliary or derivative file Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION (See WRITE-FILE of DICTIONARY-DIVISION)

XRECODE (XREC) name changed Old command which has become the GRECODE command in the December 1979 version
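The comments above describe ALLOCATE and UPDATE as together providing the imputation mechanism, with UPDATE maintaining the arrays involved. A common technique of this kind in census editing is the hot deck, sketched below in Python. This is an illustration only, not CONCOR syntax and not a statement of how CONCOR implements imputation. The field names, the validity test and the starting ("cold deck") values are hypothetical, and the sketch assumes the starting table holds an entry for every key value.

```python
def hot_deck_impute(records, field, key, is_valid, cold_deck):
    """Impute invalid `field` values from the last valid value seen for `key`.

    Returns (edited_records, number_of_imputations). The starting table
    `cold_deck` is assumed to hold an entry for every key value.
    """
    deck = dict(cold_deck)      # working copy of the imputation array
    edited, imputed = [], 0
    for rec in records:
        rec = dict(rec)         # leave the caller's record untouched
        k = rec[key]
        if is_valid(rec[field]):
            deck[k] = rec[field]        # UPDATE-like step: refresh the array
        else:
            rec[field] = deck[k]        # ALLOCATE-like step: assign from the array
            imputed += 1
        edited.append(rec)
    return edited, imputed


# Hypothetical questionnaire data: age imputed within sex.
people = [{"sex": "M", "age": 41}, {"sex": "M", "age": None},
          {"sex": "F", "age": None}, {"sex": "F", "age": 35},
          {"sex": "F", "age": 999}]
valid_age = lambda v: v is not None and 0 <= v <= 98
edited, n_imputed = hot_deck_impute(people, "age", "sex", valid_age,
                                    cold_deck={"M": 30, "F": 28})
print([p["age"] for p in edited], n_imputed)
```

The first female record takes the starting value because no valid female age has yet been read; later imputations take the most recent valid value.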

DECEMBER 1978 VERSION 1

DECEMBER 1979 VERSION 2

Not implemented REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note division and section names treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
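The presorting requirement can be seen in a small Python sketch of sequential area-break processing. It assumes, for illustration only, that a break is declared whenever the area identifier changes from one record to the next; the function name and the data are hypothetical.

```python
from itertools import groupby


def area_breaks(area_ids):
    """Return (area_id, record_count) for each run of consecutive equal IDs."""
    return [(area, sum(1 for _ in run)) for area, run in groupby(area_ids)]


unsorted_ids = ["A", "B", "A", "A", "B"]     # hypothetical input order
print(area_breaks(unsorted_ids))             # areas A and B each reported twice
print(area_breaks(sorted(unsorted_ids)))     # one report block per area
```

On the unsorted file each area is reported in fragments, so the per-area statistics are split; presorting restores one block per area.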

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST

Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1 Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods if applicable

(2) Addition of an optional input andor output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.

(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18 digit numbers. Also signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.

(12) Increase the number of comparison strings permitted for alphanumeric data items This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters The user would still be required to code two commas to receive the system default value for any parameter

(15) Provide a repetition factor in the initialization of arrays

(16) Print the data dictionary name in the titles of the dictionary documentation or provide a means by which the user may specify a title to appear in the listings
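Point (13) above, an automatic range check driven by valid values declared in the dictionary, might behave as in the following Python sketch. The item names, the ranges and the treatment of a missing item as a failure are assumptions made for illustration and are not taken from the CONCOR design.

```python
# Hypothetical dictionary entries: item name -> (lowest, highest) valid value.
VALID_RANGES = {"AGE": (0, 98), "SEX": (1, 2)}


def out_of_range_items(record, valid_ranges=VALID_RANGES):
    """Return the names of items that are missing or outside their declared range."""
    failures = []
    for name, (low, high) in valid_ranges.items():
        value = record.get(name)
        if value is None or not (low <= value <= high):
            failures.append(name)
    return failures


print(out_of_range_items({"AGE": 130, "SEX": 2}))
print(out_of_range_items({"AGE": 5, "SEX": 1}))
```

Because the ranges live with the item declarations, the same table could be read by tabulation software, which is the secondary benefit the point mentions.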

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features such as division by zero and out of bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is what action is to be taken in the calculation of tolerance in terms of this new record's inclusion into the universe of observations and number of changes.

(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
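The lookup table approach proposed in point (4) suits DRECODE because its input values start at zero and ascend, so the input value can serve directly as a table subscript. The Python sketch below illustrates the idea only. The table contents and function name are hypothetical, and nothing here describes the code that GENED actually generates.

```python
# Hypothetical recode table: input codes 0..4 map to the new codes below.
RECODE_TABLE = (1, 1, 2, 2, 3)


def drecode(value, table=RECODE_TABLE):
    """Recode `value` by using it as a subscript into `table`."""
    if 0 <= value < len(table):
        return table[value]     # one subscript operation, no chain of comparisons
    raise ValueError(f"value {value} outside recode table 0..{len(table) - 1}")


print([drecode(v) for v in range(5)])
```

A single subscripted fetch replaces one comparison per candidate value, which is where the saving in execution time would come from.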

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.

1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available information needed most frequently. Also to examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean to imply that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual

(2) The development of a CONCOR Users Guide

2 Survey or other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations

(4) Provide a mechanism for referencing repetitive sections of questionnaire This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented Note that by using this approach two versions of the software would be required One version would be as CONCOR is now implemented and the second would accept flat data files where the subscript indicated a repetitive section on the survey document

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has been neither finished nor tested.
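The check-in procedure of point (2) in this list amounts to comparing the questionnaire identifications received against a master list of sample cases. A minimal Python sketch follows. The identifiers and the function name are hypothetical, and the report of "unexpected" cases is an addition beyond what the point itself describes.

```python
def check_in(master_ids, received_ids):
    """Compare received questionnaire IDs against the master list of sample cases."""
    master, received = set(master_ids), set(received_ids)
    return {
        "missing": sorted(master - received),      # selected but not yet received
        "unexpected": sorted(received - master),   # received but never selected
    }


status = check_in(master_ids=["Q001", "Q002", "Q003"],
                  received_ids=["Q001", "Q003", "Q009"])
print(status)
```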

APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES

1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT OUTPUT-QUEST-COUNT WRITE-RECORD-COUNT

Change in spelling of variable

Not mentioned--possible omission from manual

Not implemented

Note: Current documentation equivocates on when some variables are reset--most are initialized with each control area break

Though implemented as reserved identifiers due to poor documentation of (old) CONCOR exact values and utility were unknown

These apparently new counters permit access to valuable totals within control areas

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g. IN-, OUT-)

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

CONCOR-EDITOR EXECUTION STATISTICS

[Printout residue. Report columns: CONTROL AREA, SEQUENCE NUMBER, INPUTS READ (QUESTIONNAIRES, RECORDS), OUTPUTS BY OUTPUT COMMAND, OUTPUTS BY WRITE COMMAND. Run date and counts illegible.]

Messages printed by EDITOR:

DIVISION BY ZERO LINE NUMBER 118 (printed six times)
RUN ABORTED DUE TO DIVISION BY ZERO ERRORS

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason--ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors and user specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.

Superimposed program text (partly illegible):

111 LIST R003 = R003
112 LIST
114 LIST DIVISION BY ZERO ERROR FOLLOWS
115 LET R001 R003 = 0
116 LIST R001 R003 SET TO ZEROS [illegible]
117 LIST [illegible]
118 LET R003 = R002 / R001
119 LIST [illegible]
122 END TEST OF OPERANDS AND MSG
REGISTERS RESET TO ZERO
125 LET R001 [illegible] R003 = [illegible]

CONCOR-EDITOR EXECUTION STATISTICS

[Printout residue. A series of error lines, each reporting a questionnaire, a coordinate, a value and a line number, followed by a run-termination message; details illegible.]

Comments: In this example TA1, a 2 dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 X 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.

Superimposed program text (partly illegible):

175 UNTIL R001 > 10 VARY R001 FROM 1 BY 1
177 DO
178 UNTIL R002 [operator illegible] 10 VARY R002 FROM 1 BY 1
180 ALLOCATE [illegible] = TA1 (R001 R002)
181 UPDATE TA2 (R001 R002) R003
182 LIST [illegible] R003
183 LIST VALUES OF TA1 = R003 TA1 (R001 R002)
184 LIST
185 LIST TA2 = TA2 (R001 R002)
186 END-DO LIST
187 END-DO
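The array test described in the comments can be imitated in Python to show how many references fall outside the smaller table. The sketch assumes 1-based subscripts, a hypothetical message format, and a run that continues after each error; the printout suggests EDITOR itself ended the run after reaching an error limit, which the sketch does not imitate.

```python
class BoundsError(Exception):
    """Raised when a subscript pair falls outside the target array."""


def checked_update(table, row, col, value, line_number):
    """Store `value` at 1-based (row, col), refusing nonexistent elements."""
    rows, cols = len(table), len(table[0])
    if not (1 <= row <= rows and 1 <= col <= cols):
        raise BoundsError(
            f"SUBSCRIPT ({row} {col}) OUTSIDE {rows} X {cols} ARRAY "
            f"LINE NUMBER {line_number}")   # hypothetical message wording
    table[row - 1][col - 1] = value


ta1 = [[r * 10 + c for c in range(1, 11)] for r in range(1, 11)]   # 10 X 10
ta2 = [[0] * 5 for _ in range(5)]                                  # 5 X 5
errors = []
for r in range(1, 11):
    for c in range(1, 11):
        try:
            checked_update(ta2, r, c, ta1[r - 1][c - 1], line_number=181)
        except BoundsError as exc:
            errors.append(str(exc))
print(len(errors))
```

Of the 100 references generated by the two loops, the 25 that fall inside TA2 succeed and the remaining 75 are reported with the line number of the offending statement.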

CONCOR-EDITOR EXECUTION STATISTICS

[Printout residue. Column headings for control area, inputs read and outputs written; counts illegible.]

Messages printed by EDITOR:

JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215
EDITOR -- NORMAL END OF JOB

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

Superimposed program text (partly illegible):

214 IF IN-RECORD-COUNT > [value illegible]
215 THEN STOP-RUN END-THEN
217 IF IN-RECORD-COUNT < [value illegible]
218 THEN LIST [illegible]

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION A blank command statement line was found in the user Dictionary Division command statement file

ACTION TAKEN Parsing continued with the next command statement line in the Dictionary Division command file

USER RESPONSE Put a comment indicator () in column 1 of the command statement line if spacing is desired if not remove the command statement line

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION A character not in the CONCOR legal character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()) was found as the first character in the string

ACTION TAKEN Parsing began again with the next string

USER RESPONSE Check for possible keying errors and ensure that the first character is a member of the valid CONCOR

character set

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION A string starting with a character not valid in the CONCOR legal character set (A-Z 0-9 - + the comma character ()p the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached

ACTION TAKEN Parsing began again with the next user command statement line in the Dictionary Division command file

USER RESPONSE Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z 0-9 - + the comma character () he delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ())

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION The comment indicator character () was found embedded in a string
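
The kind of check these messages describe can be sketched in a few lines. The Python below is a rough illustration under loud assumptions: several of the special characters are not legible in this copy of the guide, so `COMMENT` and `SPECIALS` are placeholders chosen for the sketch, `check_line` is a hypothetical name, and DD-003 (missing delimiter) is not modeled.

```python
import string

# A-Z, 0-9, '-', '+', ',' and '=' come from the guide; the other special
# characters below are placeholders assumed for this sketch only.
COMMENT = "*"                       # assumed comment indicator
SPECIALS = set(",()=.'")            # assumed delimiter/terminator/literal characters
LEGAL = set(string.ascii_uppercase + string.digits + "-+") | SPECIALS


def check_line(line):
    """Return a list of (severity, code) findings for one command statement line."""
    if not line.strip():
        return [("WARNING", "DD-001")]          # blank command statement line
    if line.startswith(COMMENT):
        return []                               # whole-line comment: nothing to parse
    findings = []
    for token in line.split():
        if token[0] not in LEGAL and token[0] != COMMENT:
            findings.append(("ERROR", "DD-002"))    # illegal first character
        elif COMMENT in token:
            findings.append(("ERROR", "DD-004"))    # comment indicator inside a string
    return findings


print(check_line("#BAD NAME"))
```

Returning a severity together with a message number mirrors the guide's layout, in which each numbered message carries its own explanation, action taken, and user response.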


and stored as an object module on the system, no other compilations should be required for questionnaire processing files of the same type. Theoretically, a single well-written CONCOR program is all that would be required to process an entire census run.

Appendix H contrasts the internal identifiers of the old and new language versions. Without such identifiers a user would have little information about the status of input as it is processed by EDITOR. As noted in the appendix, most internal pointers are reset upon each break in the CONTROL-AREA, provided a CONTROL-AREA has been defined. The limitation here is that there are obvious instances when terminating processing on the basis of run counts would be advantageous even though a CONTROL-AREA has been specified, e.g., debugging CONCOR programs or comparing input files. Therefore, another set of pointers should be implemented for this purpose and made available for programmer reference.

One clearly disturbing development, which needs to be pursued during in-depth testing of the system, concerns the MAX-STORAGE parameter of the DEFINE-RECORD statement. As shown in the figure on the following page, when MAX-STORAGE was set equal to the maximum value, a COBOL program was generated which required 1000K of core to run. The MAX-STORAGE value of 999 is clearly not realistic under most processing circumstances. This example drives home several important points about CONCOR. The core requirements of CONCOR-generated programs can be influenced significantly by the amount or nature of programmer-specified I/O operations. In fact, it is possible to generate a program of a size most foreign country machines could not process. It is recommended that tests determine a realistic maximum-value restriction for implementation to prevent problems in this area.

The final area of recommended modification concerns the newly implemented REPORT-DIVISION. The purpose of the REPORT-DIVISION is to enable a user to describe or specify certain CONCOR language statements which will generate statistical reports. These reports contain statistics generated by EDITOR as specified by the GENERATE-EDIT-STATISTICS command of the EXECUTION-DIVISION. All of the reports produced are organized according to the data fields defined by the AREA-CONTROL command of the DATA-DICTIONARY. If the AREA-CONTROL command is not defined in the DATA-DICTIONARY, then all the statistics are summarized at the total run level. If a control area field is defined, then all statistics will be summarized for each unique CONTROL AREA as encountered by the EDITOR program on the input file; statistics by total run level will not be available. This in part relates back to previous discussions citing the need for new internal identifiers. Report listings may contain the values of entire records or entire questionnaires, depending upon the keyword used in the report generation commands. The problem centers upon the homogeneity of CONCOR printouts during a production run.

It is virtually impossible to distinguish reports on the basis of the volumes they were run against. Some means should be provided to allow users to uniquely and purposefully label the reports generated in this division. Indeed, the name REPORT-DIVISION itself suggests that such a command is implicit and appropriate. Such a LABEL-REPORT or REPORT-FILE command, along with file information from the system, should not be difficult to implement.
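
The two recommendations above (statistics at both the control-area and total-run levels, and a user-supplied report label) can be illustrated with a short Python sketch. Nothing here is CONCOR syntax: `build_report`, the `label` argument, and the `(control_area, error_code)` record layout are assumptions made for the illustration.

```python
from collections import Counter


def build_report(label, error_records):
    """Summarize edit errors both per control area and for the total run.

    `label` is a user-supplied report heading (a hypothetical feature);
    `error_records` is an iterable of (control_area, error_code) pairs.
    """
    by_area = {}
    total_run = Counter()
    for area, code in error_records:
        by_area.setdefault(area, Counter())[code] += 1
        total_run[code] += 1
    return {"label": label, "by_area": by_area, "total_run": total_run}


errors = [("0101", "E1"), ("0101", "E2"), ("0102", "E1")]
report = build_report("HOUSING FILE, VOLUME 3", errors)
print(report["label"], dict(report["total_run"]))
```

Keeping a separate total-run counter costs one extra increment per error record, which suggests that offering both levels of summary need not be expensive.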

FIGURE 3

[CONCOR (Consistency and Correction) Edit and Imputation System: User Dictionary Division source listing, followed by job accounting output. Only the entries below are legible in this copy.]

72 DEFINE-RECORD RECORD-NAME= HOUSING-RECORD

73 MAX-STORAGE= 999

74 RECORD-TYPE= (value illegible)

268 DEFINE-RECORD RECORD-NAME= POPULATION-RECORD

269 MAX-STORAGE= 999

270 RECORD-TYPE= (value illegible)

STEP LOAD: CPU 0 MIN 01.65 SEC, VIRT 1000K

STEP START TIME - 10:01:55, STEP END TIME - 10:02:57, STEP CPU TIME - 1.65 SECONDS, STEP COMPLETION CODE - 0

REGION (REQUESTED - 1000K, USED - 1000K), I/O COUNT 211

Concluding Remarks of System Modifications

Version 1 of the COBOL CONCOR language was notorious for abending without the generation of diagnostic messages. Appendix I sets forth a test of some of these historical problems. In each instance the system provided messages and source program line references which enabled immediate identification of the processing difficulties. However, tests of the various commands revealed the generally high performance of the language in its newest version. Under these circumstances the recommended modifications should be considered as ones which will uniformly allow CONCOR to realize the potential of its design, enhance its utility and, moreover, endure as a software product.

IV THE ADEQUACY OF COBOL CONCOR DOCUMENTATION

The current documentation of CONCOR consists of a newly developed, relatively voluminous systems manual, a Diagnostic Message Guide, and some loose-leaf instructions concerning the installation of the system on IBM computers. Of these three documents, the Diagnostic Message Guide is clearly complete and adequate and needs no changes except the addition of message number tabs to enable a programmer to locate the text of a message more quickly. An example of the superior format of this guide appears as Appendix J.

Conspicuously absent from the list of systems documents is a User's Guide as originally specified in the ISPC enhancement proposal.

A thorough assessment of the Systems Manual has been undertaken and indicates that, with some reorganization and editing, it could be made a more useful and usable document, as it is well constituted informationally. In its current form the Systems Manual can best be described as a working document, one which was not intended to teach the CONCOR language but is aimed at an audience already familiar with previous CONCOR releases. Further, it is not a systems manual per se, as some of its components and appendixes could more appropriately compose a user's guide. For example, excerpts from the POPSTAN publications which review the basic editing concepts and contain the POPSTAN example problem should be deleted from the Systems Manual and would be more contextually effective in the User's Guide. The EXECUTION-DIVISION chapter of the Systems Manual would especially benefit from general editing and reorganization. (Specifically, the contents of pages 90-92 should be moved to page 63; these pages explain the placement and purpose of the three permissible routines of the CONCOR language. Had the author been granted more time, he could have outlined additional specific modifications to bring existing documentation to a level of adequacy.) However, some guidelines stand out:

1. Installation instructions and considerations should compose a chapter of the Systems Manual rather than a separate document. (Hand-out materials covering installation were not adequate.) Further, a documentation scheme setting forth how CONCOR was installed on each computer, and any modifications required during installation, could be serially added to and become a permanent part of the installation's systems manual. If CONCOR is going to be used, and if agencies are going to actively support it as a package, installation limitations must be known and well documented. The prescribed format for this documentation should be set forth, and any modifications to CONCOR should be continuously updated on these sheets so that a history of changes to the language will be readily accessible to supporting agencies. If CONCOR is to survive as a standardized system, editing results must be capable of being replicated among installations. Such a form could contain other information as well, concerning the type of computer, amount of core, and nature of utilities supporting users. Upon installation, a copy of this form could be sent to the US agency which will ultimately be responsible for supporting the CONCOR package.

2. A complete COBOL CONCOR program should appear in an appendix for reference.

3. The development of the User's Guide should include an intensive review of the editing concepts involved in processing census data files, beyond the POPSTAN materials.

4. An explanation of the CONCOR benchmark program should appear in the User's Guide and the Systems Manual. The running of a supplied benchmark program should be a standard installation protocol used to test all operational aspects of a new installation.

5. The development of a CONCOR pocket card. This tool, which has been proven useful in teaching and assisting programmers in utilizing programming languages, lays out all commands and options on a single small card. An example of such a pocket card is the IBM Assembler Card. Such a small document is useful in clarifying the language and setting forth structural rules without continual reference to full-size manuals.

V CONCLUSION

In a situation of extreme need CONCOR could probably be utilized, as it seems capable of performing its editing functions. Its usefulness as a data cleaning tool is generally undisputed. The importance of having CONCOR exhaustively tested by an independent agency cannot be over-emphasized in light of prior failures of the system to perform adequately. Additionally, core requirements and processing speed should be precisely determined. One must be encouraged by the amount of progress which has been made in a relatively short amount of time towards the completion of the CONCOR package. While it is clearly beyond the scope of this report to estimate the cost and the amount of time required to implement the highly desirable changes to the CONCOR language and documentation outlined herein, and then exhaustively test the system, it is believed that this process could be completed within a single quarter. (The changes are of such a nature that no structural redefinitions of a systematic nature would be required.) These enhancements would make the COBOL CONCOR package more complete, consistent, and reliable in the field. ISPC has competently moved the system to the point that its completion is within reach.

Given the probable revision of the current Systems Reference Manual and the development of a suitable User's Guide document, it is likely that this version of CONCOR could be taught to experienced programmers (unfamiliar with previous versions of the language) in as little as four weeks. And while one is impressed by the potential power and overall simplicity of the COBOL CONCOR language, it is difficult to envision individuals with no real computer programming experience utilizing its full capabilities. Hence the importance of including a review of basic editing concepts in the documentation is underlined.

Further, as documentation is often critical to the perceived ease of use of computer-based systems, it will be essential to have these various manuals examined independently prior to their general use.

As previously mentioned, Appendix G represents features which the ISPC feels could have a positive impact on CONCOR. Depending upon the outcome of the testing recommended in previous chapters and the availability of funding, supporting agencies may desire to look again at an assembler (ALC) version of the language.

Overall, COBOL CONCOR remains an intermediate product yet to realize its potential, which is thought considerable. Software development of this nature is characteristically of a long-term evolutionary design. In light of this reality, agencies should be prepared to evaluate such projects' success or failure in terms other than that of immediate utility. It is the philosophy underlying the system, not the system itself, which will ultimately be exported.

APPENDIX A

BuCen Enhancement Proposal

APPENDIX A

BUCEN IN-HOUSE REVIEW ENHANCEMENT PROPOSAL

1 Easy to use interrecord referencing

2 Improved output file capabilities

A provide overflow protection on WRITE command

B provide OUTPUT command to create corrected questionnaire file that matches IP file data dictionary

3 Improved edit statistics reported (LISTERR)

A provide automatic (user-specified) area break

B provide options for compilation and displaying edit statistics at various levels

C provide automatic (user-specified) tolerance checking of error rates by area

D automatically capture IDs of areas failing tolerance check

4 Clean up known bugs in code

5 Comprehensive testing

6 Clean up and enhance documentation

A reference manual more examples error message guide

B installation guide

C systems manual
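
Items 3C and 3D of the proposal (tolerance checking of error rates by area, and capturing the IDs of areas that fail) can be sketched as below. This is an assumption-laden Python illustration rather than the CONCOR implementation: `failing_areas`, the `(records_in_error, records_read)` layout, and the treatment of an empty area as a zero error rate are all choices made for the sketch.

```python
def failing_areas(area_counts, tolerance):
    """Return the IDs of areas whose error rate exceeds `tolerance`.

    `area_counts` maps an area ID to (records_in_error, records_read).
    """
    failed = []
    for area_id, (errors, read) in area_counts.items():
        rate = errors / read if read else 0.0   # empty area: treated as rate 0
        if rate > tolerance:
            failed.append(area_id)
    return sorted(failed)


counts = {"A01": (5, 100), "A02": (30, 100), "A03": (0, 0)}
print(failing_areas(counts, tolerance=0.10))
```

The returned list corresponds to the set of area IDs that would be captured for follow-up after a tolerance failure.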

APPENDIX B

EVALUATIVE CRITERIA

APPENDIX B

UNITED STATES GOVERNMENT

Memorandum

TO: DSPOPFPSD, Robert H Haladay

DATE: December 3, 1979

FROM: DSPOPDEIO, Liliane Floge

SUBJECT: Criteria for Evaluation of BuCen COBOL CONCOR, September 30th Version, as presented at January 1980 Workshop

The following points should be considered when evaluating CONCOR, the COBOL editing system, during participation at the January 1980 workshop:

1. Are the Installation Manual, the System Manual, and the User's Manual ready for use? Are these manuals adequate for their purposes? Are the manuals clear and consistent? Can they be understood by subject-matter personnel as well as programmers?

2. Is the editing system capable of implementing all the features as described in the documentation?

3. Does the system appear to be [illegible] debugged and tested and ready for release to developing countries?

4. Does the system appear to be fast enough to edit a census in a reasonable amount of time?

5. What size core does the system require?

6. Comment in general on the completeness, usefulness, and efficiency of the system and on its ease of use by developing country programmers and other census personnel.

cc: [name partly illegible] Olds, APHA; Julio Ortuzar, Delta Systems

APPENDIX C

WORKSHOP ITINERARY

APPENDIX C

CONCOR Workshop Schedule, January 7-18, 1980

U.S. Bureau of the Census, International Statistical Programs Center

Scuderi Building, Room 209, 4235 28th Avenue, Marlow Heights, Maryland

Monday January 7

9:30 am - 10:00 Welcoming Remarks; Overview of Workshop

10:00 - 10:30 Introduction to CONCOR
- Purpose and function
- History of development
- General computer requirements

10:30 - 10:45 Break

10:45 - 12:00 Editing Concepts
- Ways to interrogate data
- Ways to correct data
- Editing housing and population data - POPSTAN
- Advantages of CONCOR

12:00 - 1:15 pm Break

1:15 - 2:00 System Description
- Constraints in design of CONCOR
- Basic subsystems of CONCOR
- User interactions with system
- Examples of outputs produced

2:00 - 2:30 User Program Organization
- Divisions
- Sections
- Routines
- Commands

2:30 - 2:45 Break

2:45 - 3:25 Command Language Description
- Types of statements
- Format
- Syntax

Tuesday January 8

Dictionary Division Command Statements

9:30 am - 10:30
- Punctuation
- Input data referencing
- Working data declarations
- Internal data representation and storage

10:30 - 10:45 Break

10:45 - 12:00 Dictionary-Attributes-Section
- Dictionary-Name
File-Section
- Input-File
- Output-File
- Write-File
- Error-File

12:00 - 1:15 pm Break

1:15 pm - 2:15 Input-Record-Section
- Define-Record
Working-Data-Section
- New-Data
- Array-Data

2:15 - 2:30 Break

2:30 - 3:25 Dictionary Examples
- Minimum dictionary structure
- Maximum dictionary structure
- Hand out dictionary problem

Wednesday January 9

9:30 am - 10:30 Discussion of Participants' Dictionary problems; Free work time

10:30 - 10:45 Break

10:45 - 12:00 Execution Division Command Statements
- Punctuation
- Subscripting
- Internal Identifiers
- Report-Control-Section: Generate-Edit-Statistics, Count-Imputes
- Examples

12:00 - 1:15 pm Break

1:15 pm - 2:15 Execution Division Command Statements (continued)
- Routines of Edit-Specifications-Section: Prolog-Routine, Filter-Routine, Epilog-Routine
- Types and functions of edit specification commands
- Range
- Assert

2:15 - 2:30 Break

2:30 - 3:25
- Pass/Fail clauses
- List

Thursday January 10

9:30 am - 10:30 Discussion of Problems; Free work time

10:30 - 10:45 Break

10:45 - 12:00 Edit Specification Command Statements (continued)
- Allocate
- Update
- Let

12:00 - 1:15 pm Break

1:15 pm - 2:15
- If
- Until/Exit
- Stop

2:15 - 2:30 Break

2:30 - 3:25
- Drecode
- Grecode

Friday January 11

9:30 am - 10:30 Edit Specification Command Statements (continued)
- Output
- Write

10:30 - 10:45 Break

10:45 - 12:00 Report Division Command Statements
- Display-Control-Section: Display-Edit-Statistics
- Tolerance-Control-Section: Error-Rate-Check, Reject-File
- Report Examples

12:00 - 1:15 pm Break

1:15 pm - 3:25 Hand out Edit Specifications as problem; Free work time

Monday January 14

9:30 am - 10:30 Discuss procedures for running problems on computer

10:30 - 10:45 Break

10:45 - 12:00 Component Programs of the CONCOR system

12:00 - 1:15 pm Break

Tuesday January 15

9:30 am - 3:25 pm Free work time

Wednesday January 16

9:30 am - 12:00 Free work time

12:00 - 1:15 pm Break

1:15 pm - 2:15 How to Install CONCOR on IBM 360/370 OS

2:15 - 2:30 Break

2:30 - 3:25 Free work time

Thursday January 17

9:30 am - 3:25 Free work time

1:15 pm - 2:15 Future CONCOR Design Considerations
- for census processing
- for survey processing
- manual correction system

2:15 - 2:30 Break

2:30 - 2:45 Evaluation Guidelines
- Hand out evaluation forms

2:45 - 3:25 Free work time

Friday January 18

9:30 am - 10:30 Free work time

10:30 - 10:45 Break

10:45 - 12:00 Group Discussion on CONCOR
- Submission of written evaluations
- Distribution of IBM installation tapes
- Presentation of Certificates to Participants

12:00 - 1:15 pm Break

1:15 - 3:25 Free work time

APPENDIX D

PARTICIPANTS

APPENDIX D

CONCOR WORKSHOP January 7 - 18 1980 Marlow Heights MD

PARTICIPANTS

KENYA

James Midianga Government Computer Centre Central Bureau of Statistics PO Box 30266 Nairobi Kenya

PANAMA

Omar Rivera Rios Data Processing Specialist Contraloria General de la Republica de Panama Apartado Postal 5213 Panama 5 Panama

PHILIPPINES

Guida T Capellan OIC Computer Programming Division National Census and Statistics Office Solicarel Bldg 1 Magsaysay Blvd Sta Mesa Manila Philippines

THAILAND

Angsumal Sunalai National Statistical Office Bangkok 1 Thailand

EGYPT

Farag Sedky Mourad Ghaleb CAPMAS (Central Agency for Public Mobilization and Statistics) Nasr City Cairo Egypt

SAUDI ARABIA

Shargi R Al-Shargi National Computer Center PO Box 2534 Riyadh Saudi Arabia

Ahmed S Al-Ghamdi Dept Director of National Computer Center PO Box 2534 Riyadh Saudi Arabia

OTHER

Robert W ONeal USREPJECOR APO New York NY 09038

John N Adams Data Processing Advisor DACCA Department of State Washington DC 20520

or c/o UNDP PO Box 224 Dacca Bangladesh

Joe Quasney US Census Bureau Washington DC 20233

Howard Brunsman 5715 N Ninth St Arlington VA 22205

Larry Shiller and Julio Ortuzar Delta Systems Consultants Inc 264 Alhambra Circle Coral Gables FL 33134

Nicholas Ourusoff Burpee Hill Road New London NH 03257 (United Nations)

Frederic J Grant IV Systems Development Georgia World Congress Institute 580 North Omni International Atlanta Georgia 30303

Rafael Samper Mary Jo Keenan Catherine B Gleason LACDR Room 2246 NS Washington DC 20523

John Marshall AIDSERDMPSE Rm 706C SA-12 Washington DC 20523

Susanne Bacon and Mary K Friday ISPCData Processing US Census Bureau Washington DC 20233

Leo Dougherty ISPCWorld Census Staff US Bureau of the Census Washington DC 20233

CONCOR WORKSHOP

January 7-18 1980 Washington DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM


APPENDIX E

CONCOR WORKSHOP January 16 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1 What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries

2 If you are familiar with previous versions of CONCOR how does this current version compare to them


3 How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data of which you are familiar

4 How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software


5 How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys

6 If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?

7 In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8 Other comments

APPENDIX F

NEWOLD COMMAND COMPARISONS


OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA


NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR

END-DIVISION signifies the termination of the CONCOR division organization and must always be present even though division identifiers have not been implemented

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

ALLOCATE (ALL) ALLOCATE (ALLOC) Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

ASSERT (AST) ASSERT (ASRT) The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and MESSAGE option. Additionally, by utilizing the DUMPR and DUMPO keywords, it provides the user with the option of using the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

(see RECODE) DRECODE (DRCD) Provides a method for recoding data items that have a starting value of zero and continue in ascending sequence. Replaces the previous RECODE command.

EXIT Provides the means to terminate EXECUTION (ESCAPE) of command statements within the UNTIL command

END END-IF END-THEN END-ELSE A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

ERROR not implemented No longer supported in language

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

FILTER TYPE ( ) WITH not implemented No longer are option statements supported in language This command once specified essentially a GOTO depending on construction

(see XRECODE) GRECODE (GRCD) This new command provides a method for the group recoding of input values

IF IF The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. Similar to pseudocode, a CONCOR programmer must utilize the structured IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version.

LET LET Virtually unchanged

LIST New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire as well as specific individual identifiers

OUTPUT New command which provides the means by which records will be written to the output-file

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

RANGE (RNG) RANGE (RNG) As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

RECODE (REC) name changed Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

STOP - keyword STOP - keyword Unchanged.

UNTIL DO END-DO One of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

UPDATE (UPD) UPDATE (UPD) Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

WRITE (WRT) WRITE (WRT) The WRITE language statement while once utilized as the primary output command is now used to create an auxiliary or derivative file Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION (See WRITE-FILE of DICTIONARY-DIVISION)

XRECODE (XREC) name changed Old command which has become the GRECODE command in the December 1979 version
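
The interplay of ALLOCATE and UPDATE described in this table (UPDATE maintains an array of values, ALLOCATE assigns from it during imputation) resembles what is commonly called hot-deck imputation. The Python sketch below illustrates that general idea only; the field names, the valid code range, the starting values, and the function `edit_marital_status` are hypothetical and are not taken from CONCOR or its documentation.

```python
# Hot-deck style sketch: a table of "last good" values indexed by two
# classifying characteristics (sex and age group; both hypothetical).
hot_deck = {("M", 0): 1, ("M", 1): 2, ("F", 0): 1, ("F", 1): 2}   # assumed start values


def edit_marital_status(person, deck):
    """Impute an out-of-range value from the deck, else refresh the deck."""
    key = (person["sex"], person["age_group"])
    if person["marital"] in (1, 2, 3, 4):     # range test passes
        deck[key] = person["marital"]         # analogous to UPDATE
        return False
    person["marital"] = deck[key]             # analogous to ALLOCATE
    return True                               # an imputation took place


people = [
    {"sex": "F", "age_group": 1, "marital": 3},
    {"sex": "F", "age_group": 1, "marital": 9},   # invalid code
    {"sex": "M", "age_group": 0, "marital": 0},   # invalid code
]
imputed = [edit_marital_status(p, hot_deck) for p in people]
print(imputed, [p["marital"] for p in people])
```

The second person receives the value most recently accepted for the same sex and age group, while the third falls back on the table's starting value because no valid record of that type has yet been seen.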


DECEMBER 1978 VERSION 1

DECEMBER 1979 VERSION 2

Not implemented REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note division and section names treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
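
Why the input must be presorted on the area field can be seen in a small control-break sketch. The Python below is an analogy rather than CONCOR's logic: `area_reports` is a hypothetical name, and `itertools.groupby` is used because, like a sequential control-break program, it groups only adjacent records with the same key.

```python
from itertools import groupby
from operator import itemgetter


def area_reports(records):
    """Yield (area, record_count) at each break in the control-area field.

    Only adjacent records are grouped, so unsorted input produces a
    separate report for every fragment of an area.
    """
    for area, group in groupby(records, key=itemgetter(0)):
        yield area, sum(1 for _ in group)


unsorted_file = [("A", 1), ("B", 2), ("A", 3)]
print(list(area_reports(unsorted_file)))            # area A is reported twice
print(list(area_reports(sorted(unsorted_file))))    # one report per area
```

With unsorted input, area A generates two separate summaries; after sorting, each area is summarized once.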

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST


Appendix G

Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1 Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.


(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18 digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.


(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays

(16) Print the data dictionary name in the titles of the dictionary documentation or provide a means by which the user may specify a title to appear in the listings
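
The automatic range check proposed in item (13) can be illustrated with a short sketch. It is written in Python rather than CONCOR, and the field names, the ranges and the range_check helper are hypothetical illustrations only; nothing here reflects actual CONCOR syntax or internals.

```python
# Hypothetical dictionary: each user-identifier carries its declared
# valid range, as item (13) proposes for the DEFINE-RECORD command.
DICTIONARY = {
    "AGE": (0, 98),
    "SEX": (1, 2),
    "RELATIONSHIP": (1, 9),
}


def range_check(record, dictionary=DICTIONARY):
    """Return the names of items that are missing or out of range."""
    failures = []
    for name, (low, high) in dictionary.items():
        value = record.get(name)
        if value is None or not (low <= value <= high):
            failures.append(name)
    return failures
```

Because the ranges live in the dictionary rather than in comment statements, the same declarations could also be read by other software, which is the point item (13) makes about tabulation programs.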

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features such as division by zero and out of bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is what action is to be taken in the calculation of tolerance in terms of this new record's inclusion into the universe of observations and number of changes.


(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
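
The lookup table approach suggested for DRECODE in item (4) can be contrasted with a chain of comparisons in a brief sketch. It is in Python, not generated COBOL, and it assumes that DRECODE maps discrete input codes to output codes; the code pairs and function names are hypothetical.

```python
# Hypothetical recode specification: (input code, output code) pairs.
RECODE_PAIRS = [(1, 10), (2, 10), (3, 20), (4, 30), (9, 99)]


def recode_linear(value, pairs=RECODE_PAIRS, default=None):
    """Test each pair in turn, as a chain of generated comparisons would."""
    for source, target in pairs:
        if value == source:
            return target
    return default


# The table is built once, before any record is processed.
RECODE_TABLE = dict(RECODE_PAIRS)


def recode_lookup(value, table=RECODE_TABLE, default=None):
    """Resolve the recode with a single table lookup per value."""
    return table.get(value, default)
```

Both functions return the same result for every input; the lookup version simply avoids walking the list for each data item, which matters when a recode is applied to every record of a census file.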

1.1.3 Report Division

(1) Provide a means by which the user may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.


1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available information needed most frequently. Also, to examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean to imply that the editing process itself should be made interactive, but rather only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.


2. Survey or other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has not been finished nor tested.
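
The check-in procedure described in item (2) amounts to comparing the questionnaires actually received against a master list of sampled cases. A minimal sketch in Python follows; the identifier format and the check_in function are hypothetical, and the report does not say how such a procedure would actually be implemented in CONCOR.

```python
def check_in(master_list, received_ids):
    """Compare questionnaire IDs received against the expected sample."""
    expected = set(master_list)
    received = set(received_ids)
    return {
        # Sampled cases for which no questionnaire arrived.
        "missing": sorted(expected - received),
        # Questionnaires that are not on the master list.
        "unexpected": sorted(received - expected),
    }
```

For example, check_in(["001", "002", "003"], ["001", "003", "004"]) reports case 002 as missing and case 004 as unexpected.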


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES


1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT OUTPUT-QUEST-COUNT WRITE-RECORD-COUNT

Change in spelling of variable

Not mentioned--possible omission from manual

Not implemented

Note: Current documentation equivocates on when some variables are reset--most are initialized with each control area break

Though implemented as reserved identifiers, due to poor documentation of (old) CONCOR, exact values and utility were unknown

These apparently new counters permit access to valuable totals within control areas

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN-, OUT-)

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Printout of the CONCOR-EDITOR Execution Statistics page for the division by zero test. Legible column headings include CONTROL AREA SEQUENCE NUMBER, INPUTS READ (QUESTIONNAIRES, RECORDS), OUTPUTS BY OUTPUT COMMAND and OUTPUTS BY WRITE COMMAND.]

Messages printed:

DIVISION BY ZERO, LINE NUMBER 118 (printed six times)
RUN ABORTED DUE TO DIVISION BY ZERO ERRORS

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason--ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors and user specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.

[Superimposed program text, lines 112-125, partly illegible: LIST statements announcing the test; a LET statement setting R001 and R003 to zero; at line 118, a LET statement computing R003 from R002 and R001; and a final LET statement resetting the registers to zero.]
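
The behaviour shown on this page, trapping each division by zero, reporting the source line number, and aborting once an error limit is reached, can be sketched as follows. This is an illustration in Python under stated assumptions: the GuardedRun class, the default limit of six, the use of integer division and the zero result returned after a trapped error are all hypothetical and are not taken from CONCOR documentation.

```python
class RunAborted(Exception):
    """Raised when the number of trapped errors reaches the limit."""


class GuardedRun:
    def __init__(self, error_limit=6):
        # error_limit is a hypothetical default, not a documented value.
        self.error_limit = error_limit
        self.messages = []

    def divide(self, line_number, numerator, denominator):
        """Divide, trapping a zero denominator and logging the source line."""
        if denominator == 0:
            self.messages.append(
                f"DIVISION BY ZERO, LINE NUMBER {line_number}"
            )
            if len(self.messages) >= self.error_limit:
                raise RunAborted("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
            # Assumed recovery: yield zero and let processing continue.
            return 0
        return numerator // denominator
```

The design point is that the check costs one comparison per division, which is the kind of system protection the proposed Run Control Section would let a user switch off in exchange for speed.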

[Printout of the CONCOR-EDITOR Execution Statistics page for the array coordinate test. The error messages printed on this page are illegible in this copy.]

Comments: In this example, values from a 2 dimensional array with 10 rows and 10 columns (TA1, indexed by R001 and R002) are used in an attempt to update the smaller (5 x 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.

[Superimposed program text, lines 173-187, partly illegible: two nested DO loops (UNTIL R001 > 10 VARY R001 FROM 1 BY 1 and UNTIL R002 > 10 VARY R002 FROM 1 BY 1) containing an ALLOCATE from TA1 (R001 R002), an UPDATE of TA2 (R001 R002) and several LIST statements, each loop closed by END-DO.]

[Printout of the CONCOR-EDITOR Execution Statistics page for the user termination test.]

Messages printed:

JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215
EDITOR -- NORMAL END OF JOB

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

[Superimposed program text, lines 214-218, partly illegible: an IF statement testing IN-RECORD-COUNT whose THEN clause issues STOP-RUN, followed by a second IF on IN-RECORD-COUNT whose THEN clause issues a LIST message.]

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING(DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION A blank command statement line was found in the user Dictionary Division command statement file

ACTION TAKEN Parsing continued with the next command statement line in the Dictionary Division command file

USER RESPONSE Put a comment indicator () in column 1 of the command statement line if spacing is desired if not remove the command statement line

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION A character not in the CONCOR legal character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()) was found as the first character in the string

ACTION TAKEN Parsing began again with the next string

USER RESPONSE Check for possible keying errors and ensure that the first character is a member of the valid CONCOR

character set

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION A string starting with a character not valid in the CONCOR legal character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached

ACTION TAKEN Parsing began again with the next user command statement line in the Dictionary Division command file

USER RESPONSE Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ())

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION The comment indicator character () was found embedded in a string


FIGURE 3

CONCOR (CONSISTENCY AND CORRECTION) EDIT AND IMPUTATION SYSTEM

USER DICTIONARY DIVISION -- SOURCE LISTING

[Excerpt of a numbered source listing. The legible statements are:]

 72  DEFINE-RECORD RECORD-NAME= HOUSING-RECORD
 73      MAX-STORAGE= 999
 74      RECORD-TYPE= (value illegible)
268  DEFINE-RECORD RECORD-NAME= POPULATION-RECORD
269      MAX-STORAGE= 999
270      RECORD-TYPE= (value illegible)

[The listing is followed by IBM job step messages for step LOAD, reporting a step CPU time of 1.65 seconds, a region of 1000K requested and used, and step completion code 0.]

Concluding Remarks of System Modifications

Version 1 of the COBOL CONCOR language was notorious for abending without the generation of diagnostic messages. Appendix I sets forth a test of some of these historical problems. In each instance the system provided messages and source program line references which enabled immediate identification of the processing difficulties. However, tests of the various commands revealed the general high performance character of the language in its newest version. Under these circumstances, the recommended modifications should be considered as ones which will uniformly allow CONCOR to realize the potential of its design, enhance its utility and, moreover, endure as a software product.


IV. THE ADEQUACY OF COBOL CONCOR DOCUMENTATION

The current documentation of CONCOR consists of a newly developed, relatively voluminous systems manual, a Diagnostic Message Guide and some loose-leaf instructions concerning the installation of the system on IBM computers. Of these three documents, the Diagnostic Message Guide is clearly complete and adequate and could bear no changes except to provide message number tabs to enable a programmer to more quickly locate the text of a message number. An example of the superior format of this guide appears as Appendix J.

Conspicuously absent from the list of systems documents is a User's Guide, as originally specified in the ISPC enhancement proposal.

A thorough assessment of the Systems Manual has been undertaken and indicates that, with some reorganization and editing, it could be made a more useful and usable document, as it is well constituted informationally. In its current form the Systems Manual can best be described as a working document: a document which was not intended to teach the CONCOR language but centered upon an audience already familiar with previous CONCOR releases. Further, it is not a systems manual per se, as some of its components and appendixes could more appropriately compose a user's guide. For example, excerpts from the POPSTAN publications which review the basic editing concepts and contain the POPSTAN example problem should be deleted from the Systems Manual and would be more contextually effective in the User's Guide. The EXECUTION-DIVISION chapter of the Systems Manual would especially benefit from general editing and reorganization. (Specifically, the contents of pages 90-92 should be moved to page 63 -- these pages explain the placement and purpose of the three permissible routines of the CONCOR language. Had the author been granted more time, he could have outlined additional specific modifications to bring existing documentation to a level of adequacy.) However, some guidelines stand out:

1. Installation instructions and considerations should compose a chapter of the Systems Manual rather than a separate document. (Hand-out materials covering installation were not adequate.) Further, a documentation scheme setting forth how CONCOR was installed on each computer and any modifications required during installation could be serially added to and become a permanent part of the installation's systems manual. If CONCOR is going to be used, and if agencies are going to actively support it as a package, installation limitations must be known and well-documented. The prescribed format for this documentation should be set forth, and any modifications to CONCOR should be continuously updated on these sheets so that a history of changes to the language will be readily accessible to supporting agencies. If CONCOR is to survive as a standardized system, editing results must be capable of being replicated among installations. Such a form could contain other information as well concerning the type of computer, amount of core and nature of utilities supporting users. Upon installation a copy of this form could be sent to the US agency which will ultimately be responsible for supporting the CONCOR package.

2. A complete COBOL CONCOR program should appear in an appendix for reference.

3. The development of the User's Guide should include an intensive review of the editing concepts involved in processing census data files beyond the POPSTAN materials.

4. An explanation of the CONCOR benchmark program should appear in the User's Guide and the Systems Manual. The running of a supplied benchmark program should be a standard installation protocol used to test all operational aspects of a new installation.

5. The development of a CONCOR pocket card. This tool, which has been proven useful in teaching and assisting programmers in utilizing programming languages, lays out all command options on a single small card. An example of such a pocket card is the IBM Assembler Card. Such a small document is useful in clarifying the language and setting forth structural rules without continual reference to full-size manuals.


V. CONCLUSION

In a situation of extreme need, CONCOR could probably be utilized, as it seems capable of performing its editing functions. Its usefulness as a data cleaning tool is generally undisputed. The importance of having CONCOR exhaustively tested by an independent agency cannot be over-emphasized in light of prior failures of the system to perform adequately. Additionally, core requirements and processing speed should be precisely determined. One must be encouraged by the amount of progress which has been made in a relatively short amount of time towards the completion of the CONCOR package. While it is clearly beyond the scope of this report to estimate the cost and the amount of time required to implement the herein outlined highly desirable changes to the CONCOR language and documentation and then exhaustively test the system, it is believed that this process could be completed within a single quarter. (The changes are of such a nature that no structural redefinitions of a systematic nature would be required.) These enhancements would make the COBOL CONCOR package more complete, consistent and reliable in the field. ISPC has competently moved the system to the point that its completion is within reach.

Given the probable revision of the current Systems Reference Manual and the development of a suitable User's Guide document, it is likely that this version of CONCOR could be taught to experienced programmers (unfamiliar with previous versions of the language) in as little as four weeks. And while one is impressed by the potential power and overall simplicity of the COBOL CONCOR language, it is difficult to envision individuals with no real computer programming experience utilizing its full capabilities. Hence the importance of including a review of basic editing concepts in the documentation is underlined.

Further, as documentation is often critical to the perceived ease of use of computer-based systems, it will be essential to have these various manuals examined independently prior to their general use.

As previously mentioned, Appendix G represents features which the ISPC feels could have positive impact on CONCOR. Depending upon the outcome of the testing recommended in previous chapters and the availability of funding, supporting agencies may desire to look again at an assembler (ALC) version of the language.

Overall, COBOL CONCOR remains an intermediate product yet to realize its potential, which is thought considerable. Software development of this nature is characteristically of a long-term evolutionary design. In light of this reality, agencies should be prepared to evaluate such projects' success or failure in terms other than that of immediate utility. It is the philosophy underlying the system, not the system itself, which will ultimately be exported.

APPENDIX A

BuCen Enhancement Proposal

BUCEN IN-HOUSE REVIEW ENHANCEMENT PROPOSAL

1 Easy to use interrecord referencing

2 Improved output file capabilities

A provide overflow protection on WRITE command

B provide OUTPUT command to create corrected questionnaire file that matches I/P file data dictionary

3 Improved edit statistics reported (LISTERR)

A provide automatic (user-specified) area break

B provide options for compilation and displaying edit statistics at various levels

C provide automatic (user-specified) tolerance checking of error rates by area

D automatically capture IDs of areas failing tolerance check

4 Clean up known bugs in code

5 Comprehensive testing

6 Clean up and enhance documentation

A reference manual more examples error message guide

B installation guide

C systems manual
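
Items 3C and 3D of the proposal, tolerance checking of error rates by area and capture of the IDs of failing areas, can be sketched briefly. The sketch is in Python and assumes that the error rate is the number of items changed divided by the number of items examined; that definition, the area IDs and the function name are hypothetical.

```python
def areas_failing_tolerance(area_stats, tolerance):
    """Return IDs of areas whose error rate exceeds the tolerance.

    area_stats maps an area ID to (items_changed, items_examined).
    """
    failing = []
    for area_id, (changed, examined) in sorted(area_stats.items()):
        # An area with nothing examined is treated as having no errors.
        rate = changed / examined if examined else 0.0
        if rate > tolerance:
            failing.append(area_id)
    return failing
```

With a user-specified tolerance of 5 percent, an area with 80 changes in 1000 items would be captured, while one with 5 changes in 1000 would pass.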

APPENDIX B

EVALUATIVE CRITERIA

UNITED STATES GOVERNMENT

Memorandum

TO: DSPOPFPSD, Robert H. Haladay

DATE: December 3, 1979

FROM: DSPOPDEIO, Liliane Floge

SUBJECT: Criteria for Evaluation of BuCen COBOL CONCOR, September 30th Version, as presented at January 1980 Workshop

The following points should be considered when evaluating CONCOR, the COBOL editing system, during participation at the January 1980 workshop:

1. Are the Installation Manual, the System Manual and the User's Manual ready for use? Are these manuals adequate for their purposes? Are the manuals clear and consistent? Can they be understood by subject-matter personnel as well as programmers?

2. Is the editing system capable of implementing all the features as described in the documentation?

3. Does the system appear to be [illegible] debugged and tested and ready for release to developing countries?

4. Does the system appear to be fast enough to edit a census in a reasonable amount of time?

5. What size core does the system require?

6. Comment in general on completeness, usefulness and efficiency of the system, and ease of use by developing country programmers and other census personnel.

cc: [illegible] Olds, APHA; [illegible]; Julio Ortuzar, Delta Systems

APPENDIX C

WORKSHOP ITINERARY


CONCOR Workshop Schedule January 7-18 1980

U S Bureau of the Census International Statistical Programs Center

Scuderi Building Room 209 4235 28th Avenue Marlow Heights Maryland

Monday, January 7

9:30 am - 10:00     Welcoming Remarks
                    Overview of Workshop
10:00 - 10:30       Introduction to CONCOR
                    - Purpose and function
                    - History of development
                    - General computer requirements
10:30 - 10:45       Break
10:45 - 12:00       Editing Concepts
                    - Ways to interrogate data
                    - Ways to correct data
                    - Editing housing and population data
                    - POPSTAN
                    - Advantages of CONCOR
12:00 - 1:15 pm     Break
1:15 - 2:00         System Description
                    - Constraints in design of CONCOR
                    - Basic subsystems of CONCOR
                    - User interactions with system
                    - Examples of outputs produced
2:00 - 2:30         User Program Organization
                    - Divisions
                    - Sections
                    - Routines
                    - Commands
2:30 - 2:45         Break
2:45 - 3:25         Command Language Description
                    - Types of statements
                    - Format
                    - Syntax

Tuesday, January 8

9:30 am - 10:30     Dictionary Division Command Statements
                    - Punctuation
                    - Input data referencing
                    - Working data declarations
                    - Internal data representation and storage
10:30 - 10:45       Break
10:45 - 12:00       Dictionary-Attributes-Section
                    - Dictionary-Name
                    File-Section
                    - Input-File
                    - Output-File
                    - Write-File
                    - Error-File
12:00 - 1:15 pm     Break
1:15 pm - 2:15      Input-Record-Section
                    - Define-Record
                    Working-Data-Section
                    - New-Data
                    - Array-Data
2:15 - 2:30         Break
2:30 - 3:25         Dictionary Examples
                    - Minimum dictionary structure
                    - Maximum dictionary structure
                    - Hand out dictionary problem

Wednesday, January 9

9:30 am - 10:30     Discussion of Participants' Dictionary problems
                    Free work time
10:30 - 10:45       Break
10:45 - 12:00       Execution Division Command Statements
                    - Punctuation
                    - Subscripting
                    - Internal Identifiers
                    - Report-Control-Section
                      - Generate-Edit-Statistics
                      - Count-Imputes
                      - Examples
12:00 - 1:15 pm     Break
1:15 pm - 2:15      Execution Division Command Statements (continued)
                    - Routines of Edit-Specifications-Section
                      - Prolog-Routine
                      - Filter-Routine
                      - Epilog-Routine
                    - Types and functions of edit specification commands
                    - Range
                    - Assert
2:15 - 2:30         Break
2:30 - 3:25         - Pass/Fail clauses
                    - List

Thursday, January 10

9:30 am - 10:30     Discussion of Problems
                    Free work time
10:30 - 10:45       Break
10:45 - 12:00       Edit Specification Command Statements (continued)
                    - Allocate
                    - Update
                    - Let
12:00 - 1:15 pm     Break
1:15 pm - 2:15      - If
                    - Until/Exit
                    - Stop
2:15 - 2:30         Break
2:30 - 3:25         - Drecode
                    - Grecode

Friday, January 11

9:30 am - 10:30     Edit Specification Command Statements (continued)
                    - Output
                    - Write
10:30 - 10:45       Break
10:45 - 12:00       Report Division Command Statements
                    - Display-Control-Section
                      - Display-Edit-Statistics
                    - Tolerance-Control-Section
                      - Error-Rate-Check
                      - Reject-File
                    - Report Examples
12:00 - 1:15 pm     Break
1:15 pm - 3:25      Hand out Edit Specifications as problem
                    Free work time

Monday, January 14

9:30 am - 10:30     Discuss procedures for running problems on computer
10:30 - 10:45       Break
10:45 - 12:00       Component Programs of the CONCOR system
12:00 - 1:15 pm     Break

Tuesday, January 15

9:30 am - 3:25 pm   Free work time

Wednesday, January 16

9:30 am - 12:00     Free work time
12:00 - 1:15 pm     Break
1:15 pm - 2:15      How to Install CONCOR on IBM 360/370 OS
2:15 - 2:30         Break
2:30 - 3:25         Free work time

Thursday, January 17

9:30 am - 3:25      Free work time
1:15 pm - 2:15      Future CONCOR Design Considerations
                    - for census processing
                    - for survey processing
                    - manual correction system
2:15 - 2:30         Break
2:30 - 2:45         Evaluation Guidelines
                    - Hand out evaluation forms
2:45 - 3:25         Free work time

Friday, January 18

9:30 am - 10:30     Free work time
10:30 - 10:45       Break
10:45 - 12:00       Group Discussion on CONCOR
                    - submission of written evaluations
                    Distribution of IBM installation tapes
                    Presentation of Certificates to Participants
12:00 - 1:15 pm     Break
1:15 - 3:25         Free work time

APPENDIX D

PARTICIPANTS


CONCOR WORKSHOP January 7 - 18 1980 Marlow Heights MD

PARTICIPANTS

KENYA

James Midianga Government Computer Centre Central Bureau of Statistics PO Box 30266 Nairobi Kenya

PANAMA

Omar Rivera Rios Data Processing Specialist Contraloria General de la Republica de Panama Apartado Postal 5213 Panama 5 Panama

PHILIPPINES

Guida T Capellan OIC Computer Programming Division National Census and Statistics Office Solicarel Bldg 1 Magsaysay Blvd Sta Mesa Manila Philippines

THAILAND

Angsumal Sunalai National Statistical Office Bangkok 1 Thailand

EGYPT

Farag Sedky Mourad Ghaleb CAPMAS (Central Agency for Public Mobilization and Statistics) NASR City Cairo Egypt

SAUDI ARABIA

Shargi R Al-Shargi National Computer Center PO Box 2534 Riyadh Saudi Arabia

Ahmed S Al-Ghamdi Dept Director of National Computer Center PO Box 2534 Riyadh Saudi Arabia

OTHER

Robert W ONeal USREPJECOR APO New York NY 09038

John N Adams Data Processing Advisor DACCA Department of State Washington DC 20520

OR co UNDP PO Box 224 Dacca Bangladesh

Joe Quasney US Census Bureau Washington DC 20233

Howard Brunsman 5715 N Ninth St Arlington VA 22205

Larry Shiller and Julio Ortuzar Delta Systems Consultants Inc 264 Alhambra Circle Coral Cables FL 33134

Nicholas Ourusoff Burpee Hill Road New London NH 03257 (United Nations)

Frederic J Grant IV Systems Development GeorgiaWorld Congress Institute 580 North Omni International Atlanta Georgia 30303

3 32

Rafael Samper Mary Jo Keenan Catherine B Gleason LACDR Room 2246 NS Washington DC 20523

John Marshall AIDSERDMPSE Rm 706C SA-12 Washington DC 20523

Susanne Bacon and Mary K Friday ISPCData Processing US Census Bureau Washington DC 20233

Leo Dougherty ISPCWorld Census Staff US Bureau of the Census Washington DC 20233

33

CONCOR WORKSHOP

January 7-18, 1980, Washington, DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM

CONCOR WORKSHOP, January 16, 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1. What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries?

2. If you are familiar with previous versions of CONCOR, how does this current version compare to them?

3. How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data with which you are familiar?

4. How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software?

5. How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys?

6. If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?

7. In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8. Other comments

APPENDIX F

NEW/OLD COMMAND COMPARISONS

OLD CONCOR VERSION 1, DECEMBER 1978

DATA-DICTIONARY

  DEFINE INPUT-FILE

  DEFINE ERROR-FILE

  DEFINE COMMON-DATA (contains the questionnaire-ID and record type location information)

  DEFINE RECORD-TYPE=

  DEFINE NEW-DATA

  DEFINE ARRAY-DATA

NEW CONCOR VERSION 2, DECEMBER 1979

DICTIONARY-DIVISION

  DICTIONARY-ATTRIBUTES-SECTION
    DICTIONARY-NAME

  FILE-SECTION
    INPUT-FILE, OUTPUT-FILE, WRITE-FILE, ERROR-FILE

  IDENTIFICATION-CONTROL-SECTION
    AREA-CONTROL
    QUESTIONNAIRE-CONTROL
    RECORD-CONTROL

  INPUT-RECORD-SECTION
    DEFINE-RECORD

  WORKING-DATA-SECTION
    NEW-DATA, ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1, all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered, even where general correspondence exists.

OLD CONCOR VERSION 1, DECEMBER 1978

EDIT-SPECIFICATION

  PROLOG
  FILTER TYPE ( )
  EPILOG

  ALLOCATE (ALL)
  ASSERT (AST)
  END
  ERROR
  FILTER TYPE ( ) WITH
  IF
  LET
  RANGE (RNG)
  RECODE (REC)
  STOP --keyword
  UPDATE (UPD)
  WRITE (WRT)
  XRECODE (XREC)

NEW CONCOR VERSION 2, DECEMBER 1979

EXECUTION-DIVISION

  REPORT-CONTROL-SECTION
    COUNT-IMPUTES
    GENERATE-EDIT-STATISTICS

  EDIT-SPECIFICATION-SECTION
    PROLOG-ROUTINE
    FILTER-ROUTINE (name-of-record-type)
    EPILOG-ROUTINE

    ALLOCATE (ALLOC)
    ASSERT (ASRT)
    DRECODE (DRCD)
    END-DIVISION
    EXIT
    GRECODE (GRCD)
    IF
    LET
    LIST
    OUTPUT
    RANGE (RNG)
    STOP
    UNTIL
    UPDATE (UPD)
    WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR.

END-DIVISION signifies the termination of the CONCOR division organization and must always be present, even though division identifiers have not been implemented.

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations, and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

Version 1 (December 1978): ALLOCATE (ALL)
Version 2 (December 1979): ALLOCATE (ALLOC)
Comments: Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

Version 1 (December 1978): ASSERT (AST)
Version 2 (December 1979): ASSERT (ASRT)
Comments: The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and a MESSAGE option. Additionally, the DUMPR and DUMPO keywords give the user the option of using the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

Version 1 (December 1978): (see RECODE)
Version 2 (December 1979): DRECODE (DRCD)
Comments: Provides a method for recoding data items whose values start at zero and continue in ascending sequence. Replaces the previous RECODE command.

Version 1 (December 1978): (none listed)
Version 2 (December 1979): EXIT
Comments: Provides the means to terminate execution of (escape from) command statements within the UNTIL command.

Version 1 (December 1978): END
Version 2 (December 1979): END-IF, END-THEN, END-ELSE
Comments: A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

Version 1 (December 1978): ERROR
Version 2 (December 1979): not implemented
Comments: No longer supported in the language.

Version 1 (December 1978): FILTER TYPE ( ) WITH
Version 2 (December 1979): not implemented
Comments: Option statements are no longer supported in the language. This command once specified essentially a GOTO, depending on construction.

Version 1 (December 1978): (see XRECODE)
Version 2 (December 1979): GRECODE (GRCD)
Comments: This new command provides a method for the group recoding of input values.

Version 1 (December 1978): IF
Version 2 (December 1979): IF
Comments: The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. Similar to pseudocode, a CONCOR programmer must utilize the structured IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version.

Version 1 (December 1978): LET
Version 2 (December 1979): LET
Comments: Virtually unchanged.

Version 1 (December 1978): (none listed)
Version 2 (December 1979): LIST
Comments: New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire, as well as specific individual identifiers.

Version 1 (December 1978): (none listed)
Version 2 (December 1979): OUTPUT
Comments: New command which provides the means by which records will be written to the output file.

Version 1 (December 1978): RANGE (RNG)
Version 2 (December 1979): RANGE (RNG)
Comments: RANGE is a basic editing command of CONCOR. The December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through the DUMPR and DUMPO options within the command. Another new option of this command is a PASS END-PASS FAIL END-FAIL construction, which permits the specification of logical command paths depending upon the outcome of the range test.

Version 1 (December 1978): RECODE (REC)
Version 2 (December 1979): name changed
Comments: Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

Version 1 (December 1978): STOP --keyword
Version 2 (December 1979): STOP --keyword
Comments: Unchanged.

Version 1 (December 1978): (none listed)
Version 2 (December 1979): UNTIL DO END-DO
Comments: DO-END-DO is one of the most significant and powerful enhancements to the CONCOR language. This command provides a looping capability similar to that of most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

Version 1 (December 1978): UPDATE (UPD)
Version 2 (December 1979): UPDATE (UPD)
Comments: Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

Version 1 (December 1978): WRITE (WRT)
Version 2 (December 1979): WRITE (WRT)
Comments: The WRITE language statement, while once utilized as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of DICTIONARY-DIVISION.)

Version 1 (December 1978): XRECODE (XREC)
Version 2 (December 1979): name changed
Comments: Old command which has become the GRECODE command in the December 1979 version.
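The interplay of the ALLOCATE and UPDATE commands described above can be pictured with a minimal Python sketch of a hot-deck style imputation. This is not CONCOR syntax and is not taken from the CONCOR manuals; the function name, the field names, the valid range and the starting ("cold") value are all assumptions chosen for illustration.

```python
def impute_ages(records, low=0, high=98, cold_value=30):
    """Replace invalid ages with the last valid age seen for the same sex.

    `low`, `high` and `cold_value` are hypothetical; a real edit would take
    them from the census edit specifications.
    """
    hot_deck = {}   # imputation array: sex -> most recent valid age
    imputes = 0     # number of values assigned by the edit
    edited = []
    for record in records:
        record = dict(record)               # leave the caller's data untouched
        age = record["age"]
        if isinstance(age, int) and low <= age <= high:
            hot_deck[record["sex"]] = age   # UPDATE-like step: refresh the array
        else:
            # ALLOCATE-like step: assign a value from the array
            record["age"] = hot_deck.get(record["sex"], cold_value)
            imputes += 1
        edited.append(record)
    return edited, imputes


households = [
    {"sex": 1, "age": 25},
    {"sex": 1, "age": None},   # missing: takes 25 from the hot deck
    {"sex": 2, "age": 150},    # out of range, no donor yet: takes cold_value
]
edited, imputes = impute_ages(households)
```

The count of imputed values kept here is only a stand-in for the statistics that the report associates with the GENERATE-EDIT-STATISTICS option.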


DECEMBER 1978, VERSION 1

Not implemented

DECEMBER 1979, VERSION 2

REPORT-DIVISION

  DISPLAY-CONTROL-SECTION
    DISPLAY-EDIT-STATISTICS

  TOLERANCE-CONTROL-SECTION
    ERROR-RATE-CHECK
    REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
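The presorting requirement noted above can be illustrated with a short Python sketch. It assumes, purely for illustration, that each record carries an "area" field standing in for the AREA-CONTROL data field; the function name is hypothetical. A report that breaks on each change of area treats every contiguous run as a separate control area, so an unsorted file yields fragmented totals.

```python
from itertools import groupby
from operator import itemgetter


def area_breaks(records, key="area"):
    """Return (area, record_count) for each run of contiguous records."""
    return [
        (area, sum(1 for _ in run))
        for area, run in groupby(records, key=itemgetter(key))
    ]


records = [{"area": "02"}, {"area": "01"}, {"area": "02"}]

# Unsorted input: area "02" is reported twice, once per contiguous run.
unsorted_report = area_breaks(records)

# Presorted input: one control-area break per area.
sorted_report = area_breaks(sorted(records, key=itemgetter("area")))
```

Sorting before the edit run is therefore the simplest way to obtain one set of statistics per control area.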

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST

Future Versions of CONCOR: Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.

(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18 digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.

(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation or provide a means by which the user may specify a title to appear in the listings.
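The automatic range check proposed in item (13) can be pictured with a minimal Python sketch. It assumes a dictionary that maps each user-identifier to a declared valid range; the identifier names, the ranges and the function name are hypothetical and are not drawn from any CONCOR dictionary.

```python
# Hypothetical dictionary entries: user-identifier -> (lowest, highest) valid value.
DICTIONARY = {"SEX": (1, 2), "AGE": (0, 98)}


def range_check(record, dictionary=DICTIONARY):
    """Return the identifiers whose values are missing or out of range."""
    failures = []
    for name, (low, high) in dictionary.items():
        value = record.get(name)
        if not isinstance(value, int) or not low <= value <= high:
            failures.append(name)
    return failures


clean = range_check({"SEX": 1, "AGE": 34})
flagged = range_check({"SEX": 3, "AGE": None})
```

Because the ranges live in the dictionary rather than in comment statements, the same declarations could in principle be read by tabulation software, which is the secondary benefit the item mentions.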

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features, such as division by zero and out of bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is what action is to be taken in the calculation of tolerance in terms of this new record's inclusion into the universe of observations and number of changes.

(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
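The lookup table approach suggested in item (4) can be contrasted with a sequential search in a brief Python sketch. The sketch assumes, as the description of DRECODE elsewhere in this report suggests, that the input codes start at zero and ascend; the function names and recode pairs are hypothetical, and nothing here reflects the code actually generated by CONCOR.

```python
def recode_linear(value, pairs):
    """Scan (old, new) pairs in order; cost grows with the number of pairs."""
    for old, new in pairs:
        if old == value:
            return new
    return None


def build_table(pairs):
    """Build a list indexed by the old code (codes assumed to be 0, 1, 2, ...)."""
    table = [None] * (max(old for old, _ in pairs) + 1)
    for old, new in pairs:
        table[old] = new
    return table


def recode_table(value, table):
    """Recode with a single indexed access instead of a scan."""
    return table[value] if 0 <= value < len(table) else None


pairs = [(0, 9), (1, 1), (2, 2), (3, 3)]   # illustrative recode pairs
table = build_table(pairs)
```

The table is built once, after which each recode is a single indexed access; the cost is the extra step of preparing the table before execution.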

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.

1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean to imply that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.

2. Survey or Other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of the questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has not been finished nor tested.
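The weighting scheme mentioned in item (5) amounts to multiplying each sampled observation by its sampling weight before totals are accumulated. A minimal Python sketch follows; the field names, the weights and the function name are assumptions made for illustration only.

```python
def weighted_total(records, field, weight_field="weight"):
    """Sum a field over all records, each scaled by its sampling weight."""
    return sum(r[field] * r[weight_field] for r in records)


# Two sampled households; each weight is the number of households it represents.
sample = [
    {"persons": 4, "weight": 10.0},
    {"persons": 2, "weight": 25.0},
]

estimated_persons = weighted_total(sample, "persons")
estimated_households = sum(r["weight"] for r in sample)
```

Without the weights, the same two records would report 6 persons in 2 households, which describes the sample rather than the population it was drawn from.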


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES

Version 1

1. EOF-FLAG
2. ERROR-IN-QUESTIONNAIRE
3. CURRENT-POINTER-VALUE
4. TYPE-COUNT ( )
5. RECORD-COUNT
6. QUESTIONNAIRE-COUNT
7. RECORDS-IN-STORE
8. INVALID-RECORD-TYPE-COUNT
9. INCOMPLETE-FLAG
10. CONTINUATION-FLAG
11. INVALID-RECORD-FLAG

Version 2

EOF-FLAG
ERRORS-IN-QUESTIONNAIRE
TYPE-COUNT ( )
IN-RECORD-COUNT
IN-QUESTIONNAIRE-COUNT
RECORDS-IN-STORE
INVALID-RECORD-TYPE-COUNT
INCOMPLETE-FLAG
CONTINUATION-FLAG
OUT-OF-RANGE
NOT-NUMBER
BLANK
(new internal counters) OUT-RECORD-COUNT, OUTPUT-QUEST-COUNT, WRITE-RECORD-COUNT

Comments

Change in spelling of variable.

Not mentioned -- possible omission from manual.

Not implemented.

Note: Current documentation equivocates on when some variables are reset; most are initialized with each control area break.

Though implemented as reserved identifiers, due to poor documentation of (old) CONCOR their exact values and utility were unknown.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g. IN-, OUT-).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Listing: CONCOR-EDITOR Execution Statistics page showing repeated "DIVISION BY ZERO, LINE NUMBER 118" messages followed by "RUN ABORTED DUE TO DIVISION BY ZERO ERRORS". The superimposed program text (source lines 112-125) is only partly legible: R001 and R003 are set to zero, and line 118 is a LET statement involving R003, R002 and R001 whose operator cannot be read.]

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason, that is, ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors and user specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.
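The behavior reported above, in which each division by zero is reported with its source line number and the run is abandoned after repeated errors, can be sketched in Python as follows. The class name, the error limit, the value returned after a trapped error and the use of integer division are all assumptions; only the two message texts are taken from the listing.

```python
class RunAborted(Exception):
    """Raised when the error limit for division by zero is reached."""


class ProtectedArithmetic:
    def __init__(self, error_limit=6):   # the limit is a hypothetical setting
        self.error_limit = error_limit
        self.zero_divides = 0
        self.messages = []

    def divide(self, numerator, denominator, line_number):
        """Divide, reporting the source line instead of failing silently."""
        if denominator == 0:
            self.zero_divides += 1
            self.messages.append(f"DIVISION BY ZERO, LINE NUMBER {line_number}")
            if self.zero_divides >= self.error_limit:
                self.messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                raise RunAborted(self.messages[-1])
            return 0                      # assumed result after a trapped error
        return numerator // denominator   # integer arithmetic assumed


editor = ProtectedArithmetic(error_limit=2)
first = editor.divide(6, 3, line_number=118)
second = editor.divide(1, 0, line_number=118)
```

The point of the sketch is the diagnostic trail: every trapped error leaves a message naming the offending line, so a later abort can be traced immediately.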

[Listing: second CONCOR-EDITOR Execution Statistics page; the diagnostic messages are not legible in this copy. The superimposed program text (source lines 173-187) is only partly legible: two nested UNTIL ... VARY ... FROM 1 BY 1 loops with DO ... END-DO, over R001 and R002, containing an ALLOCATE from TA1 (R001 R002), an UPDATE of TA2 (R001 R002) and several LIST statements.]

Comments: In this example TA1, a 2 dimensional array with 10 rows and 10 columns, is being used to update the smaller (5 x 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.
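The array coordinate test described above can be reproduced in outline with a Python sketch of a bounds-checked table. The class, the message wording and the line numbers passed to it are hypothetical; they merely mimic the idea of reporting the source line at which a nonexistent element was referenced.

```python
class SubscriptError(Exception):
    """Raised when an array reference falls outside the declared bounds."""


class Table:
    def __init__(self, name, rows, cols, fill=0):
        self.name, self.rows, self.cols = name, rows, cols
        self.cells = [[fill] * cols for _ in range(rows)]

    def _check(self, row, col, line_number):
        # Subscripts are 1-based, as in the report's example.
        if not (1 <= row <= self.rows and 1 <= col <= self.cols):
            raise SubscriptError(
                f"{self.name}({row},{col}) OUT OF BOUNDS, LINE NUMBER {line_number}"
            )

    def get(self, row, col, line_number):
        self._check(row, col, line_number)
        return self.cells[row - 1][col - 1]

    def update(self, row, col, value, line_number):
        self._check(row, col, line_number)
        self.cells[row - 1][col - 1] = value


def copy_all(source, target):
    """Copy every element of source into target, as the test program attempts."""
    for row in range(1, source.rows + 1):
        for col in range(1, source.cols + 1):
            value = source.get(row, col, line_number=180)
            target.update(row, col, value, line_number=181)


ta1 = Table("TA1", 10, 10, fill=7)
ta2 = Table("TA2", 5, 5)
try:
    copy_all(ta1, ta2)
    failure = None
except SubscriptError as exc:
    failure = str(exc)
```

In this sketch the first five columns of row 1 are copied before the sixth reference fails, so the error message identifies both the offending subscripts and the statement responsible.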

[Listing: third CONCOR-EDITOR Execution Statistics page showing the messages "JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215" and "EDITOR -- NORMAL END OF JOB". The superimposed program text (source lines 214-218) is only partly legible: lines 214-215 test IN-RECORD-COUNT and issue STOP-RUN within a THEN END-THEN construction.]

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string


Concluding Remarks of System Modifications

Version 1 of the COBOL CONCOR language was notorious for abending without the generation of diagnostic messages. Appendix I sets forth a test of some of these historical problems. In each instance the system provided messages and source program line references which enabled immediate identification of the processing difficulties. However, tests of the various commands revealed the general high performance character of the language in its newest version. Under these circumstances, the recommended modifications should be considered as ones which will uniformly allow CONCOR to realize the potential of its design, enhance its utility and, moreover, endure as a software product.

IV THE ADEQUACY OF COBOL CONCOR DOCUMENTATION

The current documentation of CONCOR consists of a newly developed, relatively voluminous systems manual, a Diagnostic Message Guide and some loose-leaf instructions concerning the installation of the system on IBM computers. Of these three documents, the Diagnostic Message Guide is clearly complete and adequate and could bear no changes except to provide message number tabs to enable a programmer to more quickly locate the text of a message number. An example of the superior format of this guide appears as Appendix J.

Conspicuously absent from the list of systems documents is a User's Guide, as originally specified in the ISPC enhancement proposal.

A thorough assessment of the Systems Manual has been undertaken and indicates that, with some reorganization and editing, it could be made a more useful and usable document, as it is well constituted informationally. In its current form the Systems Manual can best be described as a working document, one which was not intended to teach the CONCOR language but centered upon an audience already familiar with previous CONCOR releases. Further, it is not a systems manual per se, as some of its components and appendixes could more appropriately compose a user's guide. For example, excerpts from the POPSTAN publications which review the basic editing concepts and contain the POPSTAN example problem should be deleted from the Systems Manual and would be more contextually effective in the User's Guide. The EXECUTION-DIVISION chapter of the Systems Manual would especially benefit from general editing and reorganization. (Specifically, the contents of pages 90-92 should be moved to page 63; these pages explain the placement and purpose of the three permissible routines of the CONCOR language. Had the author been granted more time, he could have outlined additional specific modifications to bring existing documentation to a level of adequacy.) However, some guidelines stand out:

1. Installation instructions and considerations should compose a chapter of the Systems Manual rather than a separate document. (Hand-out materials covering installation were not adequate.) Further, a documentation scheme setting forth how CONCOR was installed on each computer, and any modifications required during installation, could be serially added to and become a permanent part of the installation's systems manual. If CONCOR is going to be used, and if agencies are going to actively support it as a package, installation limitations must be known and well documented. The prescribed format for this documentation should be set forth, and any modifications to CONCOR should be continuously updated on these sheets so that a history of changes to the language will be readily accessible to supporting agencies. If CONCOR is to survive as a standardized system, editing results must be capable of being replicated among installations. Such a form could contain other information as well, concerning the type of computer, amount of core, and nature of utilities supporting users. Upon installation, a copy of this form could be sent to the US agency which will ultimately be responsible for supporting the CONCOR package.

2. A complete COBOL CONCOR program should appear in an appendix for reference.

3. The development of the Users Guide should include an intensive review of the editing concepts involved in processing census data files, beyond the POPSTAN materials.

4. An explanation of the CONCOR benchmark program should appear in the Users Guide and the Systems Manual. The running of a supplied benchmark program should be a standard installation protocol used to test all operational aspects of a new installation.

5. The development of a CONCOR pocket card. This tool, which has been proven useful in teaching and assisting programmers in utilizing a programming language, lays out all commands and options on a single small card. An example of such a pocket card is the IBM Assembler Card. Such a small document is useful in clarifying the language and setting forth structural rules without continual reference to full-size manuals.

V CONCLUSION

In a situation of extreme need CONCOR could probably be utilized, as it seems capable of performing its editing functions. Its usefulness as a data cleaning tool is generally undisputed. The importance of having CONCOR exhaustively tested by an independent agency cannot be over-emphasized in light of prior failures of the system to perform adequately. Additionally, core requirements and processing speed should be precisely determined. One must be encouraged by the amount of progress which has been made in a relatively short amount of time towards the completion of the CONCOR package. While it is clearly beyond the scope of this report to estimate the cost and the amount of time required to implement the highly desirable changes to the CONCOR language and documentation outlined herein, and then exhaustively test the system, it is believed that this process could be completed within a single quarter. (The changes are of such a nature that no structural redefinitions of a systematic nature would be required.) These enhancements would make the COBOL CONCOR package more complete, consistent, and reliable in the field. ISPC has competently moved the system to the point that its completion is within reach.

Given the probable revision of the current Systems Reference Manual and the development of a suitable Users Guide document, it is likely that this version of CONCOR could be taught to experienced programmers (unfamiliar with previous versions of the language) in as little as four weeks. And while one is impressed by the potential power and overall simplicity of the COBOL CONCOR language, it is difficult to envision individuals with no real computer programming experience utilizing its full capabilities. Hence the importance of including a review of basic editing concepts in the documentation is underlined.

Further, as documentation is often critical to the perceived ease of use of computer-based systems, it will be essential to have these various manuals examined independently prior to their general use.

As previously mentioned, Appendix G presents features which the ISPC feels could have a positive impact on CONCOR. Depending upon the outcome of the testing recommended in previous chapters and the availability of funding, supporting agencies may desire to look again at an assembler (ALC) version of the language.

Overall, COBOL CONCOR remains an intermediate product yet to realize its potential, which is thought considerable. Software development of this nature is characteristically of a long-term, evolutionary design. In light of this reality, agencies should be prepared to evaluate such projects' success or failure in terms other than that of immediate utility. It is the philosophy underlying the system, not the system itself, which will ultimately be exported.

APPENDIX A

Bucen Enhancement Proposal

BUCEN IN-HOUSE REVIEW ENHANCEMENT PROPOSAL

1 Easy to use interrecord referencing

2 Improved output file capabilities

A provide overflow protection on WRITE command

B provide OUTPUT command to create corrected questionnaire file that matches IP file data dictionary

3 Improved edit statistics reported (LISTERR)

A provide automatic (user-specified) area break

B provide options for compilation and displaying edit statistics at various levels

C provide automatic (user-specified) tolerance checking of error rates by area

D automatically capture IDs of areas failing tolerance check

4 Clean up known bugs in code

5 Comprehensive testing

6 Clean up and enhance documentation

A reference manual more examples error message guide

B installation guide

C systems manual
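The area-level statistics proposed under item 3 (an area break, a tolerance check on error rates by area, and capture of the IDs of failing areas) can be illustrated with a short sketch. It is written in Python rather than CONCOR, and the function name, the record layout, and the 30 percent tolerance are assumptions made for the illustration, not features of the package.

```python
from collections import defaultdict

def areas_failing_tolerance(records, tolerance):
    """Return the sorted IDs of areas whose error rate exceeds `tolerance`.

    `records` is an iterable of (area_id, has_error) pairs, one pair per
    edited questionnaire. This layout is hypothetical.
    """
    totals = defaultdict(int)   # questionnaires seen per area
    errors = defaultdict(int)   # questionnaires with at least one error
    for area_id, has_error in records:
        totals[area_id] += 1
        if has_error:
            errors[area_id] += 1
    # An area fails when its share of erroneous questionnaires is too high.
    return sorted(a for a in totals if errors[a] / totals[a] > tolerance)

sample = [
    ("A01", False), ("A01", True), ("A01", False), ("A01", False),
    ("A02", True), ("A02", True), ("A02", False), ("A02", False),
]
failing = areas_failing_tolerance(sample, tolerance=0.30)
print(failing)  # ['A02']: 2 of 4 questionnaires in error, against 1 of 4 in A01
```

The list returned corresponds to the captured IDs of item 3D; how the package itself would store or report them is not specified here.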

APPENDIX B

EVALUATIVE CRITERIA

UNITED STATES GOVERNMENT

Memorandum

TO: DSPOPFPSD, Robert H Haladay

DATE: December 3, 1979

FROM: DSPOPDEIO, Liliane Floge

SUBJECT: Criteria for Evaluation of BuCen COBOL CONCOR, September 30th Version, as presented at January 1980 Workshop

The following points should be considered when evaluating CONCOR, the COBOL editing system, during participation at the January 1980 workshop:

1. Are the Installation Manual, the System Manual, and the Users Manual ready for use? Are these manuals adequate for their purposes? Are the manuals clear and consistent? Can they be understood by subject-matter personnel as well as programmers?

2. Is the editing system capable of implementing all the features as described in the documentation?

3. Does the system appear to be [...]ely debugged and tested and ready for release to developing countries?

4. Does the system appear to be fast enough to edit a census in a reasonable amount of time?

5. What size core does the system require?

6. Comment in general on completeness, usefulness, and efficiency of the system and ease of use by developing country programmers and other census personnel.

cc Su-a-- e Olds APiA SDi OrWJulio Ortuizar luotolaCelta Systems

APPENDIX C

WORKSHOP ITINERARY

CONCOR Workshop Schedule, January 7-18, 1980

U.S. Bureau of the Census, International Statistical Programs Center

Scuderi Building, Room 209, 4235 28th Avenue, Marlow Heights, Maryland

Monday, January 7

9:30 am - 10:00      Welcoming Remarks
                     Overview of Workshop

10:00 - 10:30        Introduction to CONCOR
                     - Purpose and function
                     - History of development
                     - General computer requirements

10:30 - 10:45        Break

10:45 - 12:00        Editing Concepts
                     - Ways to interrogate data
                     - Ways to correct data
                     - Editing housing and population data
                     - POPSTAN
                     - Advantages of CONCOR

12:00 - 1:15 pm      Break

1:15 - 2:00          System Description
                     - Constraints in design of CONCOR
                     - Basic subsystems of CONCOR
                     - User interactions with system
                     - Examples of outputs produced

2:00 - 2:30          User Program Organization
                     - Divisions
                     - Sections
                     - Routines
                     - Commands

2:30 - 2:45          Break

2:45 - 3:25          Command Language Description
                     - Types of statements
                     - Format
                     - Syntax

Tuesday, January 8

9:30 am - 10:30      Dictionary Division Command Statements
                     - Punctuation
                     - Input data referencing
                     - Working data declarations
                     - Internal data representation and storage

10:30 - 10:45        Break

10:45 - 12:00        Dictionary-Attributes-Section
                     - Dictionary-Name
                     File-Section
                     - Input-File
                     - Output-File
                     - Write-File
                     - Error-File

12:00 - 1:15 pm      Break

1:15 pm - 2:15       Input-Record-Section
                     - Define-Record
                     Working-Data-Section
                     - New-Data
                     - Array-Data

2:15 - 2:30          Break

2:30 - 3:25          Dictionary Examples
                     - Minimum dictionary structure
                     - Maximum dictionary structure
                     - Hand out dictionary problem

Wednesday, January 9

9:30 am - 10:30      Discussion of Participants Dictionary problems
                     Free work time

10:30 - 10:45        Break

10:45 - 12:00        Execution Division Command Statements
                     - Punctuation
                     - Subscripting
                     - Internal Identifiers
                     - Report-Control-Section
                       - Generate-Edit-Statistics
                       - Count-Imputes
                       - Examples

12:00 - 1:15 pm      Break

1:15 pm - 2:15       Execution Division Command Statements (continued)
                     - Routines of Edit-Specifications-Section
                       - Prolog-Routine
                       - Filter-Routine
                       - Epilog-Routine
                     - Types and functions of edit specification commands
                     - Range
                     - Assert

2:15 - 2:30          Break

2:30 - 3:25          - Pass/Fail clauses
                     - List

Thursday, January 10

9:30 am - 10:30      Discussion of Problems
                     Free work time

10:30 - 10:45        Break

10:45 - 12:00        Edit Specification Command Statements (continued)
                     - Allocate
                     - Update
                     - Let

12:00 - 1:15 pm      Break

1:15 pm - 2:15       - If
                     - Until
                     - Exit
                     - Stop

2:15 - 2:30          Break

2:30 - 3:25          - Drecode
                     - Grecode

Friday, January 11

9:30 am - 10:30      Edit Specification Command Statements (continued)
                     - Output
                     - Write

10:30 - 10:45        Break

10:45 - 12:00        Report Division Command Statements
                     - Display-Control-Section
                       - Display-Edit-Statistics
                     - Tolerance-Control-Section
                       - Error-Rate-Check
                       - Reject-File
                     - Report Examples

12:00 - 1:15 pm      Break

1:15 pm - 3:25       Hand out Edit Specifications as problem
                     Free work time

Monday, January 14

9:30 am - 10:30      Discuss procedures for running problems on computer

10:30 - 10:45        Break

10:45 - 12:00        Component Programs of the CONCOR system

12:00 - 1:15 pm      Break

Tuesday, January 15

9:30 am - 3:25 pm    Free work time

Wednesday, January 16

9:30 am - 12:00      Free work time

12:00 - 1:15 pm      Break

1:15 pm - 2:15       How to Install CONCOR on IBM 360/370 OS

2:15 - 2:30          Break

2:30 - 3:25          Free work time

Thursday, January 17

9:30 am - 3:25       Free work time

1:15 pm - 2:15       Future CONCOR Design Considerations
                     - for census processing
                     - for survey processing
                     - manual correction system

2:15 - 2:30          Break

2:30 - 2:45          Evaluation Guidelines
                     - Hand out evaluation forms

2:45 - 3:25          Free work time

Friday, January 18

9:30 am - 10:30      Free work time

10:30 - 10:45        Break

10:45 - 12:00        Group Discussion on CONCOR
                     - submission of written evaluations
                     Distribution of IBM installation tapes
                     Presentation of Certificates to Participants

12:00 - 1:15 pm      Break

1:15 - 3:25          Free work time

APPENDIX D

PARTICIPANTS

CONCOR WORKSHOP January 7 - 18 1980 Marlow Heights MD

PARTICIPANTS

KENYA

James Midianga, Government Computer Centre, Central Bureau of Statistics, PO Box 30266, Nairobi, Kenya

PANAMA

Omar Rivera Rios, Data Processing Specialist, Contraloria General de la Republica de Panama, Apartado Postal 5213, Panama 5, Panama

PHILIPPINES

Guida T Capellan, OIC, Computer Programming Division, National Census and Statistics Office, Solicarel Bldg 1, Magsaysay Blvd, Sta Mesa, Manila, Philippines

THAILAND

Angsumal Sunalai, National Statistical Office, Bangkok 1, Thailand

EGYPT

Farag Sedky Mourad Ghaleb, CAPMAS (Central Agency for Public Mobilization and Statistics), NASR City, Cairo, Egypt

SAUDI ARABIA

Shargi R Al-Shargi, National Computer Center, PO Box 2534, Riyadh, Saudi Arabia

Ahmed S Al-Ghamdi, Dept Director of National Computer Center, PO Box 2534, Riyadh, Saudi Arabia

OTHER

Robert W ONeal, USREPJECOR, APO New York, NY 09038

John N Adams, Data Processing Advisor, DACCA, Department of State, Washington DC 20520, OR c/o UNDP, PO Box 224, Dacca, Bangladesh

Joe Quasney, US Census Bureau, Washington DC 20233

Howard Brunsman, 5715 N Ninth St, Arlington, VA 22205

Larry Shiller and Julio Ortuzar, Delta Systems Consultants Inc, 264 Alhambra Circle, Coral Gables, FL 33134

Nicholas Ourusoff, Burpee Hill Road, New London, NH 03257 (United Nations)

Frederic J Grant IV, Systems Development, Georgia World Congress Institute, 580 North Omni International, Atlanta, Georgia 30303

Rafael Samper, Mary Jo Keenan, Catherine B Gleason, LACDR, Room 2246 NS, Washington DC 20523

John Marshall, AIDSERDMPSE, Rm 706C, SA-12, Washington DC 20523

Susanne Bacon and Mary K Friday, ISPC Data Processing, US Census Bureau, Washington DC 20233

Leo Dougherty, ISPC World Census Staff, US Bureau of the Census, Washington DC 20233

CONCOR WORKSHOP

January 7-18 1980 Washington DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM

CONCOR WORKSHOP, January 16, 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1. What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries?

2. If you are familiar with previous versions of CONCOR, how does this current version compare to them?

3. How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data with which you are familiar?

4. How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software?

5. How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys?

6. If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?

7. In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8. Other comments

APPENDIX F

NEW/OLD COMMAND COMPARISONS

OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY
  DEFINE INPUT-FILE
  DEFINE ERROR-FILE
  DEFINE COMMON-DATA (contains the questionnaire-ID and record type location information)
  DEFINE RECORD-TYPE=
  DEFINE NEW-DATA
  DEFINE ARRAY-DATA

NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION
  DICTIONARY-ATTRIBUTES-SECTION
    DICTIONARY-NAME
  FILE-SECTION
    INPUT-FILE
    OUTPUT-FILE
    WRITE-FILE
    ERROR-FILE
  IDENTIFICATION-CONTROL-SECTION
    AREA-CONTROL
    QUESTIONNAIRE-CONTROL
    RECORD-CONTROL
  INPUT-RECORD-SECTION
    DEFINE-RECORD
  WORKING-DATA-SECTION
    NEW-DATA
    ARRAY-DATA
END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION
  PROLOG
  FILTER TYPE ( )
  EPILOG

  ALLOCATE (ALL)
  ASSERT (AST)
  END
  ERROR
  FILTER TYPE ( ) WITH
  IF
  LET
  RANGE (RNG)
  RECODE (REC)
  STOP --keyword
  UPDATE (UPD)
  WRITE (WRT)
  XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION
  REPORT-CONTROL-SECTION
    COUNT-IMPUTES
    GENERATE-EDIT-STATISTICS
  EDIT-SPECIFICATION-SECTION
    PROLOG-ROUTINE
    FILTER-ROUTINE (name-of-record-type)
    EPILOG-ROUTINE

  ALLOCATE (ALLOC)
  ASSERT (ASRT)
  DRECODE (DRCD)
  END-DIVISION
  EXIT
  GRECODE (GRCD)
  IF
  LET
  LIST
  OUTPUT
  RANGE (RNG)
  STOP
  UNTIL
  UPDATE (UPD)
  WRITE (WRT)
END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR.

END-DIVISION signifies the termination of the CONCOR division organization and must always be present even though division identifiers have not been implemented.

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

VERSION 1 (DECEMBER 1978): ALLOCATE (ALL)
VERSION 2 (DECEMBER 1979): ALLOCATE (ALLOC)
COMMENTS: Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

VERSION 1 (DECEMBER 1978): ASSERT (AST)
VERSION 2 (DECEMBER 1979): ASSERT (ASRT)
COMMENTS: The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and a MESSAGE option. Additionally, by utilizing the DUMPR and DUMPO keywords, it provides the user with the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS and FAIL END-FAIL constructions for the purpose of maintaining structured logic.

VERSION 1 (DECEMBER 1978): (see RECODE)
VERSION 2 (DECEMBER 1979): DRECODE (DRCD)
COMMENTS: Provides a method for recoding data items which have a starting value of zero and continue in ascending sequence. Replaces the previous RECODE command.

VERSION 1 (DECEMBER 1978): (none)
VERSION 2 (DECEMBER 1979): EXIT
COMMENTS: Provides the means to terminate execution (ESCAPE) of command statements within the UNTIL command.

VERSION 1 (DECEMBER 1978): END
VERSION 2 (DECEMBER 1979): END-IF, END-THEN, END-ELSE
COMMENTS: A solitary END command is no longer permitted in conditional construction. Note the similarity to COBOL pseudocode.

VERSION 1 (DECEMBER 1978): ERROR
VERSION 2 (DECEMBER 1979): not implemented
COMMENTS: No longer supported in language.

VERSION 1 (DECEMBER 1978): FILTER TYPE ( ) WITH
VERSION 2 (DECEMBER 1979): not implemented
COMMENTS: Option statements are no longer supported in the language. This command once specified essentially a GOTO, depending on construction.

VERSION 1 (DECEMBER 1978): (see XRECODE)
VERSION 2 (DECEMBER 1979): GRECODE (GRCD)
COMMENTS: This new command provides a method for the group recoding of input values.

VERSION 1 (DECEMBER 1978): IF
VERSION 2 (DECEMBER 1979): IF
COMMENTS: The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. Similar to pseudocode, a CONCOR programmer must utilize the structured IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version.

VERSION 1 (DECEMBER 1978): LET
VERSION 2 (DECEMBER 1979): LET
COMMENTS: Virtually unchanged.

VERSION 1 (DECEMBER 1978): (none)
VERSION 2 (DECEMBER 1979): LIST
COMMENTS: New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire, as well as specific individual identifiers.

VERSION 1 (DECEMBER 1978): (none)
VERSION 2 (DECEMBER 1979): OUTPUT
COMMENTS: New command which provides the means by which records will be written to the output-file.

VERSION 1 (DECEMBER 1978): RANGE (RNG)
VERSION 2 (DECEMBER 1979): RANGE (RNG)
COMMENTS: As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

VERSION 1 (DECEMBER 1978): RECODE (REC)
VERSION 2 (DECEMBER 1979): name changed
COMMENTS: Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

VERSION 1 (DECEMBER 1978): STOP- keyword
VERSION 2 (DECEMBER 1979): STOP- keyword
COMMENTS: Unchanged.

VERSION 1 (DECEMBER 1978): (none)
VERSION 2 (DECEMBER 1979): UNTIL DO END-DO
COMMENTS: One of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to that of most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

VERSION 1 (DECEMBER 1978): UPDATE (UPD)
VERSION 2 (DECEMBER 1979): UPDATE (UPD)
COMMENTS: Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

VERSION 1 (DECEMBER 1978): WRITE (WRT)
VERSION 2 (DECEMBER 1979): WRITE (WRT)
COMMENTS: The WRITE language statement, while once utilized as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of DICTIONARY-DIVISION.)

VERSION 1 (DECEMBER 1978): XRECODE (XREC)
VERSION 2 (DECEMBER 1979): name changed
COMMENTS: Old command which has become the GRECODE command in the December 1979 version.
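The comments above describe RANGE as a test with separate PASS and FAIL paths, and ALLOCATE and UPDATE as the pair that provides imputation through arrays. A minimal sketch of how such commands might combine is given below. It is Python, not CONCOR syntax, and it assumes a hot-deck style of imputation; the function name, the field names, the age limits, and the starting values are all hypothetical.

```python
def edit_age(record, hot_deck, low=0, high=98):
    """Range-check record['age'] and impute from `hot_deck` on failure.

    Returns True when the value was imputed. `hot_deck` maps a sex code
    to the most recent valid age seen for that sex.
    """
    key = record["sex"]
    if low <= record["age"] <= high:
        # PASS path: remember the valid value (the role UPDATE is said to play).
        hot_deck[key] = record["age"]
        return False
    # FAIL path: assign the stored value (the role ALLOCATE is said to play).
    record["age"] = hot_deck[key]
    return True

hot_deck = {"M": 30, "F": 28}  # hypothetical starting values
people = [
    {"sex": "F", "age": 41},    # valid, so the F entry becomes 41
    {"sex": "F", "age": 140},   # out of range, imputed as 41
    {"sex": "M", "age": -3},    # out of range, imputed as 30
]
imputed = [edit_age(person, hot_deck) for person in people]
print(imputed)                      # [False, True, True]
print([p["age"] for p in people])   # [41, 41, 30]
```

Counting the True results per run would give the kind of imputation tally that COUNT-IMPUTES suggests, though that correspondence is also an assumption.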

DECEMBER 1978 VERSION 1

Not implemented

DECEMBER 1979 VERSION 2

REPORT-DIVISION
  DISPLAY-CONTROL-SECTION
    DISPLAY-EDIT-STATISTICS
  TOLERANCE-CONTROL-SECTION
    ERROR-RATE-CHECK
    REJECT-FILE
END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
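The presorting requirement can be seen in a small sketch of an area break that, like the behavior described above, only recognizes runs of consecutive records with the same area code. The sketch is Python and uses the standard `itertools.groupby`; the function name and the sample area codes are illustrative assumptions.

```python
from itertools import groupby

def area_breaks(area_ids):
    """Return (area_id, record_count) for each run of consecutive equal IDs."""
    return [(area, sum(1 for _ in run)) for area, run in groupby(area_ids)]

unsorted_ids = ["01", "02", "01", "02"]

# Without presorting, each area is reported twice with partial counts.
print(area_breaks(unsorted_ids))
# [('01', 1), ('02', 1), ('01', 1), ('02', 1)]

# After an external sort, each area is reported once with its full count.
print(area_breaks(sorted(unsorted_ids)))
# [('01', 2), ('02', 2)]
```

A break routine of this kind never looks back at earlier runs, which is why noncontiguous records for one area would produce fragmented statistics.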

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST

Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics, such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.

(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18 digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.

(12) Increase the number of comparison strings permitted for alphanumeric data items This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters The user would still be required to code two commas to receive the system default value for any parameter

(15) Provide a repetition factor in the initialization of arrays

(16) Print the data dictionary name in the titles of the dictionary documentation or provide a means by which the user may specify a title to appear in the listings
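Item (13) above, an automatic range check driven by valid values declared in the dictionary, can be sketched as follows. The sketch is Python rather than CONCOR, and the dictionary layout (start column, width, low, high), the identifier names, and the limits are assumptions chosen for the illustration.

```python
# Hypothetical dictionary: identifier -> (start column, width, low, high).
# Columns are counted from 1, as on a coding sheet.
DICTIONARY = {
    "SEX": (1, 1, 1, 2),
    "AGE": (2, 2, 0, 98),
}

def auto_range_check(line, dictionary=DICTIONARY):
    """Return the names of fields that are non-numeric or out of range."""
    failures = []
    for name, (start, width, low, high) in dictionary.items():
        text = line[start - 1:start - 1 + width]
        # A field fails if it is not all digits or lies outside its limits.
        if not text.isdigit() or not low <= int(text) <= high:
            failures.append(name)
    return failures

print(auto_range_check("125"))  # []
print(auto_range_check("399"))  # ['SEX', 'AGE']
print(auto_range_check("2 7"))  # ['AGE'], since a leading blank is not numeric here
```

Keeping the limits in the dictionary rather than in comment statements is also what would let tabulation software reuse them, as the item notes.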

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features, such as division by zero and out of bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is the question of what action is to be taken in the calculation of tolerance in terms of this new record's inclusion into the universe of observations and number of changes.

(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
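The lookup table approach suggested in item (4) can be contrasted with a sequential search in a brief sketch. It is Python, not generated COBOL, and the recode pairs and function names are hypothetical; it relies only on the property, noted elsewhere in this report, that DRECODE input values start at zero and ascend.

```python
def recode_by_search(value, pairs):
    """Recode by testing each (old, new) pair in turn."""
    for old, new in pairs:
        if value == old:
            return new
    return None  # no matching old value

def build_table(pairs):
    """Build a list indexed by old value, so lookup is a single indexing step."""
    table = [None] * (max(old for old, _ in pairs) + 1)
    for old, new in pairs:
        table[old] = new
    return table

def recode_by_table(value, table):
    """Recode by direct indexing; values outside the table are not recoded."""
    return table[value] if 0 <= value < len(table) else None

PAIRS = [(0, 1), (1, 1), (2, 2), (3, 2), (4, 3)]  # hypothetical recode
TABLE = build_table(PAIRS)

print(recode_by_search(3, PAIRS), recode_by_table(3, TABLE))  # 2 2
```

The table is built once, which matches the item's remark that the saving comes at execution time while extra work (there, communication between GENED and GENDD) is needed beforehand.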

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.

1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean to imply that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR Users Guide.

2 Survey or other Census Processing

21 Software

(1) Make optional the specification of the record type and questionnaire identification information as some files are not

hierarchical in nature

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected This new input file would contain the master list of those oberservations selected to be

examined

(3) Allow floating point calculations

(4) Provide a mechanism for referencing repetitive sections of questionnaire This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented Note that by using this approach two versions of the software would be required One version would be as CONCOR is now implemented and the second would accept flat data files where the subscript indicated a repetitive section on the survey document

(5) Add a weighting scheme to allow meaningful totals and summary

statistics based upon the sampling frame used to gather the data being edited

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has been neither finished nor tested.
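The weighting scheme proposed in item (5) can be illustrated with a short sketch. It is not part of CONCOR; the `Case` record, the field names and the idea that each case carries an expansion weight derived from the sampling frame are assumptions made here only to show what "meaningful totals" would involve.

```python
from dataclasses import dataclass


@dataclass
class Case:
    stratum: str   # hypothetical sampling stratum label
    value: float   # edited value of some questionnaire item
    weight: float  # hypothetical expansion weight from the sampling frame


def weighted_total(cases):
    """Estimated total: each edited value is multiplied by its weight."""
    return sum(c.value * c.weight for c in cases)


def weighted_mean(cases):
    """Weighted mean of the edited values."""
    total_weight = sum(c.weight for c in cases)
    if total_weight == 0:
        raise ValueError("sum of weights is zero")
    return weighted_total(cases) / total_weight


cases = [
    Case("urban", 4.0, 10.0),
    Case("urban", 6.0, 10.0),
    Case("rural", 2.0, 30.0),
]
print(weighted_total(cases), weighted_mean(cases))
```

An unweighted total of the three values would be 12; the weighted figures differ because the rural case stands for more units in the frame, which is the situation the enhancement is meant to handle.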


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES


1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT OUTPUT-QUEST-COUNT WRITE-RECORD-COUNT

Change in spelling of variable

Not mentioned--possible omission from manual

Not implemented

Note: Current documentation equivocates on when some variables are reset--most are initialized with each control area break.

Though implemented as reserved identifiers due to poor documentation of (old) CONCOR exact values and utility were unknown

These apparently new counters permit access to valuable totals within control areas

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN-, OUT-).
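The notes above say that most internal variables are initialized with each control area break and that the new counters are not cumulative. A minimal sketch of that behaviour follows. It is an illustration only: the function name, the record layout and the choice of two counters are assumptions, and the actual CONCOR reset rules are, as noted, not fully documented.

```python
def count_by_area(records):
    """Yield (area, in_record_count, errors) for each control area.

    `records` is an iterable of (area_id, has_error) pairs, assumed to be
    already sorted so that each area's records are contiguous. Both
    counters are reset at every area break, so they are not cumulative.
    """
    current_area = None
    in_record_count = 0
    errors = 0
    for area, has_error in records:
        if current_area is not None and area != current_area:
            yield (current_area, in_record_count, errors)
            in_record_count = 0  # reset at the control area break
            errors = 0
        current_area = area
        in_record_count += 1
        errors += int(has_error)
    if current_area is not None:
        yield (current_area, in_record_count, errors)


sample = [("A", False), ("A", True), ("B", False)]
print(list(count_by_area(sample)))
```

A run-level total, if wanted, would have to be accumulated by the user from the per-area values.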

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Execution Statistics listing; the printed page is only partially legible. The report header reads CONCOR-EDITOR EXECUTION STATISTICS and is followed by the control area summary columns. The message DIVISION BY ZERO, LINE NUMBER 118 is printed repeatedly, followed by RUN ABORTED DUE TO DIVISION BY ZERO ERRORS.]

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason--ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.

Superimposed program text (legible portions only; literal delimiters and operators are partly lost):

114 LIST DIVISION BY ZERO ERROR FOLLOWS
115 LET R001 R003 = 0
116 LIST R001 R003 SET TO ZEROS
118 LET R003 = R002 R001
122 END TEST OF OPERANDS AND MSG
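The behaviour described in the comments (report the source line of each division by zero, then abort the run) can be sketched as follows. This is not CONCOR code. The statement table, the register names as dictionary keys and the `max_errors` threshold are assumptions for illustration; the report does not state how many errors CONCOR tolerates before aborting.

```python
def run_statements(statements, records, max_errors):
    """Run numbered statements against each record and collect diagnostics.

    `statements` is a list of (line_number, function) pairs; each function
    receives the record's registers as a dict. A division by zero is
    reported with its source line number instead of ending the run
    silently, and the run is aborted once `max_errors` is reached.
    """
    messages = []
    errors = 0
    for record in records:
        registers = dict(record)  # work on a copy of the record
        for line_no, statement in statements:
            try:
                statement(registers)
            except ZeroDivisionError:
                errors += 1
                messages.append(f"DIVISION BY ZERO, LINE NUMBER {line_no}")
                if errors >= max_errors:
                    messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                    return messages
    return messages


def set_zero(registers):
    registers["R001"] = 0


def divide(registers):
    registers["R003"] = registers["R002"] / registers["R001"]


program = [(115, set_zero), (118, divide)]
records = [{"R001": 1, "R002": 8, "R003": 0}] * 3
report = run_statements(program, records, max_errors=2)
print("\n".join(report))
```

The point of the design is that the diagnostic carries the line number of the offending statement, which is what the test program in this appendix was written to verify.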

[Execution Statistics listing; the printed page is largely illegible. It shows the error messages produced when the program below referenced array elements that do not exist.]

Comments: In this example R001, a 2 dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 X 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.

Superimposed program text (legible portions only):

175 UNTIL R001 > 10 VARY R001 FROM 1 BY 1
177 DO
178 UNTIL R002 > 10 VARY R002 FROM 1 BY 1
181 UPDATE TA2 (R001 R002) R003
183 LIST VALUES OF TA1
186 END-DO
187 END-DO
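The array coordinate error in this example can be reproduced in outline. The sketch below is an illustration only: the function name, the message wording and the default line number are assumptions, not CONCOR output. It copies a 10 x 10 table into a 5 x 5 table and reports every subscript pair that does not exist in the smaller one.

```python
def copy_table(source, target, line_no=181):
    """Copy `source` cells into `target`, reporting subscripts missing in `target`.

    Subscripts in the messages are 1-based, as in the loop of the listing.
    The message text is hypothetical.
    """
    messages = []
    for row in range(len(source)):
        for col in range(len(source[row])):
            if row >= len(target) or col >= len(target[row]):
                messages.append(
                    f"SUBSCRIPT OUT OF RANGE ({row + 1},{col + 1}) LINE NUMBER {line_no}"
                )
                continue
            target[row][col] = source[row][col]
    return messages


ta1 = [[10 * r + c for c in range(10)] for r in range(10)]  # 10 x 10
ta2 = [[0] * 5 for _ in range(5)]                           # 5 x 5
diagnostics = copy_table(ta1, ta2)
print(len(diagnostics), diagnostics[0])
```

Of the 100 references made by the nested loop, only the 25 that fall inside the 5 x 5 table succeed; the rest are reported rather than allowed to corrupt storage or end the run without explanation.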

[Execution Statistics listing; the printed page is only partially legible. The report shows the messages JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215 and EDITOR -- NORMAL END OF JOB.]

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

Superimposed program text (legible portions only):

214 IF IN-RECORD-COUNT > 500
215 THEN STOP-RUN END-THEN

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION A blank command statement line was found in the user Dictionary Division command statement file

ACTION TAKEN Parsing continued with the next command statement line in the Dictionary Division command file

USER RESPONSE Put a comment indicator () in column 1 of the command statement line if spacing is desired if not remove the command statement line

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION A character not in the CONCOR legal character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()) was found as the first character in the string

ACTION TAKEN Parsing began again with the next string

USER RESPONSE Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION A string starting with a character not valid in the CONCOR legal character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached

ACTION TAKEN Parsing began again with the next user command statement line in the Dictionary Division command file

USER RESPONSE Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ())

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION The comment indicator character () was found embedded in a string
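The checks behind messages DD-001, DD-002 and DD-004 can be sketched in a few lines. This is an illustration, not the CONCOR parser. Several of the special characters in the guide did not survive reproduction, so the comment indicator is passed in as a parameter (the asterisk used in the example is only a stand-in), the base character set contains only the characters that are legible above, and strings are assumed to be separated by blanks.

```python
import string

# Characters legible in the guide: A-Z, 0-9, "-", "+", "=", and the comma.
# The other special characters are unknown here and can be supplied
# through `extra_legal`.
LEGAL = set(string.ascii_uppercase + string.digits + "-+=,")


def check_line(line, comment_indicator, extra_legal=""):
    """Return a list of (severity, number, text) diagnostics for one line."""
    legal = LEGAL | set(extra_legal)
    if line.strip() == "":
        return [("WARNING", "DD-001", "BLANK COMMAND STATEMENT LINE")]
    if line.startswith(comment_indicator):
        return []  # comment line: indicator in column 1
    diagnostics = []
    for token in line.split():
        if token[0] not in legal:
            diagnostics.append(("ERROR", "DD-002", "ILLEGAL FIRST CHARACTER IN STRING"))
        elif comment_indicator in token:
            diagnostics.append(("ERROR", "DD-004", "COMMENT INDICATOR IN STRING"))
    return diagnostics


print(check_line("   ", "*"))
print(check_line("NEW-DATA ?X", "*"))
print(check_line("AGE*SEX", "*"))
```

As in the guide, each diagnostic pairs a severity and message number with a fixed text, so a programmer can look up the explanation, action taken and user response by number.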


IV THE ADEQUACY OF COBOL CONCOR DOCUMENTATION

The current documentation of CONCOR consists of a newly developed, relatively voluminous systems manual, a Diagnostic Message Guide, and some loose-leaf instructions concerning the installation of the system on IBM computers. Of these three documents, the Diagnostic Message Guide is clearly complete and adequate and could bear no changes except to provide message number tabs to enable a programmer to more quickly locate the text of a message number. An example of the superior format of this guide appears as Appendix J.

Conspicuously absent from the list of systems documents is a Users Guide as originally specified in the ISPC enhancement proposal

A thorough assessment of the Systems Manual has been undertaken and indicates that, with some reorganization and editing, it could be made a more useful and usable document, as it is well constituted informationally. In its current form the Systems Manual can best be described as a working document: a document which was not intended to teach the CONCOR language but centered upon an audience already familiar with previous CONCOR releases. Further, it is not a systems manual per se, as some of its components and appendixes could more appropriately compose a users guide. For example, excerpts from the POPSTAN publications which review the basic editing concepts and contain the POPSTAN example problem should be deleted from the Systems Manual and would be more contextually effective in the Users Guide. The EXECUTION-DIVISION chapter of the Systems Manual would especially benefit from general editing and reorganization. (Specifically, the contents of pages 90-92 should be moved to page 63; these pages explain the placement and purpose of the three permissible routines of the CONCOR language. Had the author been granted more time, he could have outlined additional specific modifications to bring existing documentation to a level of adequacy.) However, some guidelines stand out:

1. Installation instructions and considerations should compose a chapter of the Systems Manual rather than a separate document. (Hand-out materials covering installation were not adequate.) Further, a documentation scheme setting forth how CONCOR was installed on each computer, and any modifications required during installation, could be serially added to and become a permanent part of the installation's systems manual. If CONCOR is going to be used, and if agencies are going to actively support it as a package, installation limitations must be known and well documented. The prescribed format for this documentation should be set forth, and any modifications to CONCOR should be continuously updated on these sheets so that a history of changes to the language will be readily accessible to supporting agencies. If CONCOR is to survive as a standardized system, editing results must be capable of being replicated among installations. Such a form could contain other information as well, concerning the type of computer, amount of core, and nature of utilities supporting users. Upon installation, a copy of this form could be sent to the US agency which will ultimately be responsible for supporting the CONCOR package.

2. A complete COBOL CONCOR program should appear in an appendix for reference.

3. The development of the Users Guide should include an intensive review of the editing concepts involved in processing census data files beyond the POPSTAN materials.

4. An explanation of the CONCOR benchmark program should appear in the Users Guide and the Systems Manual. The running of a supplied benchmark program should be a standard installation protocol used to test all operational aspects of a new installation.

5. The development of a CONCOR pocket card. This tool, which has been proven useful in teaching and assisting programmers in utilizing programming languages, lays out all command options on a single small card. An example of such a pocket card is the IBM Assembler Card. Such a small document is useful in clarifying the language and setting forth structural rules without continual reference to full-size manuals.


V CONCLUSION

In a situation of extreme need, CONCOR could probably be utilized, as it seems capable of performing its editing functions. Its usefulness as a data cleaning tool is generally undisputed. The importance of having CONCOR exhaustively tested by an independent agency cannot be over-emphasized in light of prior failures of the system to perform adequately. Additionally, core requirements and processing speed should be precisely determined. One must be encouraged by the amount of progress which has been made in a relatively short amount of time towards the completion of the CONCOR package. While it is clearly beyond the scope of this report to estimate the cost and the amount of time required to implement the highly desirable changes to the CONCOR language and documentation outlined herein, and then exhaustively test the system, it is believed that this process could be completed within a single quarter. (The changes are of such a nature that no structural redefinitions of a systematic nature would be required.) These enhancements would make the COBOL CONCOR package more complete, consistent and reliable in the field. ISPC has competently moved the system to the point that its completion is within reach.

Given the probable revision of the current Systems Reference Manual and the development of a suitable Users Guide document, it is likely that this version of CONCOR could be taught to experienced programmers (unfamiliar with previous versions of the language) in as little as four weeks. And while one is impressed by the potential power and overall simplicity of the COBOL CONCOR language, it is difficult to envision individuals with no real computer programming experience utilizing its full capabilities. Hence the importance of including a review of basic editing concepts in the documentation is underlined.

Further, as documentation is often critical to the perceived ease of use of computer-based systems, it will be essential to have these various manuals examined independently prior to their general use.

As previously mentioned, Appendix G represents features which the ISPC feels could have a positive impact on CONCOR. Depending upon the outcome of the testing recommended in previous chapters and the availability of funding, supporting agencies may desire to look again at an assembler (ALC) version of the language.

Overall, COBOL CONCOR remains an intermediate product yet to realize its potential, which is thought considerable. Software development of this nature is characteristically of a long-term, evolutionary design. In light of this reality, agencies should be prepared to evaluate such projects' success or failure in terms other than that of immediate utility. It is the philosophy underlying the system, not the system itself, which will ultimately be exported.

APPENDIX A

Bucen Enhancement Proposal

BUCEN IN-HOUSE REVIEW ENHANCEMENT PROPOSAL

1 Easy to use interrecord referencing

2 Improved output file capabilities

A provide overflow protection on WRITE command

B provide OUTPUT command to create corrected questionnaire file that matches IP file data dictionary

3 Improved edit statistics reported (LISTERR)

A provide automatic (user-specified) area break

B provide options for compilation and displaying edit statistics at various levels

C provide automatic (user-specified) tolerance checking of error rates by area

D automatically capture IDs of areas failing tolerance check

4 Clean up known bugs in code

5 Comprehensive testing

6 Clean up and enhance documentation

A reference manual more examples error message guide

B installation guide

C systems manual
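Items 3C and 3D of the proposal (tolerance checking of error rates by area, and capture of the IDs of areas failing the check) amount to a simple comparison once per-area counts exist. The sketch below is an assumption-laden illustration: the function name, the dictionary layout and the definition of the error rate as errors divided by records are choices made here, not taken from the CONCOR documentation.

```python
def areas_failing_tolerance(area_counts, tolerance):
    """Return [(area_id, error_rate)] for areas whose rate exceeds `tolerance`.

    `area_counts` maps an area ID to (records_read, records_in_error).
    `tolerance` is the user-specified maximum acceptable error rate,
    expressed as a fraction (0.05 means 5 percent).
    """
    failing = []
    for area_id, (records, errors) in sorted(area_counts.items()):
        rate = errors / records if records else 0.0  # empty area: no errors
        if rate > tolerance:
            failing.append((area_id, rate))
    return failing


counts = {"001": (200, 4), "002": (100, 12), "003": (0, 0)}
rejected = areas_failing_tolerance(counts, tolerance=0.05)
print(rejected)
```

The list of failing area IDs is the piece an installation would pass on, for example to decide which areas to return for re-keying or manual review.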

APPENDIX B

EVALUATIVE CRITERIA

UNITED STATES GOVERNMENT

Memorandum

TO: DSPOPFPSD Robert H Haladay

DATE: December 3 1979

FROM: DSPOPDEIO Liliane Floge

SUBJECT: Criteria for Evaluation of BuCen COBOL CONCOR September 30th Version as presented at January 1980 Workshop

The following points should be considered when evaluating CONCOR, the COBOL editing system, during participation at the January 1980 workshop:

1. Are the Installation Manual, the System Manual and the Users Manual ready for use? Are these manuals adequate for their purposes? Are the manuals clear and consistent? Can they be understood by subject-matter personnel as well as programmers?

2. Is the editing system capable of implementing all the features as described in the documentation?

3. Does the system appear to be [illegible] debugged and tested and ready for release to developing countries?

4. Does the system appear to be fast enough to edit a census in a reasonable amount of time?

5. What size core does the system require?

6. Comment in general on completeness, usefulness and efficiency of [illegible] use by developing country programmers and other census personnel.

cc: [partly illegible] Olds, APHA; Julio Ortuzar, Delta Systems

APPENDIX C

WORKSHOP ITINERARY

APPENDIX C

CONCOR Workshop Schedule January 7-18 1980

U S Bureau of the Census International Statistical Programs Center

Scuderi Building Room 209 4235 28th Avenue Marlow Heights Maryland

Monday January 7

9:30 am - 10:00 Welcoming Remarks; Overview of Workshop
10:00 - 10:30 Introduction to CONCOR
  - Purpose and function
  - History of development
  - General computer requirements
10:30 - 10:45 Break
10:45 - 12:00 Editing Concepts
  - Ways to interrogate data
  - Ways to correct data
  - Editing housing and population data - POPSTAN
  - Advantages of CONCOR
12:00 - 1:15 pm Break
1:15 - 2:00 System Description
  - Constraints in design of CONCOR
  - Basic subsystems of CONCOR
  - User interactions with system
  - Examples of outputs produced
2:00 - 2:30 User Program Organization
  - Divisions
  - Sections
  - Routines
  - Commands
2:30 - 2:45 Break
2:45 - 3:25 Command Language Description
  - Types of statements
  - Format
  - Syntax

Tuesday January 8

9:30 am - 10:30 Dictionary Division Command Statements
  - Punctuation
  - Input data referencing
  - Working data declarations
  - Internal data representation and storage
10:30 - 10:45 Break
10:45 - 12:00 Dictionary-Attributes-Section
  - Dictionary-Name
  File-Section
  - Input-File
  - Output-File
  - Write-File
  - Error-File
12:00 - 1:15 pm Break
1:15 pm - 2:15 Input-Record-Section
  - Define-Record
  Working-Data-Section
  - New-Data
  - Array-Data
2:15 - 2:30 Break
2:30 - 3:25 Dictionary Examples
  - Minimum dictionary structure
  - Maximum dictionary structure
  - Hand out dictionary problem

Wednesday January 9

9:30 am - 10:30 Discussion of Participants Dictionary problems; Free work time
10:30 - 10:45 Break
10:45 - 12:00 Execution Division Command Statements
  - Punctuation
  - Subscripting
  - Internal Identifiers
  - Report-Control-Section
    - Generate-Edit-Statistics
    - Count-Imputes
    - Examples
12:00 - 1:15 pm Break
1:15 pm - 2:15 Execution Division Command Statements (continued)
  - Routines of Edit-Specifications-Section
    - Prolog-Routine
    - Filter-Routine
    - Epilog-Routine
  - Types and functions of edit specification commands
  - Range
  - Assert
2:15 - 2:30 Break
2:30 - 3:25 Execution Division Command Statements (continued)
  - Pass/Fail clauses
  - List

Thursday January 10

9:30 am - 10:30 Discussion of Problems; Free work time
10:30 - 10:45 Break
10:45 - 12:00 Edit Specification Command Statements (continued)
  - Allocate
  - Update
  - Let
12:00 - 1:15 pm Break
1:15 pm - 2:15 Edit Specification Command Statements (continued)
  - If
  - Until/Exit
  - Stop
2:15 - 2:30 Break
2:30 - 3:25 Edit Specification Command Statements (continued)
  - Drecode
  - Grecode

Friday January 11

9:30 am - 10:30 Edit Specification Command Statements (continued)
  - Output
  - Write
10:30 - 10:45 Break
10:45 - 12:00 Report Division Command Statements
  - Display-Control-Section
    - Display-Edit-Statistics
  - Tolerance-Control-Section
    - Error-Rate-Check
    - Reject-File
  - Report Examples
12:00 - 1:15 pm Break
1:15 pm - 3:25 Hand out Edit Specifications as problem; Free work time

Monday January 14

9:30 am - 10:30 Discuss procedures for running problems on computer
10:30 - 10:45 Break
10:45 - 12:00 Component Programs of the CONCOR system
12:00 - 1:15 pm Break

Tuesday January 15

9:30 am - 3:25 pm Free work time

Wednesday January 16

9:30 am - 12:00 Free work time
12:00 - 1:15 pm Break
1:15 pm - 2:15 How to Install CONCOR on IBM 360/370 OS
2:15 - 2:30 Break
2:30 - 3:25 Free work time

Thursday January 17

9:30 am - 3:25 Free work time
1:15 pm - 2:15 Future CONCOR Design Considerations
  - for census processing
  - for survey processing
  - manual correction system
2:15 - 2:30 Break
2:30 - 2:45 Evaluation Guidelines
  - Hand out evaluation forms
2:45 - 3:25 Free work time

Friday January 18

9:30 am - 10:30 Free work time
10:30 - 10:45 Break
10:45 - 12:00 Group Discussion on CONCOR
  - submission of written evaluations
  Distribution of IBM installation tapes
  Presentation of Certificates to Participants
12:00 - 1:15 pm Break
1:15 - 3:25 Free work time

APPENDIX D

PARTICIPANTS


CONCOR WORKSHOP January 7 - 18 1980 Marlow Heights MD

PARTICIPANTS

KENYA

James Midianga Government Computer Centre Central Bureau of Statistics PO Box 30266 Nairobi Kenya

PANAMA

Omar Rivera Rios Data Processing Specialist Contraloria General de la Republica de Panama Apartado Postal 5213 Panama 5 Panama

PHILIPPINES Guida T Capellan OIC Computer Programming Division National Census and Statistics Office Solicarel Bldg 1 Magsaysay Blvd

Sta Mesa Manila Philippines

THAILAND

Angsumal Sunalai National Statistical Office Bangkok 1 Thailand

EGYPT

Farag Sedky Mourad Ghaleb CAPMAS (Central Agency for Public Mobilization and Statistics) NASR City Cairo Egypt


SAUDI ARABIA

Shargi R Al-Shargi National Computer Center PO Box 2534 Riyadh Saudi Arabia

Ahmed S Al-Ghamdi Dept Director of National Computer Center PO Box 2534 Riyadh Saudi Arabia

OTHER

Robert W ONeal USREPJECOR APO New York NY 09038

John N Adams Data Processing Advisor DACCA Department of State Washington DC 20520

OR c/o UNDP PO Box 224 Dacca Bangladesh

Joe Quasney US Census Bureau Washington DC 20233

Howard Brunsman 5715 N Ninth St Arlington VA 22205

Larry Shiller and Julio Ortuzar Delta Systems Consultants Inc 264 Alhambra Circle Coral Gables FL 33134

Nicholas Ourusoff Burpee Hill Road New London NH 03257 (United Nations)

Frederic J Grant IV Systems Development Georgia World Congress Institute 580 North Omni International Atlanta Georgia 30303


Rafael Samper Mary Jo Keenan Catherine B Gleason LACDR Room 2246 NS Washington DC 20523

John Marshall AIDSERDMPSE Rm 706C SA-12 Washington DC 20523

Susanne Bacon and Mary K Friday ISPCData Processing US Census Bureau Washington DC 20233

Leo Dougherty ISPCWorld Census Staff US Bureau of the Census Washington DC 20233


CONCOR WORKSHOP

January 7-18 1980 Washington DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM


CONCOR WORKSHOP January 16 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1 What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries

2 If you are familiar with previous versions of CONCOR how does this current version compare to them


3 How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data of which you are familiar

4 How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software


5 How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys

6 If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?


7 In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8 Other comments


APPENDIX F

NEWOLD COMMAND COMPARISONS


OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA


NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR

END-DIVISION signifies the termination of the CONCOR division organization and must always be present even though division identifiers have not been implemented

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

ALLOCATE (ALL) ALLOCATE (ALLOC) Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

ASSERT (AST) ASSERT (ASRT) The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and MESSAGE option. Additionally, by utilizing the DUMPR and DUMPO keywords, it provides the user with the option of using the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

(see RECODE) DRECODE (DRCD) Provides a method for recoding data items that have a starting value of zero and continue in ascending sequence. Replaces the previous RECODE command.

EXIT Provides the means to terminate EXECUTION (ESCAPE) of command statements within the UNTIL command

END END-IF END-THEN END-ELSE A solitary END command is no longer permitted in conditional construction. Note the similarity to COBOL pseudocode.

ERROR not implemented No longer supported in language

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

FILTER TYPE ( ) WITH not implemented No longer are option statements supported in language This command once specified essentially a GOTO depending on construction

(see XRECODE) GRECODE (GRCD) This new command provides a method for the group recoding of input values

IF IF The IF statement has properties found virtually in all programming languages ie provides a means of evaluating a condition and controlling the execution of alternative logical statements Similar to pseudocode a CONCOR programmer must utilize the structured command IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version

LET LET Virtually unchanged

LIST New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire as well as specific individual identifiers

OUTPUT New command which provides the means by which records will be written to the output-file

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

RANGE (RNG) RANGE (RNG) As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

RECODE (REC) name changed Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

STOP- keyword STOP- keyword Unchanged

UNTIL DO END-DO DO-END-DO, one of the most significant and powerful enhancements to the CONCOR language. This command provides a looping capability similar to most other high-level programming languages. The number of iterations is under user control, as well as the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

UPDATE (UPD) UPDATE (UPD) Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

WRITE (WRT) WRITE (WRT) The WRITE language statement while once utilized as the primary output command is now used to create an auxiliary or derivative file Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION (See WRITE-FILE of DICTIONARY-DIVISION)

XRECODE (XREC) name changed Old command which has become the GRECODE command in the December 1979 version
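The difference between the two recode commands in the table above can be illustrated outside of CONCOR. The sketch below is written in Python, not in the CONCOR language, and the function names, argument layouts and sample codes are assumptions. It takes the comments at face value: a direct recode works on values that start at zero and ascend, so the value can index a list of new codes, while a group recode maps ranges of input values to one code each.

```python
def drecode(value, codes):
    """Direct recode: `value` (0, 1, 2, ...) selects its replacement from `codes`."""
    if not 0 <= value < len(codes):
        raise ValueError(f"value {value} has no entry in the recode list")
    return codes[value]


def grecode(value, groups, default=None):
    """Group recode: `groups` is a list of (low, high, code), bounds inclusive."""
    for low, high, code in groups:
        if low <= value <= high:
            return code
    return default  # value falls in no group


# Hypothetical codes: 0..3 recoded one-for-one, ages recoded into three groups.
relationship_codes = [9, 1, 2, 3]
age_groups = [(0, 14, 1), (15, 64, 2), (65, 98, 3)]

print(drecode(2, relationship_codes))
print(grecode(17, age_groups))
print(grecode(99, age_groups))
```

A value outside every group is returned as `None` here so that the caller can decide whether it is an error; how CONCOR itself treats such values is not stated in this appendix.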


DECEMBER 1978 VERSION 1

DECEMBER 1979 VERSION 2

Not implemented REPORT-DIViSION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
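The presorting requirement noted above can be checked before a file is submitted. The sketch below is an illustration in Python, not a CONCOR facility; the function names and the record layout (area ID in the first field) are assumptions. It groups consecutive records by area, as a control-break report would, and lists any area whose records appear in more than one run.

```python
from itertools import groupby


def area_runs(records, key):
    """Return [(area, record_count)] for each run of consecutive records."""
    return [(area, len(list(group))) for area, group in groupby(records, key=key)]


def noncontiguous_areas(records, key):
    """Return the areas whose records are split across more than one run."""
    seen = set()
    repeated = []
    for area, _group in groupby(records, key=key):
        if area in seen and area not in repeated:
            repeated.append(area)
        seen.add(area)
    return repeated


def by_area(record):
    return record[0]  # hypothetical layout: area ID is the first field


unsorted_file = [("01", 1), ("02", 2), ("01", 3)]
print(noncontiguous_areas(unsorted_file, by_area))
print(area_runs(sorted(unsorted_file, key=by_area), by_area))
```

With the unsorted file, area 01 would be reported as two separate control areas; after sorting on the area field, each area forms a single run and the per-area statistics are meaningful.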

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST


Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1 Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics, such as volume specification and retention periods, if applicable.

(2) Addition of optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.


(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18 digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.


(12) Increase the number of comparison strings permitted for alphanumeric data items This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters The user would still be required to code two commas to receive the system default value for any parameter

(15) Provide a repetition factor in the initialization of arrays

(16) Print the data dictionary name in the titles of the dictionary documentation or provide a means by which the user may specify a title to appear in the listings
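The automatic range check proposed in item (13) can be pictured with a short Python sketch. It is an illustration only: the identifier names, the ranges, and the `range_check` function are hypothetical and do not reflect how CONCOR stores its dictionary.

```python
# Hypothetical dictionary entries: identifier -> (lowest valid, highest valid).
VALID_RANGES = {
    "AGE": (0, 99),
    "SEX": (1, 2),
    "MARITAL-STATUS": (1, 5),
}

def range_check(record, valid_ranges=VALID_RANGES):
    """Return the identifiers whose values fall outside their declared range."""
    failures = []
    for name, (low, high) in valid_ranges.items():
        value = record.get(name)
        if value is None or not (low <= value <= high):
            failures.append(name)
    return failures

# One questionnaire record with an implausible age.
record = {"AGE": 134, "SEX": 2, "MARITAL-STATUS": 3}
print(range_check(record))  # ['AGE']
```

Holding the ranges with the declarations, rather than in comment statements, is what would let both the editor and any tabulation software read them from one place.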

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features, such as division by zero and out of bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is what action is to be taken in the calculation of tolerance in terms of this new record's inclusion into the universe of observations and number of changes.


(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
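The lookup table approach suggested for DRECODE in item (4) can be contrasted with a sequential search in a brief Python sketch. The recode pairs and function names below are hypothetical; the sketch assumes, as the DRECODE description in Appendix F states, that the original codes start at zero and ascend in sequence.

```python
# Hypothetical recode pairs (original code, new code), not taken from CONCOR.
RECODE_PAIRS = [(0, 9), (1, 1), (2, 1), (3, 2), (4, 3)]

def recode_linear(value, pairs=RECODE_PAIRS):
    """Scan the pairs in order until the original code is found."""
    for original, new in pairs:
        if original == value:
            return new
    return None

# Because the original codes start at zero and ascend by one, the new
# codes can be stored in a list and reached directly by index.
LOOKUP_TABLE = [new for _original, new in RECODE_PAIRS]

def recode_lookup(value, table=LOOKUP_TABLE):
    """Fetch the new code with a single indexed access."""
    if 0 <= value < len(table):
        return table[value]
    return None

print(recode_linear(3), recode_lookup(3))  # 2 2
```

The indexed version does the same amount of work for every code, whereas the scan grows with the position of the code in the list; this is the saving the enhancement is after.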

1.1.3 Report Division

(1) Provide a means by which the user may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.


1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available information needed most frequently. Also, to examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean to imply that the editing process itself should be made interactive, but rather only that the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR Users Guide.


2 Survey or other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has not been finished nor tested.
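The check-in procedure of item (2) amounts to comparing a master list of expected sample cases with the questionnaires actually received. A minimal Python sketch follows; the questionnaire identifiers and the `check_in` function are hypothetical and only illustrate the comparison.

```python
def check_in(master_ids, received_ids):
    """Compare expected sample cases with the questionnaires received."""
    expected = set(master_ids)
    received = set(received_ids)
    return {
        "missing": sorted(expected - received),
        "unexpected": sorted(received - expected),
    }

# Hypothetical questionnaire identifiers.
master_list = ["Q001", "Q002", "Q003", "Q004"]
received = ["Q001", "Q003", "Q004", "Q009"]

print(check_in(master_list, received))
# {'missing': ['Q002'], 'unexpected': ['Q009']}
```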


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES


1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT OUTPUT-QUEST-COUNT WRITE-RECORD-COUNT

Change in spelling of variable

Not mentioned--possible omission from manual

Not implemented

Note: Current documentation equivocates on when some variables are reset--most are initialized with each control area break

Though implemented as reserved identifiers, due to poor documentation of (old) CONCOR exact values and utility were unknown

These apparently new counters permit access to valuable totals within control areas

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention, e.g. IN-, OUT-

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

CONCOR-EDITOR EXECUTION STATISTICS (division by zero test)

[Execution Statistics page. The report prints the message DIVISION BY ZERO LINE NUMBER 118 repeatedly, followed by RUN ABORTED DUE TO DIVISION BY ZERO ERRORS. The superimposed program text (lines 112-125) sets registers R001 and R003 to zero and then, on line 118, divides R002 by R001.]

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason--ABEND without generating any messages. The most common types of cessation are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.
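The behavior described in the comments above, reporting the offending line number rather than stopping silently and aborting once too many errors accumulate, can be sketched in Python. This is not CONCOR's implementation: the `run_statements` function, the triple format, and the error limit of three are assumptions made for illustration; only the two message texts are taken from the Execution Statistics page.

```python
def run_statements(statements, error_limit=3):
    """Evaluate (line_number, numerator, denominator) triples.

    Division by zero is reported with its line number instead of
    stopping silently; the run is aborted once error_limit is reached.
    """
    messages = []
    results = {}
    errors = 0
    for line_number, numerator, denominator in statements:
        try:
            results[line_number] = numerator / denominator
        except ZeroDivisionError:
            errors += 1
            messages.append(f"DIVISION BY ZERO LINE NUMBER {line_number}")
            if errors >= error_limit:
                messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                break
    return results, messages

# Hypothetical statements: line 118 divides by zero on three records.
program = [(117, 10, 2), (118, 5, 0), (118, 7, 0), (118, 9, 0), (119, 1, 1)]
results, messages = run_statements(program)
print("\n".join(messages))
```

With these inputs the statement on line 119 is never reached, because the run stops at the third error.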

CONCOR-EDITOR EXECUTION STATISTICS (array coordinate test)

[Execution Statistics page listing a series of array coordinate errors by questionnaire. The superimposed program text (lines 173-187) contains two nested UNTIL loops that vary R001 and R002 starting from 1 with a limit of 10; inside them an ALLOCATE command reads TA1 (R001 R002) and an UPDATE command writes TA2 (R001 R002).]

Comments: In this example TA1, a 2-dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 X 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.

CONCOR-EDITOR EXECUTION STATISTICS (user-specified termination test)

[Execution Statistics page. The report prints JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215, followed by EDITOR -- NORMAL END OF JOB. The superimposed program text (lines 214-218) tests IN-RECORD-COUNT in an IF command and issues STOP-RUN in its THEN clause.]

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING(DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION A blank command statement line was found in the user Dictionary Division command statement file

ACTION TAKEN Parsing continued with the next command statement line in the Dictionary Division command file

USER RESPONSE Put a comment indicator () in column 1 of the command statement line if spacing is desired if not remove the command statement line

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION A character not in the CONCOR legal character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()) was found as the first character in the string

ACTION TAKEN Parsing began again with the next string

USER RESPONSE Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION A string starting with a character not valid in the CONCOR legal character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached

ACTION TAKEN Parsing began again with the next user command statement line in the Dictionary Division command file

USER RESPONSE Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ())

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION The comment indicator character () was found embedded in a string


users. Upon installation, a copy of this form could be sent to the US agency which will ultimately be responsible for supporting the CONCOR package.

2 A complete COBOL CONCOR program should appear in an appendix for reference.

3 The development of the Users Guide should include an intensive review of the editing concepts involved in processing census data files beyond the POPSTAN materials.

4 An explanation of the CONCOR benchmark program should appear in the Users Guide and the Systems Manual. The running of a supplied benchmark program should be a standard installation protocol used to test all operational aspects of a new installation.

5 The development of a CONCOR pocket card. This tool, which has been proven useful in teaching and assisting programmers in utilizing programming languages, lays out all command options on a single small card. An example of such a pocket card is the IBM Assembler Card. Such a small document is useful in clarifying the language and setting forth structural rules without continual reference to full-size manuals.


V CONCLUSION

In a situation of extreme need, CONCOR could probably be utilized, as it seems capable of performing its editing functions. Its usefulness as a data cleaning tool is generally undisputed. The importance of having CONCOR exhaustively tested by an independent agency cannot be over-emphasized in light of prior failures of the system to perform adequately. Additionally, core requirements and processing speed should be precisely determined. One must be encouraged by the amount of progress which has been made in a relatively short amount of time towards the completion of the CONCOR package. While it is clearly beyond the scope of this report to estimate the cost and the amount of time required to implement the herein outlined highly desirable changes to the CONCOR language and documentation, and then exhaustively test the system, it is believed that this process could be completed within a single quarter. (The changes are of the nature that no structural redefinitions of a systematic nature would be required.) These enhancements would make the COBOL CONCOR package more complete, consistent, and reliable in the field. ISPC has competently moved the system to the point that its completion is within reach.

Given the probable revision of the current Systems Reference Manual and the development of a suitable Users Guide document, it is likely that this version of CONCOR could be taught to experienced programmers (unfamiliar with previous versions of the language) in as little as four weeks. And while one is impressed by the potential power and overall simplicity of the COBOL CONCOR language, it is difficult to envision individuals with no real computer programming experience utilizing its full capabilities. Hence the importance of including a review of basic editing concepts in the documentation is underlined.

Further, as documentation is often critical to the perceived ease of use of computer-based systems, it will be essential to have these various manuals examined independently prior to their general use.

As previously mentioned, Appendix G represents features which the ISPC feels could have positive impact on CONCOR. Depending upon the outcome of the testing recommended in previous chapters and the availability of funding, supporting agencies may desire to look again at an assembler (ALC) version of the language.

Overall, COBOL CONCOR remains an intermediate product yet to realize its potential, which is thought considerable. Software development of this nature is characteristically of a long-term evolutionary design. In light of this reality, agencies should be prepared to evaluate such projects' success or failure in terms other than that of immediate utility. It is the philosophy underlying the system, not the system itself, which will ultimately be exported.

APPENDIX A

BUCEN ENHANCEMENT PROPOSAL

BUCEN IN-HOUSE REVIEW ENHANCEMENT PROPOSAL

1 Easy to use interrecord referencing

2 Improved output file capabilities

A provide overflow protection on WRITE command

B provide OUTPUT command to create corrected questionnaire file that matches I/P file data dictionary

3 Improved edit statistics reported (LISTERR)

A provide automatic (user-specified) area break

B provide options for compilation and displaying edit statistics at various levels

C provide automatic (user-specified) tolerance checking of error rates by area

D automatically capture IDs of areas failing tolerance check

4 Clean up known bugs in code

5 Comprehensive testing

6 Clean up and enhance documentation

A reference manual more examples error message guide

B installation guide

C systems manual
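Items 3C and 3D of the proposal, tolerance checking of error rates by area and capturing the IDs of areas that fail, can be illustrated with a small Python sketch. The area IDs, counts, tolerance value, and function name are hypothetical; the sketch assumes an error rate defined as records in error divided by records examined, which the proposal itself does not spell out.

```python
def areas_failing_tolerance(area_counts, tolerance):
    """Return IDs of areas whose error rate exceeds the tolerance.

    area_counts maps area ID -> (records_in_error, records_examined).
    """
    failing = []
    for area_id, (errors, examined) in area_counts.items():
        rate = errors / examined if examined else 0.0
        if rate > tolerance:
            failing.append(area_id)
    return failing

# Hypothetical per-area counts gathered at each area break.
counts = {"A01": (3, 200), "A02": (40, 250), "A03": (0, 0)}
print(areas_failing_tolerance(counts, tolerance=0.05))  # ['A02']
```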

APPENDIX B

EVALUATIVE CRITERIA

UNITED STATES GOVERNMENT

Memorandum

TO: DS/POP/FPSD, Robert H Haladay

DATE: December 3 1979

FROM: DS/POP/DEIO, Liliane Floge

SUBJECT: Criteria for Evaluation of BuCen COBOL CONCOR, September 30th Version, as presented at January 1980 Workshop

The following points should be considered when evaluating CONCOR, the COBOL editing system, during participation at the January 1980 workshop:

1 Are the Installation Manual, the System Manual, and the Users Manual ready for use? Are these manuals adequate for their purposes? Are the manuals clear and consistent? Can they be understood by subject-matter personnel as well as programmers?

2 Is the editing system capable of implementing all the features as described in the documentation?

3 Does the system appear to be completely debugged and tested and ready for release to developing countries?

4 Does the system appear to be fast enough to edit a census in a reasonable amount of time?

5 What size core does the system require?

6 Comment in general on the completeness, usefulness, and efficiency of the system and on its ease of use by developing country programmers and other census personnel.

cc: Suzanne Olds, APHA; Julio Ortuzar, Delta Systems

APPENDIX C

WORKSHOP ITINERARY

APPENDIX C

CONCOR Workshop Schedule January 7-18 1980

U S Bureau of the Census International Statistical Programs Center

Scuderi Building Room 209 4235 28th Avenue Marlow Heights Maryland

Monday January 7

9:30 am - 10:00 Welcoming Remarks; Overview of Workshop

10:00 - 10:30 Introduction to CONCOR
- Purpose and function
- History of development
- General computer requirements

10:30 - 10:45 Break

10:45 - 12:00 Editing Concepts
- Ways to interrogate data
- Ways to correct data
- Editing housing and population data - POPSTAN
- Advantages of CONCOR

12:00 - 1:15 pm Break

1:15 - 2:00 System Description
- Constraints in design of CONCOR
- Basic subsystems of CONCOR
- User interactions with system
- Examples of outputs produced

2:00 - 2:30 User Program Organization
- Divisions
- Sections
- Routines
- Commands

2:30 - 2:45 Break

2:45 - 3:25 Command Language Description
- Types of statements
- Format
- Syntax

Tuesday January 8

9:30 am - 10:30 Dictionary Division Command Statements
- Punctuation
- Input data referencing
- Working data declarations
- Internal data representation and storage

10:30 - 10:45 Break

10:45 - 12:00 Dictionary-Attributes-Section
- Dictionary-Name
File-Section
- Input-File
- Output-File
- Write-File
- Error-File

12:00 - 1:15 pm Break

1:15 pm - 2:15 Input-Record-Section
- Define-Record
Working-Data-Section
- New-Data
- Array-Data

2:15 - 2:30 Break

2:30 - 3:25 Dictionary Examples
- Minimum dictionary structure
- Maximum dictionary structure
- Hand out dictionary problem

Wednesday January 9

9:30 am - 10:30 Discussion of Participants Dictionary problems; Free work time

10:30 - 10:45 Break

10:45 - 12:00 Execution Division Command Statements
- Punctuation
- Subscripting
- Internal Identifiers
- Report-Control-Section
  - Generate-Edit-Statistics
  - Count-Imputes
  - Examples

12:00 - 1:15 pm Break

1:15 pm - 2:15 Execution Division Command Statements (continued)
- Routines of Edit-Specifications-Section
  - Prolog-Routine
  - Filter-Routine
  - Epilog-Routine
- Types and functions of edit specification commands
- Range
- Assert

2:15 - 2:30 Break

2:30 - 3:25
- Pass/Fail clauses
- List

Thursday January 10

9:30 am - 10:30 Discussion of Problems; Free work time

10:30 - 10:45 Break

10:45 - 12:00 Edit Specification Command Statements (continued)
- Allocate
- Update
- Let

12:00 - 1:15 pm Break

1:15 pm - 2:15
- If
- Until/Exit
- Stop

2:15 - 2:30 Break

2:30 - 3:25
- Drecode
- Grecode

Friday January 11

9:30 am - 10:30 Edit Specification Command Statements (continued)
- Output
- Write

10:30 - 10:45 Break

10:45 - 12:00 Report Division Command Statements
- Display-Control-Section
  - Display-Edit-Statistics
- Tolerance-Control-Section
  - Error-Rate-Check
  - Reject-File
- Report Examples

12:00 - 1:15 pm Break

1:15 pm - 3:25 Hand out Edit Specifications as problem; Free work time

Monday January 14

9:30 am - 10:30 Discuss procedures for running problems on computer

10:30 - 10:45 Break

10:45 - 12:00 Component Programs of the CONCOR system

12:00 - 1:15 pm Break

Tuesday January 15

9:30 am - 3:25 pm Free work time

Wednesday January 16

9:30 am - 12:00 Free work time

12:00 - 1:15 pm Break

1:15 pm - 2:15 How to Install CONCOR on IBM 360/370 OS

2:15 - 2:30 Break

2:30 - 3:25 Free work time

Thursday January 17

9:30 am - 3:25 Free work time

1:15 pm - 2:15 Future CONCOR Design Considerations
- for census processing
- for survey processing
- manual correction system

2:15 - 2:30 Break

2:30 - 2:45 Evaluation Guidelines
- Hand out evaluation forms

2:45 - 3:25 Free work time

Friday January 18

9:30 am - 10:30 Free work time

10:30 - 10:45 Break

10:45 - 12:00 Group Discussion on CONCOR
- Submission of written evaluations
- Distribution of IBM installation tapes
- Presentation of Certificates to Participants

12:00 - 1:15 pm Break

1:15 - 3:25 Free work time

APPENDIX D

PARTICIPANTS

APPENDIX D

CONCOR WORKSHOP January 7 - 18 1980 Marlow Heights MD

PARTICIPANTS

KENYA

James Midianga Government Computer Centre Central Bureau of Statistics PO Box 30266 Nairobi Kenya

PANAMA

Omar Rivera Rios Data Processing Specialist Contraloria General de la Republica de Panama Apartado Postal 5213 Panama 5 Panama

PHILIPPINES Guida T Capellan OIC Computer Programming Division National Census and Statistics Office Solicarel Bldg 1 Magsaysay Blvd

Sta Mesa Manila Philippines

THAILAND

Angsumal Sunalai National Statistical Office Bangkok 1 Thailand

EGYPT

Farag Sedky Mourad Ghaleb CAPMAS (Central Agency for Public Mobilization and Statistics) NASR City Cairo Egypt


SAUDI ARABIA

Shargi R Al-Shargi National Computer Center PO Box 2534 Riyadh Saudi Arabia

Ahmed S Al-Ghamdi Dept Director of National Computer Center PO Box 2534 Riyadh Saudi Arabia

OTHER

Robert W ONeal USREPJECOR APO New York NY 09038

John N Adams Data Processing Advisor DACCA Department of State Washington DC 20520

or c/o UNDP PO Box 224 Dacca Bangladesh

Joe Quasney US Census Bureau Washington DC 20233

Howard Brunsman 5715 N Ninth St Arlington VA 22205

Larry Shiller and Julio Ortuzar Delta Systems Consultants Inc 264 Alhambra Circle Coral Gables FL 33134

Nicholas Ourusoff Burpee Hill Road New London NH 03257 (United Nations)

Frederic J Grant IV Systems Development Georgia World Congress Institute 580 North Omni International Atlanta Georgia 30303


Rafael Samper Mary Jo Keenan Catherine B Gleason LACDR Room 2246 NS Washington DC 20523

John Marshall AIDSERDMPSE Rm 706C SA-12 Washington DC 20523

Susanne Bacon and Mary K Friday ISPCData Processing US Census Bureau Washington DC 20233

Leo Dougherty ISPCWorld Census Staff US Bureau of the Census Washington DC 20233


CONCOR WORKSHOP

January 7-18 1979 Washington DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM


CONCOR WORKSHOP January 16 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1 What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries

2 If you are familiar with previous versions of CONCOR how does this current version compare to them


3 How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data of which you are familiar

4 How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software


5 How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys

6 If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?

7 In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8 Other comments


APPENDIX F

NEWOLD COMMAND COMPARISONS


OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA

APPENDIX F

NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR.

END-DIVISION signifies the termination of the CONCOR division organization and must always be present, even though division identifiers have not been implemented.

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

VERSION 1 (DECEMBER 1978)   VERSION 2 (DECEMBER 1979)   COMMENTS

ALLOCATE (ALL) ALLOCATE (ALLOC) Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

ASSERT (AST) ASSERT (ASRT) The ASSERT command is the basic consistency editing command of CONCOR. The command now provides NOERROR and MESSAGE options. Additionally, the DUMPR and DUMPO keywords give the user the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

(see RECODE) DRECODE (DRCD) Provides a method for recoding data items whose values start at zero and continue in ascending sequence. Replaces the previous XRECODE command.

EXIT Provides the means to terminate EXECUTION (ESCAPE) of command statements within the UNTIL command

END END-IF END-THEN END-ELSE A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

ERROR not implemented No longer supported in language


FILTER TYPE ( ) WITH not implemented Option statements are no longer supported in the language. This command once specified essentially a GOTO, depending on its construction.

(see XRECODE) GRECODE (GRCD) This new command provides a method for the group recoding of input values

IF IF The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. As in pseudocode, a CONCOR programmer must use the structured IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version.

LET LET Virtually unchanged

LIST New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire as well as specific individual identifiers

OUTPUT New command which provides the means by which records will be written to the output-file


RANGE (RNG) RANGE (RNG) As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through the DUMPR and DUMPO options within the command. Another new option of this command is a PASS END-PASS FAIL END-FAIL construction, which permits the specification of logical command paths depending upon the outcome of the range test.

RECODE (REC) name changed Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

STOP keyword STOP keyword Unchanged.

UNTIL DO END-DO One of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to that of most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

UPDATE (UPD) UPDATE (UPD) Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.


WRITE (WRT) WRITE (WRT) The WRITE language statement, while once utilized as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated by specifying the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of the DICTIONARY-DIVISION.)

XRECODE (XREC) name changed Old command which has become the GRECODE command in the December 1979 version
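
The table describes ALLOCATE as assigning values and UPDATE as maintaining the arrays used in imputation. The following Python sketch illustrates that general pairing as a sequential hot-deck procedure. It is not CONCOR code and does not reproduce CONCOR syntax; the function name `edit_age`, the field names, the 0-98 age range, and the starting array values are all hypothetical and chosen only for illustration.

```python
# Hypothetical imputation array indexed by sex code, seeded with starting values.
hot_deck = {1: 30, 2: 28}

def edit_age(record, deck):
    """Range-check age; impute from the array on failure, otherwise refresh the array."""
    age = record.get("age")
    sex = record["sex"]
    if age is None or not (0 <= age <= 98):
        # Analogue of ALLOCATE: assign the stored donor value to the failing item.
        return {**record, "age": deck[sex], "imputed": True}
    # Analogue of UPDATE: a valid value becomes the donor for later records.
    deck[sex] = age
    return {**record, "imputed": False}

records = [
    {"sex": 1, "age": 41},     # valid, becomes donor for sex 1
    {"sex": 1, "age": None},   # missing, receives 41
    {"sex": 2, "age": 150},    # out of range, receives the seed value 28
    {"sex": 2, "age": 19},     # valid, becomes donor for sex 2
]
edited = [edit_age(r, hot_deck) for r in records]
```

Because the array is refreshed by every valid record, the value imputed depends on the order in which records are processed, which is one reason the imputation counts reported by an editor deserve review.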


DECEMBER 1978 VERSION 1

Not implemented

DECEMBER 1979 VERSION 2

REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
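
The presorting requirement follows from control-break processing in general: a report that closes an area as soon as the control field changes cannot rejoin records of that area that appear later in the file. The Python sketch below illustrates the effect with `itertools.groupby`, which likewise groups only adjacent records. It is an illustration of the principle, not of CONCOR's internals; the record layout and the name `area_totals` are assumptions.

```python
from itertools import groupby

def area_totals(records):
    """Yield (area, record_count, error_count) at each control-area break."""
    for area, group in groupby(records, key=lambda r: r["area"]):
        rows = list(group)
        yield area, len(rows), sum(r["errors"] for r in rows)

unsorted_file = [
    {"area": "01", "errors": 1},
    {"area": "02", "errors": 0},
    {"area": "01", "errors": 2},   # same area as the first record, not adjacent
]
# Presorting on the control field, done here in memory for illustration.
presorted_file = sorted(unsorted_file, key=lambda r: r["area"])

split_report = list(area_totals(unsorted_file))     # area "01" reported twice
merged_report = list(area_totals(presorted_file))   # one line per area
```

On the unsorted file, area 01 produces two separate report lines with partial totals; after the sort it produces one line with the complete totals.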

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST


Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.


(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Permitting only one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18-digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.


(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation, or provide a means by which the user may specify a title to appear in the listings.
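
Point (13) above, a valid range declared together with the field so that the check runs before the user's edit logic sees the data, can be sketched in Python as follows. This is a hypothetical illustration and not a proposal for CONCOR syntax: the `FieldDef` class, the three-field dictionary, and the `read_record` function are invented for the example, and the labels OUT-OF-RANGE and NOT-NUMBER are borrowed from the Version 2 internal identifiers listed in Appendix H purely as message text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class FieldDef:
    name: str
    start: int                                # 1-based starting column
    length: int
    valid: Optional[Tuple[int, int]] = None   # inclusive (low, high), if declared

# Hypothetical dictionary: the range is declared once, next to the field.
DICTIONARY = [
    FieldDef("SEX", 1, 1, (1, 2)),
    FieldDef("AGE", 2, 2, (0, 98)),
    FieldDef("SERIAL", 4, 3),                 # no range declared, never range-checked
]

def read_record(line, dictionary):
    """Extract each field and apply its declared range before any user edits run."""
    values, failures = {}, []
    for f in dictionary:
        text = line[f.start - 1 : f.start - 1 + f.length]
        if not text.isdigit():
            values[f.name] = None
            failures.append((f.name, "NOT-NUMBER"))
            continue
        value = int(text)
        values[f.name] = value
        if f.valid is not None and not (f.valid[0] <= value <= f.valid[1]):
            failures.append((f.name, "OUT-OF-RANGE"))
    return values, failures

values, failures = read_record("399017", DICTIONARY)
blank_values, blank_failures = read_record("1 5001", DICTIONARY)
```

Keeping the range in the dictionary also makes it available to other software that reads the dictionary, which is the benefit the point mentions for tabulation programs.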

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features, such as division by zero and out-of-bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is the question of what action is to be taken in the calculation of tolerance with respect to this new record's inclusion in the universe of observations and number of changes.


(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command lies in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
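
The lookup table approach suggested in point (4) can be illustrated with a short Python sketch. The source does not show the code that GENED currently generates for DRECODE, so the comparison-chain version below is only an assumed baseline; the recode pairs and function names are hypothetical. The sketch relies on the property stated in Appendix F that DRECODE source values start at zero and ascend, which is what allows the code to be used directly as a table index.

```python
def recode_by_comparison(code, pairs):
    """Assumed baseline: test each (source, target) pair in turn."""
    for source, target in pairs:
        if code == source:
            return target
    return None  # no matching source value

def build_lookup(pairs):
    """Precompute a table indexed directly by the source code (0, 1, 2, ...)."""
    table = [None] * (max(source for source, _ in pairs) + 1)
    for source, target in pairs:
        table[source] = target
    return table

def recode_by_lookup(code, table):
    """One bounds test and one index operation, regardless of table size."""
    return table[code] if 0 <= code < len(table) else None

# Hypothetical recode: detailed category codes 0-4 collapsed to broader groups.
PAIRS = [(0, 9), (1, 1), (2, 1), (3, 2), (4, 3)]
TABLE = build_lookup(PAIRS)
```

The table is built once, which corresponds to the extra communication between the generating programs that the point mentions; each subsequent recode then costs the same regardless of how many source values are defined.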

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.


1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean to imply that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual

(2) The development of a CONCOR Users Guide


2. Survey or Other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has not been finished nor tested.
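
The check-in procedure of point (2) amounts to comparing the questionnaire identifiers actually received against a master list of expected sample cases. A minimal Python sketch is given below; the function name `check_in`, the identifier format, and the three result categories are assumptions made for illustration and are not drawn from any CONCOR design document.

```python
def check_in(master_ids, received_ids):
    """Compare received questionnaire IDs against the master sample list."""
    expected = set(master_ids)
    seen = set()
    duplicates, unexpected = [], []
    for qid in received_ids:
        if qid not in expected:
            unexpected.append(qid)    # received but never selected
        elif qid in seen:
            duplicates.append(qid)    # selected case received more than once
        else:
            seen.add(qid)
    missing = sorted(expected - seen)  # selected but not yet received
    return {"missing": missing, "unexpected": unexpected, "duplicates": duplicates}

report = check_in(
    master_ids=["A01", "A02", "A03", "A04"],
    received_ids=["A02", "A01", "A02", "B17"],
)
```

In a survey setting the list of missing cases is the part that usually drives follow-up work, since nonresponse affects the weighting discussed in point (5).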


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES


1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT OUTPUT-QUEST-COUNT WRITE-RECORD-COUNT

Change in spelling of variable

Not mentioned--possible omission from manual

Not implemented

Note: Current documentation equivocates on when some variables are reset; most are initialized with each control area break.

Though implemented as reserved identifiers, their exact values and utility were unknown due to poor documentation of (old) CONCOR.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN-, OUT-).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

CONCOR-EDITOR EXECUTION STATISTICS

(Report page reproduced with the test program text superimposed. The run date, column headings, and control area counts are not legible in this copy; the recoverable content follows.)

Messages printed by the EDITOR:

DIVISION BY ZERO, LINE NUMBER 118 (message repeated several times)

RUN ABORTED DUE TO DIVISION BY ZERO ERRORS

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason (ABEND) without generating any messages. The most common types of cessation are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.

Superimposed program text (source lines 111-125, only partly legible): a series of LIST statements announcing the test, a LET at line 115 setting registers R001 and R003 to zero, and the LET at line 118, which computes R003 from R002 and R001 and is the statement cited in the messages above.
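
The behaviour praised in the comments on this page, a message naming the source line for each division by zero followed by an orderly abort rather than a silent ABEND, can be sketched in Python. The sketch is a hypothetical illustration only: it does not describe how the generated EDITOR program is implemented, and the function `run_editor`, the error limit of three, and the register names used as dictionary keys are assumptions.

```python
def run_editor(statements, questionnaires, max_errors=3):
    """Run numbered statements per questionnaire, logging the line of each failure.

    statements: list of (line_number, function taking a register dict).
    """
    log, errors = [], 0
    for q in questionnaires:
        registers = dict(q)   # fresh copy of the registers for each questionnaire
        for line_no, stmt in statements:
            try:
                stmt(registers)
            except ZeroDivisionError:
                errors += 1
                log.append(f"DIVISION BY ZERO, LINE NUMBER {line_no}")
                if errors >= max_errors:
                    log.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                    return log
    log.append("NORMAL END OF JOB")
    return log

def set_divisor_to_zero(r):
    r["R001"] = 0

def divide(r):
    r["R003"] = r["R002"] // r["R001"]

# Line numbers are carried with each statement so they can be reported.
failing_program = [(115, set_divisor_to_zero), (118, divide)]
failing_log = run_editor(failing_program, [{"R002": 4}] * 5)
clean_log = run_editor([(118, divide)], [{"R001": 2, "R002": 4}])
```

The design point is that every statement carries its source line number into execution, so a data exception can be reported in terms the author of the edit specification recognizes.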

CONCOR-EDITOR EXECUTION STATISTICS (array coordinate example)

(Report page reproduced with the test program text superimposed. The headings, counts, and the array error messages printed by the EDITOR are not legible in this copy.)

Comments: In this example TA1, a 2-dimensional array with 10 rows and 10 columns, is used in an attempt to update the smaller (5 X 5) TA2. References to non-existent elements caused the error. The program text has been superimposed on this page for illustration purposes.

Superimposed program text (source lines 173-187, partly legible; operators and some operands are reconstructed from context):

175 UNTIL R001 > 10 VARY R001 FROM 1 BY 1

177 DO

178 UNTIL R002 > 10 VARY R002 FROM 1 BY 1

180 ALLOCATE (receiving identifier illegible) = TA1 (R001 R002)

181 UPDATE TA2 (R001 R002) R003

182-185 LIST statements displaying the values involved

186 END-DO

187 END-DO

CONCOR-EDITOR EXECUTION STATISTICS (user-specified termination example)

(Report page reproduced with the test program text superimposed. The headings and counts are not legible in this copy.)

Messages printed by the EDITOR:

JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215

EDITOR -- NORMAL END OF JOB

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

Superimposed program text (partly legible):

214 IF IN-RECORD-COUNT > (value illegible)

215 THEN STOP-RUN END-THEN

217-218 a second IF on IN-RECORD-COUNT followed by THEN LIST with a message (text illegible)

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION A blank command statement line was found in the user Dictionary Division command statement file

ACTION TAKEN Parsing continued with the next command statement line in the Dictionary Division command file

USER RESPONSE Put a comment indicator () in column 1 of the command statement line if spacing is desired if not remove the command statement line

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found as the first character in the string

ACTION TAKEN Parsing began again with the next string

USER RESPONSE Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached

ACTION TAKEN Parsing began again with the next user command statement line in the Dictionary Division command file

USER RESPONSE Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ())

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION The comment indicator character () was found embedded in a string


V CONCLUSION

In a situation of extreme need, CONCOR could probably be utilized, as it seems capable of performing its editing functions. Its usefulness as a data cleaning tool is generally undisputed. The importance of having CONCOR exhaustively tested by an independent agency cannot be over-emphasized in light of prior failures of the system to perform adequately. Additionally, core requirements and processing speed should be precisely determined. One must be encouraged by the amount of progress which has been made in a relatively short amount of time towards the completion of the CONCOR package. While it is clearly beyond the scope of this report to estimate the cost and the amount of time required to implement the highly desirable changes to the CONCOR language and documentation outlined herein, and then exhaustively test the system, it is believed that this process could be completed within a single quarter. (The changes are of such a nature that no structural redefinitions of a systematic nature would be required.) These enhancements would make the COBOL CONCOR package more complete, consistent, and reliable in the field. ISPC has competently moved the system to the point that its completion is within reach.

Given the probable revision of the current Systems Reference Manual and the development of a suitable Users Guide document, it is likely that this version of CONCOR could be taught to experienced programmers (unfamiliar with previous versions of the language) in as little as four weeks. And while one is impressed by the potential power and overall simplicity of the COBOL CONCOR language, it is difficult to envision individuals with no real computer programming experience utilizing its full capabilities. Hence the importance of including a review of basic editing concepts in the documentation is underlined.

Further, as documentation is often critical to the perceived ease of use of computer-based systems, it will be essential to have these various manuals examined independently prior to their general use.

As previously mentioned, Appendix G presents features which the ISPC feels could have a positive impact on CONCOR. Depending upon the outcome of the testing recommended in previous chapters and the availability of funding, supporting agencies may desire to look again at an assembler (ALC) version of the language.

Overall, COBOL CONCOR remains an intermediate product yet to realize its potential, which is thought to be considerable. Software development of this nature is characteristically of a long-term, evolutionary design. In light of this reality, agencies should be prepared to evaluate such projects' success or failure in terms other than that of immediate utility. It is the philosophy underlying the system, not the system itself, which will ultimately be exported.

APPENDIX A

BUCEN Enhancement Proposal

BUCEN IN-HOUSE REVIEW ENHANCEMENT PROPOSAL

1 Easy to use interrecord referencing

2 Improved output file capabilities

A provide overflow protection on WRITE command

B provide OUTPUT command to create corrected questionnaire file that matches IP file data dictionary

3 Improved edit statistics reported (LISTERR)

A provide automatic (user-specified) area break

B provide options for compilation and displaying edit statistics at various levels

C provide automatic (user-specified) tolerance checking of error rates by area

D automatically capture IDs of areas failing tolerance check

4 Clean up known bugs in code

5 Comprehensive testing

6 Clean up and enhance documentation

A reference manual more examples error message guide

B installation guide

C systems manual

APPENDIX B

EVALUATIVE CRITERIA

UNITED STATES GOVERNMENT

TO: DS/POP/FPSD, Robert H. Haladay

DATE: December 3, 1979

DS/POP/DEIO, Liliane Floge

SUBJECT: Criteria for Evaluation of BuCen COBOL CONCOR, September 30th Version, as presented at January 1980 Workshop

The following points should be considered when evaluating CONCOR, the COBOL editing system, during participation at the January 1980 workshop:

1. Are the Installation Manual, the System Manual, and the Users Manual ready for use? Are these manuals adequate for their purposes? Are the manuals clear and consistent? Can they be understood by subject-matter personnel as well as programmers?

2. Is the editing system capable of implementing all the features as described in the documentation?

3. Does the system appear to be [word illegible] debugged and tested and ready for release to developing countries?

4. Does the system appear to be fast enough to edit a census in a reasonable amount of time?

5. What size core does the system require?

6. Comment in general on completeness, usefulness, efficiency, and ease of use of the system by developing country programmers and other census personnel.

cc: [illegible] Olds, APHA; [illegible]; Julio Ortuizar; [illegible], Celta Systems

APPENDIX C

WORKSHOP ITINERARY

APPENDIX C

CONCOR Workshop Schedule January 7-18 1980

U S Bureau of the Census International Statistical Programs Center

Scuderi Building Room 209 4235 28th Avenue Marlow Heights Maryland

Monday January 7

9:30 am - 10:00 Welcoming Remarks; Overview of Workshop

10:00 - 10:30 Introduction to CONCOR
- Purpose and function
- History of development
- General computer requirements

10:30 - 10:45 Break

10:45 - 12:00 Editing Concepts
- Ways to interrogate data
- Ways to correct data
- Editing housing and population data
- POPSTAN
- Advantages of CONCOR

12:00 - 1:15 pm Break

1:15 - 2:00 System Description
- Constraints in design of CONCOR
- Basic subsystems of CONCOR
- User interactions with system
- Examples of outputs produced

2:00 - 2:30 User Program Organization
- Divisions
- Sections
- Routines
- Commands

2:30 - 2:45 Break

2:45 - 3:25 Command Language Description
- Types of statements
- Format
- Syntax

Tuesday January 8

9:30 am - 10:30 Dictionary Division Command Statements
- Punctuation
- Input data referencing
- Working data declarations
- Internal data representation and storage

10:30 - 10:45 Break

10:45 - 12:00 Dictionary-Attributes-Section
- Dictionary-Name
File-Section
- Input-File
- Output-File
- Write-File
- Error-File

12:00 - 1:15 pm Break

1:15 pm - 2:15 Input-Record-Section
- Define-Record
Working-Data-Section
- New-Data
- Array-Data

2:15 - 2:30 Break

2:30 - 3:25 Dictionary Examples
- Minimum dictionary structure
- Maximum dictionary structure
- Hand out dictionary problem

Wednesday January 9

9:30 am - 10:30 Discussion of Participants' Dictionary problems; Free work time

10:30 - 10:45 Break

10:45 - 12:00 Execution Division Command Statements
- Punctuation
- Subscripting
- Internal Identifiers
- Report-Control-Section: Generate-Edit-Statistics, Count-Imputes, Examples

12:00 - 1:15 pm Break

1:15 pm - 2:15 Execution Division Command Statements (continued)
- Routines of Edit-Specifications-Section: Prolog-Routine, Filter-Routine, Epilog-Routine
- Types and functions of edit specification commands
- Range
- Assert

2:15 - 2:30 Break

2:30 - 3:25
- Pass/Fail clauses
- List

Thursday January 10

9:30 am - 10:30 Discussion of Problems; Free work time

10:30 - 10:45 Break

10:45 - 12:00 Edit Specification Command Statements (continued)
- Allocate
- Update
- Let

12:00 - 1:15 pm Break

1:15 pm - 2:15
- If
- Until/Exit
- Stop

2:15 - 2:30 Break

2:30 - 3:25
- Drecode
- Grecode

Friday January 11

9:30 am - 10:30 Edit Specification Command Statements (continued)
- Output
- Write

10:30 - 10:45 Break

10:45 - 12:00 Report Division Command Statements
- Display-Control-Section: Display-Edit-Statistics
- Tolerance-Control-Section: Error-Rate-Check, Reject-File
- Report Examples

12:00 - 1:15 pm Break

1:15 pm - 3:25 Hand out Edit Specifications as problem; Free work time

Monday January 14

9:30 am - 10:30  Discuss procedures for running problems on computer

10:30 - 10:45  Break

10:45 - 12:00  Component Programs of the CONCOR system

12:00 - 1:15 pm  Break

Tuesday January 15

9:30 am - 3:25 pm  Free work time

Wednesday January 16

9:30 am - 12:00  Free work time

12:00 - 1:15 pm  Break

1:15 pm - 2:15  How to Install CONCOR on IBM 360/370 OS

2:15 - 2:30  Break

2:30 - 3:25  Free work time

Thursday January 17

9:30 am - 3:25  Free work time

1:15 pm - 2:15  Future CONCOR Design Considerations
  - for census processing
  - for survey processing
  - manual correction system

2:15 - 2:30  Break

2:30 - 2:45  Evaluation Guidelines
  - Hand out evaluation forms

2:45 - 3:25  Free work time

Friday January 18

9:30 am - 10:30  Free work time

10:30 - 10:45  Break

10:45 - 12:00  Group Discussion on CONCOR
  - submission of written evaluations
               Distribution of IBM installation tapes
               Presentation of Certificates to Participants

12:00 - 1:15 pm  Break

1:15 - 3:25  Free work time

APPENDIX D

CONCOR WORKSHOP January 7 - 18 1980 Marlow Heights MD

PARTICIPANTS

KENYA

James Midianga Government Computer Centre Central Bureau of Statistics PO Box 30266 Nairobi Kenya

PANAMA

Omar Rivera Rios Data Processing Specialist Contraloria General de la Republica de Panama Apartado Postal 5213 Panama 5 Panama

PHILIPPINES

Guida T Capellan OIC Computer Programming Division National Census and Statistics Office Solicarel Bldg 1 Magsaysay Blvd Sta Mesa Manila Philippines

THAILAND

Angsumal Sunalai National Statistical Office Bangkok 1 Thailand

EGYPT

Farag Sedky Mourad Ghaleb CAPMAS (Central Agency for Public Mobilization and Statistics) NASR City Cairo Egypt

SAUDI ARABIA

Shargi R Al-Shargi National Computer Center PO Box 2534 Riyadh Saudi Arabia

Ahmed S Al-Ghamdi Dept Director of National Computer Center PO Box 2534 Riyadh Saudi Arabia

OTHER

Robert W O'Neal USREP/JECOR APO New York NY 09038

John N Adams Data Processing Advisor DACCA Department of State Washington DC 20520

or c/o UNDP PO Box 224 Dacca Bangladesh

Joe Quasney US Census Bureau Washington DC 20233

Howard Brunsman 5715 N Ninth St Arlington VA 22205

Larry Shiller and Julio Ortuzar Delta Systems Consultants Inc 264 Alhambra Circle Coral Gables FL 33134

Nicholas Ourusoff Burpee Hill Road New London NH 03257 (United Nations)

Frederic J Grant IV Systems Development Georgia World Congress Institute 580 North Omni International Atlanta Georgia 30303

Rafael Samper Mary Jo Keenan Catherine B Gleason LACDR Room 2246 NS Washington DC 20523

John Marshall AIDSERDMPSE Rm 706C SA-12 Washington DC 20523

Susanne Bacon and Mary K Friday ISPC/Data Processing US Census Bureau Washington DC 20233

Leo Dougherty ISPC/World Census Staff US Bureau of the Census Washington DC 20233

CONCOR WORKSHOP

January 7-18 1980 Washington DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM

CONCOR WORKSHOP
January 16, 1980

BASIC COBOL VERSION 2
December 1979 Release

Participant Evaluation of Package

1. What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries?

2. If you are familiar with previous versions of CONCOR, how does this current version compare to them?

3. How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data with which you are familiar?

4. How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software?

5. How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys?

6. If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?

7. In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8. Other comments

APPENDIX F

NEW/OLD COMMAND COMPARISONS

OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA


NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR

END-DIVISION signifies the termination of the CONCOR division organization and must always be present even though division identifiers have not been implemented

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations, and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

Version 1 (December 1978): ALLOCATE (ALL)
Version 2 (December 1979): ALLOCATE (ALLOC)
Comments: Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

Version 1 (December 1978): ASSERT (AST)
Version 2 (December 1979): ASSERT (ASRT)
Comments: The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and a MESSAGE option. Additionally, by utilizing the DUMPR and DUMPO keywords, it provides the user with the option of using the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

Version 1 (December 1978): (see RECODE)
Version 2 (December 1979): DRECODE (DRCD)
Comments: Provides a method for recoding data items whose values start at zero and continue in ascending sequence. Replaces the previous RECODE command.

Version 1 (December 1978): (none)
Version 2 (December 1979): EXIT
Comments: Provides the means to terminate execution of (escape from) command statements within the UNTIL command.

Version 1 (December 1978): END
Version 2 (December 1979): END-IF END-THEN END-ELSE
Comments: A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

Version 1 (December 1978): ERROR
Version 2 (December 1979): not implemented
Comments: No longer supported in the language.

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

Version 1 (December 1978): FILTER TYPE ( ) WITH
Version 2 (December 1979): not implemented
Comments: Option statements are no longer supported in the language. This command once specified essentially a GOTO, depending on construction.

Version 1 (December 1978): (see XRECODE)
Version 2 (December 1979): GRECODE (GRCD)
Comments: This new command provides a method for the group recoding of input values.

Version 1 (December 1978): IF
Version 2 (December 1979): IF
Comments: The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. Similar to pseudocode, a CONCOR programmer must utilize the structured IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version.

Version 1 (December 1978): LET
Version 2 (December 1979): LET
Comments: Virtually unchanged.

Version 1 (December 1978): (none)
Version 2 (December 1979): LIST
Comments: New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire, as well as specific individual identifiers.

Version 1 (December 1978): (none)
Version 2 (December 1979): OUTPUT
Comments: New command which provides the means by which records will be written to the output-file.

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

Version 1 (December 1978): RANGE (RNG)
Version 2 (December 1979): RANGE (RNG)
Comments: As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

Version 1 (December 1978): RECODE (REC)
Version 2 (December 1979): name changed
Comments: Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

Version 1 (December 1978): STOP - keyword
Version 2 (December 1979): STOP - keyword
Comments: Unchanged.

Version 1 (December 1978): (none)
Version 2 (December 1979): UNTIL DO END-DO
Comments: DO-END-DO is one of the most significant and powerful enhancements to the CONCOR language. This command provides a looping capability similar to most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

Version 1 (December 1978): UPDATE (UPD)
Version 2 (December 1979): UPDATE (UPD)
Comments: Virtually unchanged. It is the basic mechanism for maintaining arrays involved in imputation processes.

(continued)
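The interplay between UPDATE (maintaining the imputation arrays) and ALLOCATE (assigning a stored value to a failed item) can be illustrated with a small sketch. It is written in Python rather than in the CONCOR language, and everything in it is an assumption made for illustration: the `HotDeck` class, the `edit_ages` function, the record layout, and the 0-98 age range are hypothetical and do not come from the CONCOR manuals.

```python
from typing import Optional


class HotDeck:
    """Imputation array keyed by a stratum (here: sex), seeded with starting values."""

    def __init__(self, initial: dict[str, int]) -> None:
        self.values = dict(initial)

    def update(self, key: str, value: int) -> None:
        # Plays the role of UPDATE: remember the most recent valid value.
        self.values[key] = value

    def allocate(self, key: str) -> int:
        # Plays the role of ALLOCATE: hand out the stored value as the imputation.
        return self.values[key]


def edit_ages(records: list[dict], deck: HotDeck, low: int = 0, high: int = 98) -> int:
    """Impute missing or out-of-range ages in place; return the number of imputations."""
    imputes = 0
    for rec in records:
        age: Optional[int] = rec.get("age")
        if age is not None and low <= age <= high:
            deck.update(rec["sex"], age)            # valid item refreshes the array
        else:
            rec["age"] = deck.allocate(rec["sex"])  # failed item receives a donor value
            imputes += 1
    return imputes


sample = [
    {"sex": "F", "age": 34},
    {"sex": "F", "age": None},
    {"sex": "M", "age": 150},
    {"sex": "M", "age": 12},
    {"sex": "M", "age": None},
]
count = edit_ages(sample, HotDeck({"F": 30, "M": 28}))
print(count, [r["age"] for r in sample])
```

The count returned by the sketch corresponds loosely to the kind of total that an imputation counter would accumulate; how CONCOR itself tallies imputations is not shown here.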

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

Version 1 (December 1978): WRITE (WRT)
Version 2 (December 1979): WRITE (WRT)
Comments: The WRITE language statement, while once utilized as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of DICTIONARY-DIVISION.)

Version 1 (December 1978): XRECODE (XREC)
Version 2 (December 1979): name changed
Comments: Old command which has become the GRECODE command in the December 1979 version.

DECEMBER 1978 VERSION 1

Not implemented

DECEMBER 1979 VERSION 2

REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
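The presorting requirement noted above is typical of control-break reporting, where a total is emitted each time the control field changes. The following Python sketch is only an illustration of that general technique under assumed names (`area_totals`, the `area` and `errors` fields); it is not a description of how CONCOR's Report Division is actually coded.

```python
from itertools import groupby


def area_totals(records):
    """Return (area, error_count) pairs, one per run of contiguous records."""
    totals = []
    # groupby starts a new group whenever the key changes, exactly like a
    # control break; it does not merge groups that are not adjacent.
    for area, group in groupby(records, key=lambda r: r["area"]):
        totals.append((area, sum(r["errors"] for r in group)))
    return totals


unsorted_file = [
    {"area": "01", "errors": 2},
    {"area": "02", "errors": 1},
    {"area": "01", "errors": 3},
]

# Without presorting, area 01 is reported twice with partial totals.
print(area_totals(unsorted_file))

# Presorting on the control field yields one total per area.
print(area_totals(sorted(unsorted_file, key=lambda r: r["area"])))
```

A program that reports at each control break can work in a single pass with very little memory, which is presumably why the burden of sorting is left to a separate step outside the editor.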

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST

Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics, such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.

(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18 digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.

(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation, or provide a means by which the user may specify a title to appear in the listings.
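Point (13) above, a range of valid values carried in the dictionary and checked automatically before the user's edits run, can be sketched as follows. The sketch is in Python and is purely illustrative: the `ItemDef` structure, the `range_failures` function, and the sample item definitions are hypothetical and are not part of the CONCOR dictionary syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemDef:
    """One dictionary entry: where the item sits in the record and its valid range."""
    name: str
    start: int   # 1-based starting column
    length: int
    low: int
    high: int


def range_failures(record: str, items: list[ItemDef]) -> list[str]:
    """Return the names of items that are non-numeric or outside their declared range."""
    failures = []
    for item in items:
        raw = record[item.start - 1 : item.start - 1 + item.length]
        # Blanks or other non-digits fail outright; otherwise compare to the range.
        if not raw.isdigit() or not (item.low <= int(raw) <= item.high):
            failures.append(item.name)
    return failures


dictionary = [
    ItemDef("SEX", start=1, length=1, low=1, high=2),
    ItemDef("AGE", start=2, length=2, low=0, high=98),
]

print(range_failures("134", dictionary))   # both items valid
print(range_failures("399", dictionary))   # both items out of range
```

Because the ranges live with the item definitions rather than in comment statements, other software reading the same dictionary, such as a tabulation package, could reuse them.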

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features, such as division by zero and out of bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is what action is to be taken in the calculation of tolerance in terms of this new record's inclusion into the universe of observations and number of changes.

(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
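The lookup table approach suggested in point (4) for DRECODE relies on the fact that the input codes start at zero and ascend, so the input value can index the table directly instead of being compared against each candidate in turn. The Python sketch below contrasts the two techniques; the function names and the sample recode pairs are assumptions for illustration and do not reflect the code that the EDITOR program actually generates.

```python
def recode_linear(value, pairs):
    """Recode by testing each (old, new) pair in turn: cost grows with the list."""
    for old, new in pairs:
        if old == value:
            return new
    raise ValueError(f"no recode defined for {value}")


def build_table(pairs):
    """Build a table indexed by the old code, which starts at zero and ascends."""
    table = [None] * (max(old for old, _ in pairs) + 1)
    for old, new in pairs:
        table[old] = new
    return table


def recode_lookup(value, table):
    """Recode with a single indexed access, independent of the number of codes."""
    if 0 <= value < len(table) and table[value] is not None:
        return table[value]
    raise ValueError(f"no recode defined for {value}")


# Hypothetical recode: detailed codes 0-3 collapsed into broader groups.
pairs = [(0, 9), (1, 1), (2, 1), (3, 2)]
table = build_table(pairs)
print([recode_lookup(v, table) for v in range(4)])
```

The table is built once, before any records are read, which is consistent with the remark that the approach needs extra communication between the programs that generate the dictionary and the edit code.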

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.

1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available the information needed most frequently. Also, to examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean to imply that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.

2. Survey or other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has not been finished nor tested.
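The weighting scheme mentioned in point (5) amounts to multiplying each sampled observation by the number of population units it represents before totals are formed. A minimal Python sketch of that arithmetic follows; the `weight` and `persons` field names and the sample weights are hypothetical, and nothing here describes an existing CONCOR feature.

```python
def weighted_total(records, field):
    """Estimate a population total: each record counts `weight` times."""
    return sum(r["weight"] * r[field] for r in records)


def weighted_mean(records, field):
    """Estimate a population mean as weighted total over the sum of weights."""
    total_weight = sum(r["weight"] for r in records)
    return weighted_total(records, field) / total_weight


# Two sampled households; weights are the inverse of assumed selection probabilities.
sample = [
    {"weight": 10.0, "persons": 4},
    {"weight": 25.0, "persons": 2},
]
print(weighted_total(sample, "persons"))
print(weighted_mean(sample, "persons"))
```

Without the weights, the two households would contribute equally to any summary statistic, which would misstate totals whenever selection probabilities differ across the sampling frame.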


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES

1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT OUTPUT-QUEST-COUNT WRITE-RECORD-COUNT

Change in spelling of variable.

Not mentioned--possible omission from manual.

Not implemented.

Note: Current documentation equivocates on when some variables are reset--most are initialized with each control area break.

Though implemented as reserved identifiers, due to poor documentation of (old) CONCOR their exact values and utility were unknown.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention, e.g., IN/OUT.

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Printout: CONCOR-EDITOR EXECUTION STATISTICS page with columns for control area, sequence number, inputs read (questionnaires, records), outputs by OUTPUT command, and outputs by WRITE command. The run date and counts are not legible in the scan.]

Messages printed on the page:

DIVISION BY ZERO LINE NUMBER 118 (printed six times)
RUN ABORTED DUE TO DIVISION BY ZERO ERRORS

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason--ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors, and user specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.

[Program text, source lines 111-125, superimposed on the printout and only partly legible. It consists of LIST statements displaying R001, R002 and R003, a LET statement setting R001 and R003 to zero, and the statement flagged by the diagnostics:]

118 LET R003 = R002 / R001
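The behavior shown on this page, where each division by zero is reported with its source line number and the run is aborted after repeated failures, can be approximated in a few lines of Python. The sketch is an assumption-laden illustration: the `run` function, the statement tuples, the message wording, and the `max_errors` limit are hypothetical and are not taken from the EDITOR program.

```python
def run(statements, registers, max_errors=6):
    """Execute integer-division statements, reporting each failure by line number.

    `statements` is a list of (line_number, target, numerator, denominator)
    tuples naming entries in `registers`.
    """
    messages = []
    errors = 0
    for line_no, target, num, den in statements:
        try:
            registers[target] = registers[num] // registers[den]
        except ZeroDivisionError:
            # Report the offending source line instead of ending abnormally.
            errors += 1
            messages.append(f"DIVISION BY ZERO, LINE NUMBER {line_no}")
            if errors >= max_errors:
                messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                break
    return messages


registers = {"R001": 0, "R002": 7, "R003": 0}
# The same statement is executed once per questionnaire in this toy run.
program = [(118, "R003", "R002", "R001")] * 3
for message in run(program, registers, max_errors=2):
    print(message)
```

Trapping the exception at the statement level is what allows the line number to be reported; an unguarded division would simply end the job, which is the behavior criticized in the old package.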

[Printout: second CONCOR-EDITOR EXECUTION STATISTICS page. The array coordinate error messages and the counts are not legible in the scan.]

Comments: In this example R001, a 2 dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 X 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.

[Program text, source lines 175-187, superimposed on the printout; illegible portions are marked.]

175 UNTIL R001 > 10 VARY R001 FROM 1 BY [illegible]
177 DO
178 UNTIL R002 > 10 VARY R002 FROM 1 BY 1
180 ALLOCATE [illegible] = TA1 (R001 R002)
181 UPDATE TA2 (R001 R002) R003
182 LIST [illegible]
183 LIST [illegible] TA1 (R001 R002)
184 LIST
185 LIST [illegible] TA2 (R001 R002)
186 END-DO
187 END-DO
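The situation in this test, nested loops running to 10 while the receiving array is only 5 by 5, can be reproduced with a bounds-checked update. The Python sketch below is illustrative only; the `checked_update` function, the message wording, and the use of line number 181 are assumptions and do not reproduce the actual EDITOR diagnostics.

```python
def checked_update(table, row, col, value, line_no):
    """Store `value` at 1-based (row, col); return a diagnostic if out of bounds."""
    if not (1 <= row <= len(table) and 1 <= col <= len(table[0])):
        return f"ARRAY COORDINATE ERROR, LINE NUMBER {line_no}: ({row}, {col})"
    table[row - 1][col - 1] = value
    return None


ta2 = [[0] * 5 for _ in range(5)]   # the smaller 5 x 5 array
diagnostics = []
for r in range(1, 11):              # loops sized for a 10 x 10 array
    for c in range(1, 11):
        message = checked_update(ta2, r, c, r * c, line_no=181)
        if message is not None:
            diagnostics.append(message)

print(len(diagnostics))
print(diagnostics[0])
```

Checking the coordinates before the store is the kind of system protection feature that Appendix G proposes making optional in exchange for execution speed.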

[Printout: third CONCOR-EDITOR EXECUTION STATISTICS page with columns for control area, sequence number, inputs read (questionnaires, records), outputs by OUTPUT command, and outputs by WRITE command. The counts are not legible in the scan.]

Messages printed on the page:

JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215
EDITOR -- NORMAL END OF JOB

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

[Program text, source lines 214-218, superimposed on the printout; illegible portions are marked.]

214 IF IN-RECORD-COUNT > [illegible]
215 THEN STOP-RUN END-THEN
217 IF IN-RECORD-COUNT < [illegible]
218 THEN LIST [illegible]

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator (.) in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR (.) IN STRING

EXPLANATION: The comment indicator character (.) was found embedded in a string.


APPENDIX A

BUCEN IN-HOUSE REVIEW ENHANCEMENT PROPOSAL

1. Easy to use interrecord referencing

2. Improved output file capabilities

   A. provide overflow protection on WRITE command

   B. provide OUTPUT command to create corrected questionnaire file that matches IP file data dictionary

3. Improved edit statistics reported (LISTERR)

   A. provide automatic (user-specified) area break

   B. provide options for compilation and displaying edit statistics at various levels

   C. provide automatic (user-specified) tolerance checking of error rates by area

   D. automatically capture IDs of areas failing tolerance check

4. Clean up known bugs in code

5. Comprehensive testing

6. Clean up and enhance documentation

   A. reference manual, more examples, error message guide

   B. installation guide

   C. systems manual
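Items 3C and 3D of the proposal, tolerance checking of error rates by area and capture of the IDs of failing areas, reduce to a simple computation once per-area counts are available. The following Python sketch shows one way it could be expressed; the `areas_failing_tolerance` function, the area codes, and the 5 percent tolerance are hypothetical and are not drawn from the CONCOR Report Division.

```python
def areas_failing_tolerance(stats, tolerance_pct):
    """Return IDs of areas whose error rate exceeds the tolerance.

    `stats` maps an area ID to (errors, items_checked).
    """
    failing = []
    for area, (errors, items) in stats.items():
        # Guard against empty areas so the check itself cannot divide by zero.
        rate = 100.0 * errors / items if items else 0.0
        if rate > tolerance_pct:
            failing.append(area)
    return failing


stats = {
    "0101": (12, 400),   # 3.0 percent
    "0102": (45, 500),   # 9.0 percent
    "0103": (0, 0),      # no items checked
}
print(areas_failing_tolerance(stats, tolerance_pct=5.0))
```

The list of failing area IDs is what a reject file would need in order to route those areas back for review.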

APPENDIX B

EVALUATIVE CRITERIA

APPENDIX B

UNITED STATES GOVERNMENT

Memorandum

TO DSPOPFPSD Robert H Haladay

DATE December 3 1979

FROM DSPOPDEIO Liliane Floge

SUBJECT Criteria for Evaluation of BuCen COBOL CONCOR September 30th Version as presented at January 1980 Workshop

The following points should be considered when evaluating CONCOR, the COBOL editing system, during participation at the January 1980 workshop:

1 Are the Installation Manual, the System Manual, and the Users Manual ready for use? Are these manuals adequate for their purposes? Are the manuals clear and consistent? Can they be understood by subject-matter personnel as well as programmers?

2 Is the editing system capable of implementing all the features as described in the documentation?

3 Does the system appear to be completely debugged and tested and ready for release to developing countries?

4 Does the system appear to be fast enough to edit a census in a reasonable amount of time?

5 What size core does the system require?

6 Comment in general on the completeness, usefulness, and efficiency of the system and on its ease of use by developing country programmers and other census personnel.

cc Su-a-- e Olds APHA SDi OrW Julio Ortuzar luotola Delta Systems

APPENDIX C

WORKSHOP ITINERARY

APPENDIX C

CONCOR Workshop Schedule January 7-18 1980

U S Bureau of the Census International Statistical Programs Center

Scuderi Building Room 209 4235 28th Avenue Marlow Heights Maryland

Monday January 7

930 am - 1000 Welcoming Remarks
Overview of Workshop

1000 - 1030 Introduction to CONCOR
- Purpose and function
- History of development
- General computer requirements

1030 - 1045 Break

1045 - 1200 Editing Concepts
- Ways to interrogate data
- Ways to correct data
- Editing housing and population data
- POPSTAN
- Advantages of CONCOR

1200 - 115 pm Break

115 - 200 System Description
- Constraints in design of CONCOR
- Basic subsystems of CONCOR
- User interactions with system
- Examples of outputs produced

200 - 230 User Program Organization
- Divisions
- Sections
- Routines
- Commands

230 - 245 Break

245 - 325 Command Language Description
- Types of statements
- Format
- Syntax

Tuesday January 8

930 am - 1030 Dictionary Division Command Statements
- Punctuation
- Input data referencing
- Working data declarations
- Internal data representation and storage

1030 - 1045 Break

1045 - 1200 Dictionary-Attributes-Section
- Dictionary-Name
File-Section
- Input-File
- Output-File
- Write-File
- Error-File

1200 - 115 pm Break

115 pm - 215 Input-Record-Section
- Define-Record
Working-Data-Section
- New-Data
- Array-Data

215 - 230 Break

230 - 325 Dictionary Examples
- Minimum dictionary structure
- Maximum dictionary structure
- Hand out dictionary problem

Wednesday January 9

930 am - 1030 Discussion of Participants Dictionary problems
Free work time

1030 - 1045 Break

1045 - 1200 Execution Division Command Statements
- Punctuation
- Subscripting
- Internal Identifiers
- Report-Control-Section
-Generate-Edit-Statistics
-Count-Imputes
-Examples

1200 - 115 pm Break

115 pm - 215 Execution Division Command Statements (continued)
- Routines of Edit-Specifications-Section
-Prolog-Routine
-Filter-Routine
-Epilog-Routine
- Types and functions of edit specification commands
- Range
- Assert

215 - 230 Break

230 - 325
- Pass/Fail clauses
- List

Thursday January 10

930 am - 1030 Discussion of Problems
Free work time

1030 - 1045 Break

1045 - 1200 Edit Specification Command Statements (continued)
- Allocate
- Update
- Let

1200 - 115 pm Break

115 pm - 215
- If
- Until/Exit
- Stop

215 - 230 Break

230 - 325
- Drecode
- Grecode

Friday January 11

930 am - 1030 Edit Specification Command Statements (continued)
- Output
- Write

1030 - 1045 Break

1045 - 1200 Report Division Command Statements
- Display-Control-Section
-Display-Edit-Statistics
- Tolerance-Control-Section
-Error-Rate-Check
-Reject-File
-Report Examples

1200 - 115 pm Break

115 pm - 325 Hand out Edit Specifications as problem
Free work time

Monday January 14

930 am-1030 Discuss procedures for running problems on computer

1030-1045 Break

1045-1200 Component Programs of the CONCOR system

1200- 115 pm Break

Tuesday January 15

930 am - 325 pm Free work time

Wednesday January 16

930 am-1200 Free work time

1200- 115 pm Break

115 pm-215 How to Install CONCOR on IBM 360/370 OS

215- 230 Break

230-325 Free work time

Thursday January 17

930 am-325 Free work time

115 pm-215 Future CONCOR Design Considerations - for census processing - for survey processing

- manual correction system

215- 230 Break

230 - 245 Evaluation Guidelines

- Hand out evaluation forms

245 - 325 Free work time

Friday January 18

930 am-1030 Free work time

1030 - 1045 Break

1045 - 1200 Group Discussion on CONCOR
- Submission of written evaluations
- Distribution of IBM installation tapes
- Presentation of Certificates to Participants

1200-115 pm Break

115-325 Free work time

APPENDIX D

PARTICIPANTS

APPENDIX D

CONCOR WORKSHOP January 7 - 18 1980 Marlow Heights MD

PARTICIPANTS

KENYA

James Midianga Government Computer Centre Central Bureau of Statistics PO Box 30266 Nairobi Kenya

PANAMA

Omar Rivera Rios Data Processing Specialist Contraloria General de la Republica de Panama Apartado Postal 5213 Panama 5 Panama

PHILIPPINES

Guida T Capellan OIC Computer Programming Division National Census and Statistics Office Solicarel Bldg 1 Magsaysay Blvd Sta Mesa Manila Philippines

THAILAND

Angsumal Sunalai National Statistical Office Bangkok 1 Thailand

EGYPT

Farag Sedky Mourad Ghaleb CAPMAS (Central Agency for Public Mobilization and Statistics) NASR City Cairo Egypt


SAUDI ARABIA

Shargi R Al-Shargi National Computer Center PO Box 2534 Riyadh Saudi Arabia

Ahmed S Al-Ghamdi Dept Director of National Computer Center PO Box 2534 Riyadh Saudi Arabia

OTHER

Robert W ONeal USREPJECOR APO New York NY 09038

John N Adams Data Processing Advisor DACCA Department of State Washington DC 20520

OR co UNDP PO Box 224 Dacca Bangladesh

Joe Quasney US Census Bureau Washington DC 20233

Howard Brunsman 5715 N Ninth St Arlington VA 22205

Larry Shiller and Julio Ortuzar Delta Systems Consultants Inc 264 Alhambra Circle Coral Gables FL 33134

Nicholas Ourusoff Burpee Hill Road New London NH 03257 (United Nations)

Frederic J Grant IV Systems Development Georgia World Congress Institute 580 North Omni International Atlanta Georgia 30303


Rafael Samper Mary Jo Keenan Catherine B Gleason LACDR Room 2246 NS Washington DC 20523

John Marshall AIDSERDMPSE Rm 706C SA-12 Washington DC 20523

Susanne Bacon and Mary K Friday ISPCData Processing US Census Bureau Washington DC 20233

Leo Dougherty ISPCWorld Census Staff US Bureau of the Census Washington DC 20233


CONCOR WORKSHOP

January 7-18 1980 Washington DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM


APPENDIX E

CONCOR WORKSHOP January 16 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1 What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries

2 If you are familiar with previous versions of CONCOR how does this current version compare to them


3 How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data of which you are familiar

4 How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software


5 How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys

6 If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?

7 In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8 Other comments

APPENDIX F

NEW/OLD COMMAND COMPARISONS

OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA

APPENDIX F

NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR

END-DIVISION signifies the termination of the CONCOR division organization and must always be present even though division identifiers have not been implemented

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

ALLOCATE (ALL) ALLOCATE (ALLOC) Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

ASSERT (AST) ASSERT (ASRT) The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and MESSAGE option. Additionally, by utilizing the DUMPR DUMPO keywords, it provides the user with the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

(see RECODE) DRECODE (DRCD) Provides a method for recoding data items that have a starting value of zero and continue in ascending sequence. Replaces the previous RECODE command.

EXIT Provides the means to terminate EXECUTION (ESCAPE) of command statements within the UNTIL command

END END-IF END-THEN END-ELSE A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

ERROR not implemented No longer supported in language

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

FILTER TYPE ( ) WITH not implemented No longer are option statements supported in language This command once specified essentially a GOTO depending on construction

(see XRECODE) GRECODE (GRCD) This new command provides a method for the group recoding of input values

IF IF The IF statement has properties found virtually in all programming languages ie provides a means of evaluating a condition and controlling the execution of alternative logical statements Similar to pseudocode a CONCOR programmer must utilize the structured command IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version

LET LET Virtually unchanged

LIST New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire as well as specific individual identifiers

OUTPUT New command which provides the means by which records will be written to the output-file

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

RANGE (RNG) RANGE (RNG) As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

RECODE (REC) name changed Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

STOP- keyword STOP- keyword Unchanged

UNTIL DO END-DO DO-END-DO: one of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to most other high-level programming languages. The number of iterations is under user control, as well as the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

UPDATE (UPD) UPDATE (UPD) Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

(continued)
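The looping capability described for UNTIL DO END-DO can be pictured with a short Python sketch. It assumes the stop condition is tested before each pass, which the report does not state, and the helper name `until_vary` is hypothetical rather than anything in CONCOR.

```python
def until_vary(start, step, stop_when, body):
    """Run body(counter) until stop_when(counter) is true.

    Assumption: the condition is checked before each pass, so a loop
    whose condition is already true never executes its body.
    """
    counter = start          # value initially assigned to the counter
    passes = 0
    while not stop_when(counter):
        body(counter)        # the command statements between DO and END-DO
        counter += step      # counter is incremented after each pass
        passes += 1
    return passes


# Rough analogue of: UNTIL R001 > 10 VARY R001 FROM 1 BY 1 DO ... END-DO
visited = []
passes = until_vary(1, 1, lambda c: c > 10, visited.append)
```

Under this assumption the body runs for counter values 1 through 10 and stops once the counter exceeds 10.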

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

WRITE (WRT) WRITE (WRT) The WRITE language statement while once utilized as the primary output command is now used to create an auxiliary or derivative file Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION (See WRITE-FILE of DICTIONARY-DIVISION)

XRECODE (XREC) name changed Old command which has become the GRECODE command in the December 1979 version
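The comments above describe ALLOCATE and UPDATE as together providing the imputation mechanism, with UPDATE maintaining the arrays involved. A generic hot-deck sketch in Python illustrates the idea; it is not CONCOR's implementation, and the function name, field names, and cold-start value are assumptions made for illustration.

```python
def hot_deck_impute(records, field, cell_key, is_valid, cold_value):
    """Replace invalid values of `field` with the last valid value seen
    in the same imputation cell (hypothetical generic hot deck)."""
    deck = {}                      # plays the role of the imputation array
    imputed = 0
    for rec in records:
        cell = rec[cell_key]
        if is_valid(rec[field]):
            deck[cell] = rec[field]                   # UPDATE-like step
        else:
            rec[field] = deck.get(cell, cold_value)   # ALLOCATE-like step
            imputed += 1
    return imputed


records = [
    {"sex": 1, "age": 34},
    {"sex": 2, "age": 29},
    {"sex": 1, "age": None},   # missing: takes 34 from the last valid sex=1
    {"sex": 2, "age": 999},    # out of range: takes 29
    {"sex": 3, "age": None},   # no donor yet: falls back to cold_value
]
valid_age = lambda v: v is not None and 0 <= v <= 98
imputed = hot_deck_impute(records, "age", "sex", valid_age, cold_value=30)
```

Counting the imputations, as the sketch does, corresponds loosely to the statistics the report says may be generated for these commands.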


DECEMBER 1978 VERSION 1

DECEMBER 1979 VERSION 2

Not implemented REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
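The presorting requirement can be illustrated with a small Python sketch of area-break reporting with a tolerance check. The record layout, the function name, and the rule that an area fails when its error rate exceeds the tolerance are all assumptions for illustration; the report does not give CONCOR's exact formula.

```python
from itertools import groupby


def area_error_rates(records, tolerance):
    """records: (area_id, has_error) pairs in file order.

    Like an area-break report, groupby only merges adjacent records,
    so an unsorted file produces several partial reports per area.
    """
    report = []
    for area, group in groupby(records, key=lambda r: r[0]):
        flags = [has_error for _, has_error in group]
        rate = sum(flags) / len(flags)
        report.append((area, len(flags), rate, rate > tolerance))
    return report


sorted_file = [("A", False), ("A", True), ("B", False), ("B", False)]
unsorted_file = [("A", False), ("B", False), ("A", True), ("B", False)]

good = area_error_rates(sorted_file, tolerance=0.25)
bad = area_error_rates(unsorted_file, tolerance=0.25)
```

With the sorted file each area is reported once; with the unsorted file area A appears twice with different rates, which is the kind of fragmentation presorting avoids.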

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST


Appendix G

Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1 Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods if applicable

(2) Addition of an optional input andor output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.


(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18 digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.


(12) Increase the number of comparison strings permitted for alphanumeric data items This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters The user would still be required to code two commas to receive the system default value for any parameter

(15) Provide a repetition factor in the initialization of arrays

(16) Print the data dictionary name in the titles of the dictionary documentation or provide a means by which the user may specify a title to appear in the listings
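Item (13) above, an automatic range check driven by valid values declared in the dictionary, can be sketched in Python as follows. The dictionary layout, the identifier names, and the ranges are hypothetical; the proposal does not specify a syntax.

```python
# Hypothetical dictionary: user-identifier -> (lowest, highest) valid value.
DICTIONARY = {
    "AGE": (0, 98),
    "SEX": (1, 2),
    "RELATIONSHIP": (1, 9),
}


def out_of_range_items(record, dictionary):
    """Return the identifiers whose values fall outside the declared range.

    This check would run before the user's edit specifications see the
    record, so no explicit range command is needed for these items.
    """
    failures = []
    for name, (low, high) in dictionary.items():
        value = record.get(name)
        if value is None or not low <= value <= high:
            failures.append(name)
    return failures


record = {"AGE": 104, "SEX": 2, "RELATIONSHIP": 0}
failures = out_of_range_items(record, DICTIONARY)
```

Because the valid ranges live in the dictionary rather than in comment statements, other software such as a tabulation program could read the same declarations.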

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features such as division by zero and out of bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is what action is to be taken in the calculation of tolerance in terms of this new record's inclusion into the universe of observations and number of changes.


(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCORs default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the users ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
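The lookup table approach proposed in item (4) for DRECODE can be contrasted with a linear search in a brief Python sketch. It relies only on the report's description of DRECODE input values as starting at zero and ascending; the function names and the sample recode pairs are illustrative assumptions, not the generated EDITOR code.

```python
def recode_linear(pairs, value, default=None):
    """Scan the (old, new) pairs one by one until a match is found."""
    for old, new in pairs:
        if old == value:
            return new
    return default


def build_lookup(pairs, size):
    """Precompute a table indexed directly by the old value (0..size-1)."""
    table = [None] * size
    for old, new in pairs:
        table[old] = new
    return table


def recode_lookup(table, value, default=None):
    """Single index operation instead of a scan."""
    if 0 <= value < len(table) and table[value] is not None:
        return table[value]
    return default


# Hypothetical recode: single years of age 0-9 into two groups.
pairs = [(age, 1 if age < 5 else 2) for age in range(10)]
table = build_lookup(pairs, size=10)
same = all(recode_linear(pairs, v) == recode_lookup(table, v) for v in range(-1, 12))
```

The table costs one pass to build and then answers each recode in constant time, which is the execution-time saving the item anticipates.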

1.1.3 Report Division

(1) Provide a means by which the user may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.


1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean that the editing process itself should be made interactive; rather, only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual

(2) The development of a CONCOR Users Guide


2 Survey or other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of the questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has not been finished nor tested.


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES

APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES

1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT OUTPUT-QUEST-COUNT WRITE-RECORD-COUNT

Change in spelling of variable

Not mentioned--possible omission from manual

Not implemented

Note: Current documentation equivocates on when some variables are reset--most are initialized with each control area break.

Though implemented as reserved identifiers, due to poor documentation of (old) CONCOR their exact values and utility were unknown.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (eg IN- OUT-).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

APPENDIX I

[Execution statistics printout, largely illegible in this copy. Recoverable content: repeated DIVISION BY ZERO messages citing LINE NUMBER 118, followed by RUN ABORTED DUE TO DIVISION BY ZERO ERRORS.]

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason--ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors, and user specified terminations. A program to deliberately test CONCORs ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.

[Superimposed program text, source lines 111-125, largely illegible in this copy. Recoverable content: LIST commands, LET statements setting R001 and R003 to zero, and at line 118 a LET statement assigning R003 from R002 and R001 (the operator is lost in extraction). Line 118 is the line cited in the DIVISION BY ZERO messages.]
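The diagnostic behavior praised in the comments above, reporting the offending source line and stopping the run after repeated division by zero, can be sketched in Python. The class, its error limit parameter, and the value returned for a failed division are assumptions; only the two message texts are taken from the printout.

```python
class RunAborted(Exception):
    """Raised when the (hypothetical) division-by-zero limit is reached."""


class Editor:
    def __init__(self, max_zero_divides):
        self.max_zero_divides = max_zero_divides   # assumed error limit
        self.zero_divides = 0
        self.messages = []

    def divide(self, numerator, denominator, line_no):
        """Integer division that logs the source line instead of crashing."""
        if denominator == 0:
            self.zero_divides += 1
            self.messages.append(f"DIVISION BY ZERO LINE NUMBER {line_no}")
            if self.zero_divides >= self.max_zero_divides:
                self.messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                raise RunAborted(line_no)
            return 0   # assumed placeholder result so editing can continue
        return numerator // denominator


editor = Editor(max_zero_divides=3)
aborted = False
try:
    for _ in range(5):
        editor.divide(10, 0, line_no=118)
except RunAborted:
    aborted = True
```

The run stops with a message trail rather than an unexplained ABEND, which is the contrast with the old package that the comments draw.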

APPENDIX I

[Execution statistics printout, largely illegible in this copy. Recoverable content: a series of error messages, each giving a coordinate, a value and a line number, ending with an aborted run.]

Comments: In this example R001, a 2 dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 X 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.

[Superimposed program text; recoverable lines follow, and the LIST statements on lines 182-185 are illegible.]

175 UNTIL R001 > 10 VARY R001 FROM 1 BY 1
177 DO
178 UNTIL R002 > 10 VARY R002 FROM 1 BY 1
180 ALLOCATE N002 = TA1 (R001 R002)
181 UPDATE TA2 (R001 R002) R003
186 END-DO
187 END-DO
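The array coordinate error in this example can be reproduced in outline with a Python sketch: a 10 by 10 loop reads one table and tries to update a 5 by 5 table, and each nonexistent element reference is logged with its line number. The helper names and the logging format are assumptions; CONCOR's own error limit, which ended the run, is not modeled.

```python
def make_table(rows, cols, fill=0):
    return [[fill] * cols for _ in range(rows)]


def checked_update(table, row, col, value, line_no, errors):
    """Store value at 1-based (row, col); log bad coordinates instead of failing."""
    if 1 <= row <= len(table) and 1 <= col <= len(table[0]):
        table[row - 1][col - 1] = value
        return True
    errors.append((line_no, row, col))
    return False


ta1 = make_table(10, 10, fill=7)   # the larger source table
ta2 = make_table(5, 5)             # the smaller table being updated
errors = []

for r in range(1, 11):
    for c in range(1, 11):
        # 181 is the source line of the update in the superimposed listing
        checked_update(ta2, r, c, ta1[r - 1][c - 1], line_no=181, errors=errors)
```

Of the 100 attempted updates, only the 25 with both coordinates in the range 1 to 5 succeed; the other 75 are reported against the line that caused them.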

APPENDIX I

[Execution statistics printout, largely illegible in this copy. Recoverable messages: JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215 and EDITOR -- NORMAL END OF JOB.]

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

[Superimposed program text, source lines 214-218, largely illegible in this copy. Recoverable content: an IF test on IN-RECORD-COUNT at line 214, THEN STOP-RUN END-THEN at line 215, and a second IF test on IN-RECORD-COUNT followed by a THEN LIST statement.]

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

APPENDIX J DD-2

WARNING (DD-001): BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002): ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character () and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003): END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character () and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character () and the literal string delimiter ()).

ERROR (DD-004): COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.


APPENDIX A

BUCEN IN-HOUSE REVIEW ENHANCEMENT PROPOSAL

1. Easy to use interrecord referencing

2. Improved output file capabilities
   A. Provide overflow protection on WRITE command
   B. Provide OUTPUT command to create corrected questionnaire file that matches IP file data dictionary

3. Improved edit statistics reported (LISTERR)
   A. Provide automatic (user-specified) area break
   B. Provide options for compilation and displaying edit statistics at various levels
   C. Provide automatic (user-specified) tolerance checking of error rates by area
   D. Automatically capture IDs of areas failing tolerance check

4. Clean up known bugs in code

5. Comprehensive testing

6. Clean up and enhance documentation
   A. Reference manual: more examples, error message guide
   B. Installation guide
   C. Systems manual

APPENDIX B

EVALUATIVE CRITERIA

UNITED STATES GOVERNMENT

Memorandum

TO: DS/POP/FPSD, Robert H. Haladay

DATE: December 3, 1979

FROM: DS/POP/DEIO, Liliane Floge

SUBJECT: Criteria for Evaluation of BuCen COBOL CONCOR, September 30th Version, as presented at January 1980 Workshop

The following points should be considered when evaluating CONCOR, the COBOL editing system, during participation at the January 1980 workshop:

1. Are the Installation Manual, the System Manual and the Users Manual ready for use? Are these manuals adequate for their purposes? Are the manuals clear and consistent? Can they be understood by subject-matter personnel as well as programmers?

2. Is the editing system capable of implementing all the features as described in the documentation?

3. Does the system appear to be [illegible]ely debugged and tested and ready for release to developing countries?

4. Does the system appear to be fast enough to edit a census in a reasonable amount of time?

5. What size core does the system require?

6. Comment in general on completeness, usefulness and efficiency of the [illegible], and ease of use by developing country programmers and other census personnel.

cc: [illegible] Olds, APHA; [illegible]; Julio Ortuzar, Delta Systems

APPENDIX C

WORKSHOP ITINERARY

CONCOR Workshop Schedule, January 7-18, 1980

U.S. Bureau of the Census, International Statistical Programs Center

Scuderi Building, Room 209, 4235 28th Avenue, Marlow Heights, Maryland

Monday, January 7

9:30 am - 10:00  Welcoming Remarks; Overview of Workshop

10:00 - 10:30  Introduction to CONCOR
  - Purpose and function
  - History of development
  - General computer requirements

10:30 - 10:45  Break

10:45 - 12:00  Editing Concepts
  - Ways to interrogate data
  - Ways to correct data
  - Editing housing and population data
  - POPSTAN
  - Advantages of CONCOR

12:00 - 1:15 pm  Break

1:15 - 2:00  System Description
  - Constraints in design of CONCOR
  - Basic subsystems of CONCOR
  - User interactions with system
  - Examples of outputs produced

2:00 - 2:30  User Program Organization
  - Divisions
  - Sections
  - Routines
  - Commands

2:30 - 2:45  Break

2:45 - 3:25  Command Language Description
  - Types of statements
  - Format
  - Syntax

Tuesday, January 8

9:30 am - 10:30  Dictionary Division Command Statements
  - Punctuation
  - Input data referencing
  - Working data declarations
  - Internal data representation and storage

10:30 - 10:45  Break

10:45 - 12:00  Dictionary-Attributes-Section
  - Dictionary-Name
  File-Section
  - Input-File
  - Output-File
  - Write-File
  - Error-File

12:00 - 1:15 pm  Break

1:15 pm - 2:15  Input-Record-Section
  - Define-Record
  Working-Data-Section
  - New-Data
  - Array-Data

2:15 - 2:30  Break

2:30 - 3:25  Dictionary Examples
  - Minimum dictionary structure
  - Maximum dictionary structure
  - Hand out dictionary problem

Wednesday, January 9

9:30 am - 10:30  Discussion of Participants Dictionary problems; Free work time

10:30 - 10:45  Break

10:45 - 12:00  Execution Division Command Statements
  - Punctuation
  - Subscripting
  - Internal Identifiers
  - Report-Control-Section
    - Generate-Edit-Statistics
    - Count-Imputes
    - Examples

12:00 - 1:15 pm  Break

1:15 pm - 2:15  Execution Division Command Statements (continued)
  - Routines of Edit-Specifications-Section
    - Prolog-Routine
    - Filter-Routine
    - Epilog-Routine
  - Types and functions of edit specification commands
  - Range
  - Assert

2:15 - 2:30  Break

2:30 - 3:25
  - Pass/Fail clauses
  - List

Thursday, January 10

9:30 am - 10:30  Discussion of Problems; Free work time

10:30 - 10:45  Break

10:45 - 12:00  Edit Specification Command Statements (continued)
  - Allocate
  - Update
  - Let

12:00 - 1:15 pm  Break

1:15 pm - 2:15
  - If
  - Until/Exit
  - Stop

2:15 - 2:30  Break

2:30 - 3:25
  - Drecode
  - Grecode

Friday, January 11

9:30 am - 10:30  Edit Specification Command Statements (continued)
  - Output
  - Write

10:30 - 10:45  Break

10:45 - 12:00  Report Division Command Statements
  - Display-Control-Section
    - Display-Edit-Statistics
  - Tolerance-Control-Section
    - Error-Rate-Check
    - Reject-File
  - Report Examples

12:00 - 1:15 pm  Break

1:15 pm - 3:25  Hand out Edit Specifications as problem; Free work time

Monday, January 14

9:30 am - 10:30  Discuss procedures for running problems on computer

10:30 - 10:45  Break

10:45 - 12:00  Component Programs of the CONCOR system

12:00 - 1:15 pm  Break

Tuesday, January 15

9:30 am - 3:25 pm  Free work time

Wednesday, January 16

9:30 am - 12:00  Free work time

12:00 - 1:15 pm  Break

1:15 pm - 2:15  How to Install CONCOR on IBM 360/370 OS

2:15 - 2:30  Break

2:30 - 3:25  Free work time

Thursday, January 17

9:30 am - 3:25  Free work time

1:15 pm - 2:15  Future CONCOR Design Considerations
  - for census processing
  - for survey processing
  - manual correction system

2:15 - 2:30  Break

2:30 - 2:45  Evaluation Guidelines
  - Hand out evaluation forms

2:45 - 3:25  Free work time

Friday, January 18

9:30 am - 10:30  Free work time

10:30 - 10:45  Break

10:45 - 12:00  Group Discussion on CONCOR
  - Submission of written evaluations
  Distribution of IBM installation tapes
  Presentation of Certificates to Participants

12:00 - 1:15 pm  Break

1:15 - 3:25  Free work time

APPENDIX D

PARTICIPANTS

CONCOR WORKSHOP, January 7-18, 1980, Marlow Heights, MD

KENYA

James Midianga, Government Computer Centre, Central Bureau of Statistics, PO Box 30266, Nairobi, Kenya

PANAMA

Omar Rivera Rios, Data Processing Specialist, Contraloria General de la Republica de Panama, Apartado Postal 5213, Panama 5, Panama

PHILIPPINES

Guida T Capellan, OIC, Computer Programming Division, National Census and Statistics Office, Solicarel Bldg 1, Magsaysay Blvd, Sta Mesa, Manila, Philippines

THAILAND

Angsumal Sunalai, National Statistical Office, Bangkok 1, Thailand

EGYPT

Farag Sedky Mourad Ghaleb, CAPMAS (Central Agency for Public Mobilization and Statistics), Nasr City, Cairo, Egypt

SAUDI ARABIA

Shargi R Al-Shargi, National Computer Center, PO Box 2534, Riyadh, Saudi Arabia

Ahmed S Al-Ghamdi, Dept Director of National Computer Center, PO Box 2534, Riyadh, Saudi Arabia

OTHER

Robert W O'Neal, USREP/JECOR, APO New York, NY 09038

John N Adams, Data Processing Advisor, DACCA, Department of State, Washington, DC 20520, or c/o UNDP, PO Box 224, Dacca, Bangladesh

Joe Quasney, US Census Bureau, Washington, DC 20233

Howard Brunsman, 5715 N Ninth St, Arlington, VA 22205

Larry Shiller and Julio Ortuzar, Delta Systems Consultants Inc, 264 Alhambra Circle, Coral Gables, FL 33134

Nicholas Ourusoff, Burpee Hill Road, New London, NH 03257 (United Nations)

Frederic J Grant IV, Systems Development, Georgia World Congress Institute, 580 North Omni International, Atlanta, Georgia 30303

Rafael Samper, Mary Jo Keenan, Catherine B Gleason, LAC/DR, Room 2246 NS, Washington, DC 20523

John Marshall, AIDSERDMPSE, Rm 706C SA-12, Washington, DC 20523

Susanne Bacon and Mary K Friday, ISPC/Data Processing, US Census Bureau, Washington, DC 20233

Leo Dougherty, ISPC/World Census Staff, US Bureau of the Census, Washington, DC 20233

CONCOR WORKSHOP

January 7-18, 1980, Washington, DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM


CONCOR WORKSHOP January 16 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1. What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries?

2. If you are familiar with previous versions of CONCOR, how does this current version compare to them?

3. How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data with which you are familiar?

4. How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software?

5. How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys?

6. If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?

7. In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8. Other comments

APPENDIX F

NEWOLD COMMAND COMPARISONS


OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA


NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR.

END-DIVISION signifies the termination of the CONCOR division organization and must always be present, even though division identifiers have not been implemented.

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations, and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

Version 1: ALLOCATE (ALL)
Version 2: ALLOCATE (ALLOC)
Comments: Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

Version 1: ASSERT (AST)
Version 2: ASSERT (ASRT)
Comments: The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and MESSAGE option. Additionally, by utilizing the DUMPR and DUMPO keywords, it provides the user with the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

Version 1: (see RECODE)
Version 2: DRECODE (DRCD)
Comments: Provides a method for recoding data items that have a starting value of zero and continue in ascending sequence. Replaces the previous RECODE command.

Version 1: (none)
Version 2: EXIT
Comments: Provides the means to terminate execution (escape) of command statements within the UNTIL command.

Version 1: END
Version 2: END-IF, END-THEN, END-ELSE
Comments: A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

Version 1: ERROR
Version 2: not implemented
Comments: No longer supported in the language.

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

Version 1: FILTER TYPE ( ) WITH
Version 2: not implemented
Comments: Option statements are no longer supported in the language. This command once specified essentially a GOTO, depending on construction.

Version 1: (see XRECODE)
Version 2: GRECODE (GRCD)
Comments: This new command provides a method for the group recoding of input values.

Version 1: IF
Version 2: IF
Comments: The IF statement has properties found in virtually all programming languages, i.e. it provides a means of evaluating a condition and controlling the execution of alternative logical statements. Similar to pseudocode, a CONCOR programmer must utilize the structured IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version.

Version 1: LET
Version 2: LET
Comments: Virtually unchanged.

Version 1: (none)
Version 2: LIST
Comments: New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire, as well as specific individual identifiers.

Version 1: (none)
Version 2: OUTPUT
Comments: New command which provides the means by which records will be written to the output file.

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

Version 1: RANGE (RNG)
Version 2: RANGE (RNG)
Comments: As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through the DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

Version 1: RECODE (REC)
Version 2: name changed
Comments: Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

Version 1: STOP keyword
Version 2: STOP keyword
Comments: Unchanged.

Version 1: (none)
Version 2: UNTIL DO END-DO
Comments: One of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

Version 1: UPDATE (UPD)
Version 2: UPDATE (UPD)
Comments: Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

Version 1: WRITE (WRT)
Version 2: WRITE (WRT)
Comments: The WRITE language statement, while once utilized as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of the DICTIONARY-DIVISION.)

Version 1: XRECODE (XREC)
Version 2: name changed
Comments: Old command which has become the GRECODE command in the December 1979 version.
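The two recoding commands described above can be contrasted with a short sketch. This is Python, not CONCOR, and it rests on assumptions: the function names `drecode` and `grecode`, the table layouts, and the sample tables are hypothetical, and the group recode is shown with inclusive upper bounds purely for illustration, since the comparison table does not give the actual GRECODE syntax.

```python
import bisect


def drecode(value, codes):
    """Direct recode: input values 0, 1, 2, ... index straight into `codes`."""
    if 0 <= value < len(codes):
        return codes[value]
    return None  # value outside the table; the caller decides what to do


def grecode(value, upper_bounds, codes):
    """Group recode: return the code of the first group whose inclusive
    upper bound is >= value. `upper_bounds` must be in ascending order."""
    i = bisect.bisect_left(upper_bounds, value)
    return codes[i] if i < len(codes) else None


# Hypothetical tables, invented for illustration only.
marital = ["unknown", "single", "married", "widowed"]
age_bounds = [14, 64, 120]      # groups 0-14, 15-64, 65-120
age_codes = [1, 2, 3]

print(drecode(2, marital))
print(grecode(15, age_bounds, age_codes))
```

The direct form needs no search at all because the input value is the table position, which is why it only suits items whose values start at zero and ascend without gaps.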


DECEMBER 1978 VERSION 1: Not implemented

DECEMBER 1979 VERSION 2:

REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
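Why the input must be presorted by control area can be shown with a small sketch of tolerance checking by area. This is Python rather than CONCOR, and the function name `areas_failing_tolerance`, the record layout and the 0.4 tolerance are all hypothetical. The sketch uses `itertools.groupby`, which, like the behaviour described above, only groups records that are adjacent in the input.

```python
from itertools import groupby


def areas_failing_tolerance(records, tolerance):
    """Return IDs of areas whose error rate exceeds `tolerance`.

    `records` is an iterable of (area_id, has_error) pairs in which each
    area's records are expected to be contiguous.
    """
    failing = []
    for area_id, group in groupby(records, key=lambda rec: rec[0]):
        flags = [has_error for _, has_error in group]
        rate = sum(flags) / len(flags)     # share of records with an error
        if rate > tolerance:
            failing.append(area_id)
    return failing


presorted = [("A01", False), ("A01", True), ("A02", False), ("A02", False)]
unsorted = [("A01", True), ("A02", False), ("A01", False), ("A01", False)]

print(areas_failing_tolerance(presorted, 0.4))
print(areas_failing_tolerance(unsorted, 0.4))
print(areas_failing_tolerance(sorted(unsorted), 0.4))
```

With the unsorted input, the first fragment of area A01 is judged on its own and fails the check, although the area as a whole (one error in three records) is within tolerance once the records are sorted.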

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST

Future Versions of CONCOR: Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.

(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18-digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.

(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation, or provide a means by which the user may specify a title to appear in the listings.
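The automatic range check proposed in item (13) might look roughly like the following. This is a Python sketch under assumptions, not a description of CONCOR: the `DICTIONARY` layout, the identifier names AGE and SEX, their ranges, and the function name `range_check` are hypothetical, and treating a missing value as out of range is a choice made only for the example.

```python
# Hypothetical dictionary: identifier -> (lowest valid value, highest valid value).
DICTIONARY = {"AGE": (0, 120), "SEX": (1, 2)}


def range_check(record, dictionary=DICTIONARY):
    """Return the identifiers in `record` whose values fall outside the
    range declared for them in `dictionary`."""
    out_of_range = []
    for name, (low, high) in dictionary.items():
        value = record.get(name)
        # A missing value is treated as out of range in this sketch.
        if value is None or not (low <= value <= high):
            out_of_range.append(name)
    return out_of_range


print(range_check({"AGE": 134, "SEX": 2}))
```

Keeping the valid ranges in the dictionary rather than in comment statements is what would let other software, such as the tabulation programs mentioned in the item, read the same declarations.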

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features, such as division by zero and out-of-bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is what action is to be taken in the calculation of tolerance in terms of this new record's inclusion into the universe of observations and number of changes.

(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
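Item (8), the pairing of per-area identifiers with run totals, can be illustrated with a minimal sketch. It is written in Python and is not CONCOR's implementation: the class name `EditCounters` and its methods are hypothetical, and only the counter name IN-RECORD-COUNT is taken from this report.

```python
from collections import Counter


class EditCounters:
    """Two parallel sets of counters: one per control area, one per run."""

    def __init__(self):
        self.area = Counter()   # reset at each control-area break
        self.run = Counter()    # never reset during the run

    def count(self, name, amount=1):
        self.area[name] += amount
        self.run[name] += amount

    def area_break(self):
        """Return the finished area's totals and start a fresh area set."""
        finished = dict(self.area)
        self.area.clear()
        return finished


counters = EditCounters()
counters.count("IN-RECORD-COUNT", 3)      # three records in the first area
first_area = counters.area_break()
counters.count("IN-RECORD-COUNT", 2)      # two records in the second area
print(first_area, dict(counters.area), dict(counters.run))
```

Updating both sets in one call keeps them from drifting apart; the area set alone matches the non-cumulative behaviour the report attributes to the present counters.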

1.1.3 Report Division

(1) Provide a means by which the user may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.

1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean to imply that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.
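The difference between the linear search and the lookup table mentioned in item (2) is sketched below in Python. What GENSRC actually searches is not stated in this report, so the use of the Version 2 command keywords as the searched list, and the names `linear_search`, `KEYWORD_TABLE` and `table_lookup`, are assumptions made for illustration.

```python
KEYWORDS = ["ALLOCATE", "ASSERT", "DRECODE", "EXIT", "GRECODE", "IF", "LET",
            "LIST", "OUTPUT", "RANGE", "STOP", "UNTIL", "UPDATE", "WRITE"]


def linear_search(word):
    """Scan the keyword list from the start; cost grows with the list length."""
    for position, keyword in enumerate(KEYWORDS):
        if keyword == word:
            return position
    return -1


# Build the lookup table once; each later lookup is a single hash probe.
KEYWORD_TABLE = {keyword: position for position, keyword in enumerate(KEYWORDS)}


def table_lookup(word):
    """Return the keyword's position, or -1 if the word is not a keyword."""
    return KEYWORD_TABLE.get(word, -1)


print(linear_search("UPDATE"), table_lookup("UPDATE"), table_lookup("GOTO"))
```

Both functions return the same answers; the table trades a one-time build step and some storage for lookups whose cost does not depend on where the keyword sits in the list.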

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR Users Guide.

2. Survey or other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has not been finished nor tested.
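The check-in procedure of item (2) amounts to comparing the questionnaires received against a master list of the cases selected. The Python sketch below illustrates the idea only; the function name `check_in`, the report layout and the questionnaire IDs are hypothetical.

```python
def check_in(master_list, received_ids):
    """Compare received questionnaire IDs with the master list of selected cases."""
    expected = set(master_list)
    received = set(received_ids)
    return {
        "missing": sorted(expected - received),      # selected but not received
        "unexpected": sorted(received - expected),   # received but not selected
    }


report = check_in(["Q001", "Q002", "Q003"], ["Q003", "Q001", "Q009"])
print(report)
```

Using sets means the two files do not have to arrive in the same order, at the cost of holding both ID lists in memory.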


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES


1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT OUTPUT-QUEST-COUNT WRITE-RECORD-COUNT

Change in spelling of variable

Not mentioned--possible omission from manual

Not implemented

Note: Current documentation equivocates on when some variables are reset--most are initialized with each control area break.

Though implemented as reserved identifiers, due to poor documentation of (old) CONCOR their exact values and utility were unknown.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g. IN-, OUT-).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Reproduction of a CONCOR-EDITOR EXECUTION STATISTICS page with the test program text superimposed. The run date and the column headings (control area, sequence number, inputs read, outputs by OUTPUT command, outputs by WRITE command, invalid record types, questionnaires and records) are only partly legible in this copy.]

Messages printed on the page:

DIVISION BY ZERO -- LINE NUMBER 118 (printed once for each occurrence)

RUN ABORTED DUE TO DIVISION BY ZERO ERRORS

Legible portions of the superimposed program text (source lines 112-125): a series of LIST statements announcing the test, a LET statement setting R001 and R003 to zero, the statement on line 118 (LET R003 = R002 [operator illegible] R001) that triggers the division by zero, and the closing line END TEST OF OPERANDS AND MSG.

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason, that is, ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.
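The behavior described in the comments (report the source line of each division by zero, then abort once too many have occurred) can be sketched in Python. This is an illustration only: the function run_divisions, the statement tuple layout, and the error limit max_errors are assumptions made for the sketch and are not taken from the CONCOR EDITOR.

```python
def run_divisions(statements, registers, max_errors=6):
    """Execute division statements, reporting the line number of each failure.

    statements: list of (line_number, target, numerator, denominator) register names.
    registers:  dict mapping register names to integer values.
    Returns (messages, aborted). max_errors is a hypothetical error limit.
    """
    messages = []
    errors = 0
    for line_no, target, num, den in statements:
        try:
            registers[target] = registers[num] // registers[den]
        except ZeroDivisionError:
            errors += 1
            messages.append(f"DIVISION BY ZERO - LINE NUMBER {line_no}")
            if errors >= max_errors:
                messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                return messages, True
    return messages, False


registers = {"R001": 0, "R002": 5, "R003": 0}
statements = [(118, "R003", "R002", "R001")] * 3
messages, aborted = run_divisions(statements, registers, max_errors=2)
```

With the limit set to 2, the third statement is never executed: the run stops after the second diagnostic, which mirrors the abort message shown on the page.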

[Reproduction of a second CONCOR-EDITOR EXECUTION STATISTICS page with the test program text superimposed. The report body, a series of array coordinate error messages giving questionnaire, coordinate, value and line number, is not legible in this copy.]

Legible portions of the superimposed program text (source lines 173-187): two nested UNTIL ... VARY ... DO loops running R001 and R002 from 1 to 10 in steps of 1, an ALLOCATE from TA1 (R001 R002), an UPDATE of TA2 (R001 R002), several LIST statements displaying the values of TA1 and TA2, and two END-DO statements.

Comments: In this example, values from TA1, a 2-dimensional array with 10 rows and 10 columns, are used in an attempt to update the smaller (5 X 5) TA2. Non-existent element references caused the error. The program text has been superimposed on this page for illustration purposes.

[Reproduction of a third CONCOR-EDITOR EXECUTION STATISTICS page with the test program text superimposed. The column headings (control area, sequence number, inputs read, outputs by OUTPUT command, outputs by WRITE command, questionnaires and records) are only partly legible in this copy.]

Messages printed on the page:

JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215

EDITOR -- NORMAL END OF JOB

Legible portions of the superimposed program text (source lines 214-218): an IF statement testing IN-RECORD-COUNT, followed on line 215 by THEN STOP-RUN END-THEN, and a second IF statement on IN-RECORD-COUNT whose THEN clause is a LIST statement.

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.
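The kind of check behind messages DD-001 and DD-002 can be sketched in Python. This is only an illustration under stated assumptions: the function check_first_characters is hypothetical, strings are assumed to be separated by white space, and the legal set below contains only the characters that are legible in this copy of the guide (the delimiter, command terminator, literal string delimiter and comment indicator characters are not legible and are therefore left out).

```python
import string

# Characters named legibly in the guide: A-Z, 0-9, -, +, comma, and =.
# The remaining special characters are not legible in this copy and are omitted.
KNOWN_LEGAL = set(string.ascii_uppercase + string.digits + "-+,=")


def check_first_characters(line, legal=KNOWN_LEGAL):
    """Return a list of (message_code, string) diagnostics for one command line.

    A blank line yields DD-001; every white-space separated string whose first
    character is outside the legal set yields DD-002.
    """
    if not line.strip():
        return [("DD-001", "")]
    return [("DD-002", tok) for tok in line.split() if tok[0] not in legal]


blank = check_first_characters("   ")
bad = check_first_characters("INPUT-FILE ?X")
good = check_first_characters("A=1")
```

Passing the legal set as a parameter keeps the sketch usable once the full character set is known from a legible copy of the manual.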


APPENDIX B

EVALUATIVE CRITERIA

UNITED STATES GOVERNMENT

Memorandum

TO: DSPOPFPSD, Robert H. Haladay

DATE: December 3, 1979

DSPOPDEIO, Liliane Floge

SUBJECT: Criteria for Evaluation of BuCen COBOL CONCOR, September 30th Version, as presented at January 1980 Workshop

The following points should be considered when evaluating CONCOR, the COBOL editing system, during participation at the January 1980 workshop:

1. Are the Installation Manual, the System Manual, and the User's Manual ready for use? Are these manuals adequate for their purposes? Are the manuals clear and consistent? Can they be understood by subject-matter personnel as well as programmers?

2. Is the editing system capable of implementing all the features as described in the documentation?

3. Does the system appear to be [word illegible] debugged and tested and ready for release to developing countries?

4. Does the system appear to be fast enough to edit a census in a reasonable amount of time?

5. What size core does the system require?

6. Comment in general on the completeness, usefulness, and efficiency of the system, and on its ease of use by developing country programmers and other census personnel.

cc: [illegible] Olds, APHA; [illegible]; Julio Ortuzar, Delta Systems

APPENDIX C

WORKSHOP ITINERARY

CONCOR Workshop Schedule, January 7-18, 1980

U.S. Bureau of the Census, International Statistical Programs Center

Scuderi Building, Room 209, 4235 28th Avenue, Marlow Heights, Maryland

Monday, January 7

9:30 am - 10:00 Welcoming Remarks; Overview of Workshop

10:00 - 10:30 Introduction to CONCOR
- Purpose and function
- History of development
- General computer requirements

10:30 - 10:45 Break

10:45 - 12:00 Editing Concepts
- Ways to interrogate data
- Ways to correct data
- Editing housing and population data - POPSTAN
- Advantages of CONCOR

12:00 - 1:15 pm Break

1:15 - 2:00 System Description
- Constraints in design of CONCOR
- Basic subsystems of CONCOR
- User interactions with system
- Examples of outputs produced

2:00 - 2:30 User Program Organization
- Divisions
- Sections
- Routines
- Commands

2:30 - 2:45 Break

2:45 - 3:25 Command Language Description
- Types of statements
- Format
- Syntax

Tuesday, January 8

9:30 am - 10:30 Dictionary Division Command Statements
- Punctuation
- Input data referencing
- Working data declarations
- Internal data representation and storage

10:30 - 10:45 Break

10:45 - 12:00 Dictionary-Attributes-Section
- Dictionary-Name
File-Section
- Input-File
- Output-File
- Write-File
- Error-File

12:00 - 1:15 pm Break

1:15 pm - 2:15 Input-Record-Section
- Define-Record
Working-Data-Section
- New-Data
- Array-Data

2:15 - 2:30 Break

2:30 - 3:25 Dictionary Examples
- Minimum dictionary structure
- Maximum dictionary structure
- Hand out dictionary problem

Wednesday, January 9

9:30 am - 10:30 Discussion of Participants' Dictionary problems; Free work time

10:30 - 10:45 Break

10:45 - 12:00 Execution Division Command Statements
- Punctuation
- Subscripting
- Internal Identifiers
- Report-Control-Section
  - Generate-Edit-Statistics
  - Count-Imputes
  - Examples

12:00 - 1:15 pm Break

1:15 pm - 2:15 Execution Division Command Statements (continued)
- Routines of Edit-Specifications-Section
  - Prolog-Routine
  - Filter-Routine
  - Epilog-Routine
- Types and functions of edit specification commands
- Range
- Assert

2:15 - 2:30 Break

2:30 - 3:25 Execution Division Command Statements (continued)
- Pass/Fail clauses
- List

Thursday, January 10

9:30 am - 10:30 Discussion of Problems; Free work time

10:30 - 10:45 Break

10:45 - 12:00 Edit Specification Command Statements (continued)
- Allocate
- Update
- Let

12:00 - 1:15 pm Break

1:15 pm - 2:15 Edit Specification Command Statements (continued)
- If
- Until/Exit
- Stop

2:15 - 2:30 Break

2:30 - 3:25 Edit Specification Command Statements (continued)
- Drecode
- Grecode

Friday, January 11

9:30 am - 10:30 Edit Specification Command Statements (continued)
- Output
- Write

10:30 - 10:45 Break

10:45 - 12:00 Report Division Command Statements
- Display-Control-Section
  - Display-Edit-Statistics
- Tolerance-Control-Section
  - Error-Rate-Check
  - Reject-File
- Report Examples

12:00 - 1:15 pm Break

1:15 pm - 3:25 Hand out Edit Specifications as problem; Free work time

Monday, January 14

9:30 am - 10:30 Discuss procedures for running problems on computer

10:30 - 10:45 Break

10:45 - 12:00 Component Programs of the CONCOR system

12:00 - 1:15 pm Break

Tuesday, January 15

9:30 am - 3:25 pm Free work time

Wednesday, January 16

9:30 am - 12:00 Free work time

12:00 - 1:15 pm Break

1:15 pm - 2:15 How to Install CONCOR on IBM 360/370 OS

2:15 - 2:30 Break

2:30 - 3:25 Free work time

Thursday, January 17

9:30 am - 3:25 Free work time

1:15 pm - 2:15 Future CONCOR Design Considerations
- for census processing
- for survey processing
- manual correction system

2:15 - 2:30 Break

2:30 - 2:45 Evaluation Guidelines
- Hand out evaluation forms

2:45 - 3:25 Free work time

Friday, January 18

9:30 am - 10:30 Free work time

10:30 - 10:45 Break

10:45 - 12:00 Group Discussion on CONCOR
- Submission of written evaluations
- Distribution of IBM installation tapes
- Presentation of Certificates to Participants

12:00 - 1:15 pm Break

1:15 - 3:25 Free work time

APPENDIX D

PARTICIPANTS


CONCOR WORKSHOP January 7 - 18 1980 Marlow Heights MD

PARTICIPANTS

KENYA

James Midianga Government Computer Centre Central Bureau of Statistics PO Box 30266 Nairobi Kenya

PANAMA

Omar Rivera Rios Data Processing Specialist Contraloria General de la Republica de Panama Apartado Postal 5213 Panama 5 Panama

PHILIPPINES Guida T Capellan OIC Computer Programming Division National Census and Statistics Office Solicarel Bldg 1 Magsaysay Blvd

Sta Mesa Manila Philippines

THAILAND

Angsumal Sunalai National Statistical Office Bangkok 1 Thailand

EGYPT

Farag Sedky Mourad Ghaleb CAPMAS (Central Agency for Public Mobilization and Statistics) Nasr City Cairo Egypt

SAUDI ARABIA

Shargi R Al-Shargi National Computer Center PO Box 2534 Riyadh Saudi Arabia

Ahmed S Al-Ghamdi Dept Director of National Computer Center PO Box 2534 Riyadh Saudi Arabia

OTHER

Robert W O'Neal USREPJECOR APO New York NY 09038

John N Adams Data Processing Advisor DACCA Department of State Washington DC 20520

OR c/o UNDP PO Box 224 Dacca Bangladesh

Joe Quasney US Census Bureau Washington DC 20233

Howard Brunsman 5715 N Ninth St Arlington VA 22205

Larry Shiller and Julio Ortuzar Delta Systems Consultants Inc 264 Alhambra Circle Coral Gables FL 33134

Nicholas Ourusoff Burpee Hill Road New London NH 03257 (United Nations)

Frederic J Grant IV Systems Development Georgia World Congress Institute 580 North Omni International Atlanta Georgia 30303

Rafael Samper Mary Jo Keenan Catherine B Gleason LACDR Room 2246 NS Washington DC 20523

John Marshall AIDSERDMPSE Rm 706C SA-12 Washington DC 20523

Susanne Bacon and Mary K Friday ISPCData Processing US Census Bureau Washington DC 20233

Leo Dougherty ISPCWorld Census Staff US Bureau of the Census Washington DC 20233

CONCOR WORKSHOP

January 7-18 1980 Washington DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM


CONCOR WORKSHOP January 16 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1 What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries

2 If you are familiar with previous versions of CONCOR how does this current version compare to them


3 How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data of which you are familiar

4 How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software


5 How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys

6 If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?

7 In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8 Other comments

APPENDIX F

NEWOLD COMMAND COMPARISONS

OLD CONCOR VERSION 1, DECEMBER 1978

DATA-DICTIONARY
DEFINE INPUT-FILE
DEFINE ERROR-FILE
DEFINE COMMON-DATA (contains the questionnaire-ID and record type location information)
DEFINE RECORD-TYPE=
DEFINE NEW-DATA
DEFINE ARRAY-DATA

NEW CONCOR VERSION 2, DECEMBER 1979

DICTIONARY-DIVISION
DICTIONARY-ATTRIBUTES-SECTION
DICTIONARY-NAME
FILE-SECTION
INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE
IDENTIFICATION-CONTROL-SECTION
AREA-CONTROL
QUESTIONNAIRE-CONTROL
RECORD-CONTROL
INPUT-RECORD-SECTION
DEFINE-RECORD
WORKING-DATA-SECTION
NEW-DATA ARRAY-DATA
END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1, DECEMBER 1978

EDIT-SPECIFICATION
PROLOG
FILTER TYPE ( )
EPILOG

ALLOCATE (ALL)
ASSERT (AST)
END
ERROR
FILTER TYPE ( ) WITH
IF
LET
RANGE (RNG)
RECODE (REC)
STOP --keyword
UPDATE (UPD)
WRITE (WRT)
XRECODE (XREC)

NEW CONCOR VERSION 2, DECEMBER 1979

EXECUTION-DIVISION
REPORT-CONTROL-SECTION
COUNT-IMPUTES GENERATE-EDIT-STATISTICS
EDIT-SPECIFICATION-SECTION
PROLOG-ROUTINE
FILTER-ROUTINE (name-of-record-type)
EPILOG-ROUTINE

ALLOCATE (ALLOC)
ASSERT (ASRT)
DRECODE (DRCD)
END-DIVISION
EXIT
GRECODE (GRCD)
IF
LET
LIST
OUTPUT
RANGE (RNG)
STOP
UNTIL
UPDATE (UPD)
WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR.

END-DIVISION signifies the termination of the CONCOR division organization and must always be present even though division identifiers have not been implemented.

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

Each entry gives the Version 1 (December 1978) form, the Version 2 (December 1979) form, and comments.

Version 1: ALLOCATE (ALL). Version 2: ALLOCATE (ALLOC).
Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

Version 1: ASSERT (AST). Version 2: ASSERT (ASRT).
The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and MESSAGE option. Additionally, the DUMPR and DUMPO keywords give the user the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

Version 1: (see RECODE). Version 2: DRECODE (DRCD).
Provides a method for recoding data items that have a starting value of zero and continue in ascending sequence. Replaces the previous RECODE command.

Version 1: none. Version 2: EXIT.
Provides the means to terminate execution of (escape from) command statements within the UNTIL command.

Version 1: END. Version 2: END-IF, END-THEN, END-ELSE.
A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

Version 1: ERROR. Version 2: not implemented.
No longer supported in the language.

Version 1: FILTER TYPE ( ) WITH. Version 2: not implemented.
Option statements are no longer supported in the language. This command once specified what was essentially a GOTO, depending on construction.

Version 1: (see XRECODE). Version 2: GRECODE (GRCD).
This new command provides a method for the group recoding of input values.

Version 1: IF. Version 2: IF.
The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. Similar to pseudocode, a CONCOR programmer must utilize the structured IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version.

Version 1: LET. Version 2: LET.
Virtually unchanged.

Version 1: none. Version 2: LIST.
New command which allows for the generation of specific messages, as well as specific individual identifiers, to be displayed in the report of edit statistics by questionnaire.

Version 1: none. Version 2: OUTPUT.
New command which provides the means by which records will be written to the output file.

Version 1: RANGE (RNG). Version 2: RANGE (RNG).
As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction permitting the specification of logical command paths depending upon the outcome of the range test.

Version 1: RECODE (REC). Version 2: name changed.
Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

Version 1: STOP keyword. Version 2: STOP keyword.
Unchanged.

Version 1: none. Version 2: UNTIL DO END-DO.
One of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

Version 1: UPDATE (UPD). Version 2: UPDATE (UPD).
Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

Version 1: WRITE (WRT). Version 2: WRITE (WRT).
The WRITE language statement, while once utilized as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of DICTIONARY-DIVISION.)

Version 1: XRECODE (XREC). Version 2: name changed.
Old command which has become the GRECODE command in the December 1979 version.
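The way an array-maintaining command and a value-assigning command can work together as an imputation mechanism is sketched below in Python. This is an interpretation, not CONCOR code: the pattern shown is what is commonly called hot-deck imputation, and the function impute_age, the field names sex and age, and the valid range 0-99 are all hypothetical choices made for the illustration.

```python
def is_valid_age(age):
    """Hypothetical range test: an integer from 0 to 99."""
    return isinstance(age, int) and 0 <= age <= 99


def impute_age(records, deck):
    """Impute invalid ages from the most recent valid age seen for the same sex.

    records: list of dicts with keys 'sex' and 'age' (edited in place).
    deck:    dict {sex: last valid age}, pre-loaded with starting values.
    Returns the number of imputations made.
    """
    imputes = 0
    for rec in records:
        if is_valid_age(rec["age"]):
            deck[rec["sex"]] = rec["age"]   # plays the role of UPDATE: refresh the array
        else:
            rec["age"] = deck[rec["sex"]]   # plays the role of ALLOCATE: assign from the array
            imputes += 1
    return imputes


deck = {1: 30, 2: 28}
records = [
    {"sex": 1, "age": 41},
    {"sex": 1, "age": None},
    {"sex": 2, "age": 150},
]
imputes = impute_age(records, deck)
```

The count returned by the sketch corresponds loosely to the kind of figure an imputation count in the edit statistics would report.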

DECEMBER 1978 VERSION 1

Not implemented

DECEMBER 1979 VERSION 2

REPORT-DIVISION
DISPLAY-CONTROL-SECTION
DISPLAY-EDIT-STATISTICS
TOLERANCE-CONTROL-SECTION
ERROR-RATE-CHECK REJECT-FILE
END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
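Why the input must be presorted on the area control field can be illustrated with a short Python sketch. It assumes nothing about CONCOR internals: the functions area_runs and is_presorted_by_area and the sample records are hypothetical, and itertools.groupby is used because, like a control-break report, it groups only contiguous records.

```python
from itertools import groupby


def area_runs(records, key=lambda r: r[0]):
    """Return the area code of each contiguous run of records, in file order."""
    return [area for area, _ in groupby(records, key=key)]


def is_presorted_by_area(records):
    """True when no area code appears in more than one contiguous run."""
    runs = area_runs(records)
    return len(runs) == len(set(runs))


unsorted_file = [("A", 1), ("A", 2), ("B", 3), ("A", 4)]
sorted_file = sorted(unsorted_file, key=lambda r: r[0])
```

In the unsorted file, area A would be reported twice, once for each contiguous run, which is the effect that presorting external to the editing run avoids.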

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST

Future Versions of CONCOR: Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods, if applicable.

(2) Addition of optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.

(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18-digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.

(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation, or provide a means by which the user may specify a title to appear in the listings.

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features such as division by zero and out-of-bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is the question of what action is to be taken in the calculation of tolerance in terms of this new record's inclusion into the universe of observations and number of changes.

(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
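The lookup table approach suggested in item (4) can be contrasted with a chain of comparisons in a short Python sketch. It assumes, as the command comparison appendix describes for DRECODE, that input codes start at zero and continue in ascending sequence. The function names and the sample table are hypothetical, and nothing here reflects the code the EDITOR actually generates.

```python
def recode_by_comparison(value, new_codes):
    """Recode by testing each candidate input code in turn (one comparison per code)."""
    for old_code, new_code in enumerate(new_codes):
        if value == old_code:
            return new_code
    raise ValueError(f"no recode defined for {value}")


def recode_by_lookup(value, new_codes):
    """Recode by indexing directly into a table whose position is the input code."""
    if not 0 <= value < len(new_codes):
        raise ValueError(f"no recode defined for {value}")
    return new_codes[value]


# Hypothetical recode table: input codes 0, 1, 2, 3 map to 9, 1, 1, 2.
new_codes = [9, 1, 1, 2]
results = [recode_by_lookup(v, new_codes) for v in range(4)]
```

Both functions give the same answers; the lookup version does a single index operation per item, whereas the comparison version does work proportional to the number of codes.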

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.

1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.

2. Survey or other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of the sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has been neither finished nor tested.
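Items (2) and (5) describe a check-in against a master list and a weighting scheme. A minimal Python sketch of both ideas follows; the function names, the record layout, and the field name `weight` are assumptions made for illustration and are not part of CONCOR.

```python
def check_in(master_list, received_ids):
    """Compare the cases expected (master list) with the cases received."""
    expected = set(master_list)
    received = set(received_ids)
    return {
        "missing": sorted(expected - received),     # selected but not received
        "unexpected": sorted(received - expected),  # received but not selected
    }


def weighted_total(records, value_field, weight_field="weight"):
    """Sum a field after multiplying each value by its sampling weight."""
    return sum(r[value_field] * r[weight_field] for r in records)
```

For example, two sample households reporting 4 and 2 persons with weights of 10 and 25 would contribute 4 x 10 + 2 x 25 = 90 persons to the weighted total.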


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES


1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters)

OUT-RECORD-COUNT

OUTPUT-QUEST-COUNT

WRITE-RECORD-COUNT

Change in spelling of variable

Not mentioned--possible omission from manual

Not implemented

Note: Current documentation equivocates about when some variables are reset--most are initialized with each control area break.

Though implemented as reserved identifiers, their exact values and utility were unknown owing to the poor documentation of (old) CONCOR.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN-, OUT-).
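The distinction drawn in these notes, between counters that are reset at each control area break and cumulative run totals, can be sketched in Python as follows. This is an illustration under assumed data (a presorted list of area code and record pairs), not the EDITOR program's logic; the variable `in_record_count` merely mimics the behavior described for IN-RECORD-COUNT.

```python
def count_by_control_area(records):
    """records: iterable of (area_code, record) pairs, presorted by area_code."""
    per_area = []          # (area, count) captured at each control area break
    run_total = 0          # cumulative counter, kept separately
    current_area = None
    in_record_count = 0    # mimics a counter that is reset at each break
    for area, _record in records:
        if area != current_area:
            if current_area is not None:
                per_area.append((current_area, in_record_count))
            current_area = area
            in_record_count = 0   # reset on the control area break
        in_record_count += 1
        run_total += 1
    if current_area is not None:
        per_area.append((current_area, in_record_count))
    return per_area, run_total
```

A user who needs a total for the whole run must therefore accumulate it separately, as `run_total` does here.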

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[CONCOR-EDITOR Execution Statistics printout, division-by-zero test. The report shows the message DIVISION BY ZERO, LINE NUMBER 118 six times, followed by RUN ABORTED DUE TO DIVISION BY ZERO ERRORS. The superimposed program text (lines 111-125) is largely illegible in this copy; it sets R001 and R003 to zero, and line 118 is a LET statement that assigns R003 from R002 and R001.]

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason--ABEND without generating any messages. The most common types of cessation are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.

[CONCOR-EDITOR Execution Statistics printout, array coordinate test. The printed diagnostics are illegible in this copy. The superimposed program text (lines 173-187) contains two nested UNTIL ... VARY ... DO loops over R001 and R002, each varied from 1 by 1, enclosing an ALLOCATE that references TA1 (R001 R002), an UPDATE of TA2 (R001 R002) with R003, and several LIST statements; each loop is closed by END-DO.]

Comments: In this example R001, a 2-dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 x 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.

[CONCOR-EDITOR Execution Statistics printout, user termination test. The report ends with JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215, followed by EDITOR -- NORMAL END OF JOB. The superimposed program text (lines 214-218) tests IN-RECORD-COUNT in an IF statement and issues STOP-RUN within a THEN ... END-THEN clause at line 215.]

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.
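The behavior tested in the first example of this appendix, reporting the source line of each division by zero and aborting after repeated errors, can be sketched in Python. This is an illustration only: the statement layout is hypothetical, and the limit of six errors is an assumption taken from the number of messages visible on the printout, not a documented EDITOR value.

```python
MAX_ERRORS = 6  # assumed limit; the report does not state EDITOR's actual value


def run(statements, registers):
    """statements: list of (line_number, target, numerator, denominator)."""
    messages = []
    errors = 0
    for line_number, target, numerator, denominator in statements:
        try:
            registers[target] = registers[numerator] // registers[denominator]
        except ZeroDivisionError:
            # Report the source line instead of terminating silently.
            errors += 1
            messages.append(f"DIVISION BY ZERO, LINE NUMBER {line_number}")
            if errors >= MAX_ERRORS:
                messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                break
    return messages
```

The point of the design is that every exception is tied to a line number before the run is stopped, which is what the old package failed to do.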

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.


APPENDIX B

UNITED STATES GOVERNMENT

Memorandum

TO: DSPOPFPSD, Robert H. Haladay

DATE: December 3, 1979

FROM: DSPOPDEIO, Liliane Floge

SUBJECT: Criteria for Evaluation of BuCen COBOL CONCOR, September 30th Version, as presented at January 1980 Workshop

The following points should be considered when evaluating CONCOR, the COBOL editing system, during participation at the January 1980 workshop.

1. Are the Installation Manual, the System Manual, and the User's Manual ready for use? Are these manuals adequate for their purposes? Are the manuals clear and consistent? Can they be understood by subject-matter personnel as well as programmers?

2. Is the editing system capable of implementing all the features as described in the documentation?

3. Does the system appear to be [...]ely debugged and tested and ready for release to developing countries?

4. Does the system appear to be fast enough to edit a census in a reasonable amount of time?

5. What size core does the system require?

6. Comment in general on the completeness, usefulness, and efficiency of the system, and on its ease of use by developing-country programmers and other census personnel.

cc: Su[...]e Olds, APHA; [...]; Julio Ortuzar, Delta Systems

APPENDIX C

WORKSHOP ITINERARY


CONCOR Workshop Schedule, January 7-18, 1980

U.S. Bureau of the Census, International Statistical Programs Center

Scuderi Building, Room 209, 4235 28th Avenue, Marlow Heights, Maryland

Monday, January 7

9:30 am - 10:00  Welcoming Remarks; Overview of Workshop

10:00 - 10:30  Introduction to CONCOR
- Purpose and function
- History of development
- General computer requirements

10:30 - 10:45  Break

10:45 - 12:00  Editing Concepts
- Ways to interrogate data
- Ways to correct data
- Editing housing and population data
- POPSTAN
- Advantages of CONCOR

12:00 - 1:15 pm  Break

1:15 - 2:00  System Description
- Constraints in design of CONCOR
- Basic subsystems of CONCOR
- User interactions with system
- Examples of outputs produced

2:00 - 2:30  User Program Organization
- Divisions
- Sections
- Routines
- Commands

2:30 - 2:45  Break

2:45 - 3:25  Command Language Description
- Types of statements
- Format
- Syntax

Tuesday, January 8

9:30 am - 10:30  Dictionary Division Command Statements
- Punctuation
- Input data referencing
- Working data declarations
- Internal data representation and storage

10:30 - 10:45  Break

10:45 - 12:00  Dictionary-Attributes-Section
- Dictionary-Name
File-Section
- Input-File
- Output-File
- Write-File
- Error-File

12:00 - 1:15 pm  Break

1:15 pm - 2:15  Input-Record-Section
- Define-Record
Working-Data-Section
- New-Data
- Array-Data

2:15 - 2:30  Break

2:30 - 3:25  Dictionary Examples
- Minimum dictionary structure
- Maximum dictionary structure
- Hand out dictionary problem

Wednesday, January 9

9:30 am - 10:30  Discussion of Participants' Dictionary problems; Free work time

10:30 - 10:45  Break

10:45 - 12:00  Execution Division Command Statements
- Punctuation
- Subscripting
- Internal Identifiers
- Report-Control-Section
  - Generate-Edit-Statistics
  - Count-Imputes
  - Examples

12:00 - 1:15 pm  Break

1:15 pm - 2:15  Execution Division Command Statements (continued)
- Routines of Edit-Specifications-Section
  - Prolog-Routine
  - Filter-Routine
  - Epilog-Routine
- Types and functions of edit specification commands
- Range
- Assert

2:15 - 2:30  Break

2:30 - 3:25
- Pass/Fail clauses
- List

Thursday, January 10

9:30 am - 10:30  Discussion of Problems; Free work time

10:30 - 10:45  Break

10:45 - 12:00  Edit Specification Command Statements (continued)
- Allocate
- Update
- Let

12:00 - 1:15 pm  Break

1:15 pm - 2:15
- If
- Until/Exit
- Stop

2:15 - 2:30  Break

2:30 - 3:25
- Drecode
- Grecode

Friday, January 11

9:30 am - 10:30  Edit Specification Command Statements (continued)
- Output
- Write

10:30 - 10:45  Break

10:45 - 12:00  Report Division Command Statements
- Display-Control-Section
  - Display-Edit-Statistics
- Tolerance-Control-Section
  - Error-Rate-Check
  - Reject-File
- Report Examples

12:00 - 1:15 pm  Break

1:15 pm - 3:25  Hand out Edit Specifications as problem; Free work time

Monday, January 14

9:30 am - 10:30  Discuss procedures for running problems on computer

10:30 - 10:45  Break

10:45 - 12:00  Component Programs of the CONCOR system

12:00 - 1:15 pm  Break

Tuesday, January 15

9:30 am - 3:25 pm  Free work time

Wednesday, January 16

9:30 am - 12:00  Free work time

12:00 - 1:15 pm  Break

1:15 pm - 2:15  How to Install CONCOR on IBM 360/370 OS

2:15 - 2:30  Break

2:30 - 3:25  Free work time

Thursday, January 17

9:30 am - 3:25  Free work time

1:15 pm - 2:15  Future CONCOR Design Considerations
- for census processing
- for survey processing
- manual correction system

2:15 - 2:30  Break

2:30 - 2:45  Evaluation Guidelines
- Hand out evaluation forms

2:45 - 3:25  Free work time

Friday, January 18

9:30 am - 10:30  Free work time

10:30 - 10:45  Break

10:45 - 12:00  Group Discussion on CONCOR
- Submission of written evaluations
- Distribution of IBM installation tapes
- Presentation of Certificates to Participants

12:00 - 1:15 pm  Break

1:15 - 3:25  Free work time

APPENDIX D

PARTICIPANTS


CONCOR WORKSHOP January 7 - 18 1980 Marlow Heights MD

PARTICIPANTS

KENYA

James Midianga Government Computer Centre Central Bureau of Statistics PO Box 30266 Nairobi Kenya

PANAMA

Omar Rivera Rios Data Processing Specialist Contraloria General de la Republica de Panama Apartado Postal 5213 Panama 5 Panama

PHILIPPINES

Guida T Capellan OIC Computer Programming Division National Census and Statistics Office Solicarel Bldg 1 Magsaysay Blvd Sta Mesa Manila Philippines

THAILAND

Angsumal Sunalai National Statistical Office Bangkok 1 Thailand

EGYPT

Farag Sedky Mourad Ghaleb CAPMAS (Central Agency for Public Mobilization and Statistics) NASR City Cairo Egypt

SAUDI ARABIA

Shargi R Al-Shargi National Computer Center PO Box 2534 Riyadh Saudi Arabia

Ahmed S Al-Ghamdi Dept Director of National Computer Center PO Box 2534 Riyadh Saudi Arabia

OTHER

Robert W ONeal USREPJECOR APO New York NY 09038

John N Adams Data Processing Advisor DACCA Department of State Washington DC 20520

or c/o UNDP PO Box 224 Dacca Bangladesh

Joe Quasney US Census Bureau Washington DC 20233

Howard Brunsman 5715 N Ninth St Arlington VA 22205

Larry Shiller and Julio Ortuzar Delta Systems Consultants Inc 264 Alhambra Circle Coral Gables FL 33134

Nicholas Ourusoff Burpee Hill Road New London NH 03257 (United Nations)

Frederic J Grant IV Systems Development Georgia World Congress Institute 580 North Omni International Atlanta Georgia 30303

Rafael Samper Mary Jo Keenan Catherine B Gleason LACDR Room 2246 NS Washington DC 20523

John Marshall AIDSERDMPSE Rm 706C SA-12 Washington DC 20523

Susanne Bacon and Mary K Friday ISPCData Processing US Census Bureau Washington DC 20233

Leo Dougherty ISPCWorld Census Staff US Bureau of the Census Washington DC 20233

CONCOR WORKSHOP

January 7-18 1980 Washington DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM


CONCOR WORKSHOP January 16 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1. What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries?

2. If you are familiar with previous versions of CONCOR, how does this current version compare to them?

3. How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data with which you are familiar?

4. How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software?

5. How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys?

6. If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?

7. In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8. Other comments

APPENDIX F

NEWOLD COMMAND COMPARISONS


OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA


NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin anywhere in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR

END-DIVISION signifies the termination of the CONCOR division organization and must always be present even though division identifiers have not been implemented

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

ALLOCATE (ALL) ALLOCATE (ALLOC) Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

ASSERT (AST) ASSERT (ASRT) The ASSERT command is the basic consistency editing command of CONCOR. The command now provides a NOERROR and a MESSAGE option. Additionally, the DUMPR and DUMPO keywords give the user the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

(see RECODE) DRECODE (DRCD) Provides a method for recoding data items having a starting value of zero and continuing in ascending sequence. Replaces the previous RECODE command.

EXIT Provides the means to terminate execution (escape) of command statements within the UNTIL command.

END END-IF END-THEN END-ELSE A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

ERROR not implemented No longer supported in the language.


FILTER TYPE ( ) WITH not implemented No longer are option statements supported in language This command once specified essentially a GOTO depending on construction

(see XRECODE) GRECODE (GRCD) This new command provides a method for the group recoding of input values

IF IF The IF statement has properties found virtually in all programming languages ie provides a means of evaluating a condition and controlling the execution of alternative logical statements Similar to pseudocode a CONCOR programmer must utilize the structured command IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version

LET LET Virtually unchanged

LIST New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire as well as specific individual identifiers

OUTPUT New command which provides the means by which records will be written to the output-file

RANGE (RNG) RANGE (RNG) As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through the DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

RECODE (REC) name changed Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

STOP --keyword STOP --keyword Unchanged.

UNTIL DO END-DO One of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to that of most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

UPDATE (UPD) UPDATE (UPD) Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.


WRITE (WRT) WRITE (WRT) The WRITE language statement while once utilized as the primary output command is now used to create an auxiliary or derivative file Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION (See WRITE-FILE of DICTIONARY-DIVISION)

XRECODE (XREC) name changed Old command which has become the GRECODE command in the December 1979 version
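The table describes DRECODE as recoding items whose codes start at zero and ascend, and GRECODE as group recoding of input values. The Python sketch below illustrates only that general distinction (direct table indexing versus range grouping); it does not reproduce CONCOR syntax or semantics, and every code value, table, and function name in it is hypothetical.

```python
import bisect

# Direct recode: input codes start at zero and ascend, so the code itself
# can index the table. The values here are hypothetical.
DIRECT_TABLE = [9, 1, 1, 2, 3]          # input code 0..4 -> output code


def direct_recode(code):
    if not 0 <= code < len(DIRECT_TABLE):
        raise ValueError(f"code {code} is outside the recode table")
    return DIRECT_TABLE[code]


# Group recode: each group covers a range of input values.
GROUP_UPPER_BOUNDS = [4, 14, 64]        # groups 0-4, 5-14, 15-64, 65 and over
GROUP_CODES = [1, 2, 3, 4]


def group_recode(value):
    # bisect_left finds the first group whose upper bound is >= value.
    return GROUP_CODES[bisect.bisect_left(GROUP_UPPER_BOUNDS, value)]
```

With these hypothetical tables, an input code of 3 is recoded directly to 2, and an age of 15 falls in the third group.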


DECEMBER 1978 VERSION 1

DECEMBER 1979 VERSION 2

Not implemented REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
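The presorting requirement noted above can be illustrated with Python's `itertools.groupby`, which, like the control area processing described here, only groups records that are adjacent. This is an analogy, not CONCOR's implementation; the function name and area codes are hypothetical.

```python
from itertools import groupby


def area_breaks(area_codes):
    """Return one (area, record_count) pair per contiguous run of an area code."""
    return [(area, len(list(run))) for area, run in groupby(area_codes)]
```

On unsorted input such as `["01", "02", "01"]`, area 01 is reported twice, once per contiguous run; sorting the input first yields a single entry per area.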

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST


Appendix G

Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods, if applicable.

(2) Addition of optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.


(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18-digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.


(12) Increase the number of comparison strings permitted for alphanumeric data items This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters The user would still be required to code two commas to receive the system default value for any parameter

(15) Provide a repetition factor in the initialization of arrays

(16) Print the data dictionary name in the titles of the dictionary documentation or provide a means by which the user may specify a title to appear in the listings
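Item (11) proposes a signed numeric format that tolerates leading blanks and negative values, as written by FORTRAN programs. A minimal Python sketch of reading such a fixed-width field follows; the function name and the choice to return `None` for an all-blank field are assumptions made for illustration, not CONCOR behavior.

```python
def parse_signed_field(field):
    """Convert a fixed-width field such as '  -42' to an integer.

    Returns None for an all-blank field so the caller can treat it as missing.
    """
    text = field.strip()
    if not text:
        return None
    sign = -1 if text[0] == "-" else 1
    # Drop a single leading sign; anything left must be plain digits.
    digits = text[1:] if text[0] in "+-" else text
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a signed numeric field: {field!r}")
    return sign * int(digits)
```

A field that holds anything other than an optional sign followed by digits is rejected rather than silently converted, which mirrors the kind of validity check an edit system would want at input.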

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features, such as division by zero and out-of-bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is the question of what action is to be taken in the calculation of tolerance, in terms of this new record's inclusion in the universe of observations and the number of changes.


(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command lies in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.

113 Report Division

(I) Provide a means by which the user may specify their own headings or titles -for the reports

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.


1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program, to be used on IBM 360/370 series computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification of the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean that the editing process itself should be made interactive; only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.


2 Survey or Other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of the sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has been neither finished nor tested.
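The check-in procedure proposed in item (2) amounts to comparing the questionnaires found on the input file with a master list of selected cases. A minimal Python sketch of that comparison follows; the function name `check_in`, the identifiers, and the shape of the result are hypothetical, since the report does not specify how such a file would be processed.

```python
def check_in(master_ids, received_ids):
    """Compare the questionnaires received against the master sample list."""
    master = set(master_ids)
    received = set(received_ids)
    return {
        # Selected for the sample but absent from the input file.
        "missing": sorted(master - received),
        # Present on the input file but never selected.
        "unexpected": sorted(received - master),
    }


# Hypothetical questionnaire identifiers.
result = check_in(["A001", "A002", "A003"], ["A001", "A003", "B777"])
```

Here `result["missing"]` lists the case still to be checked in and `result["unexpected"]` flags a questionnaire that does not belong to the sample.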


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES


1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT OUTPUT-QUEST-COUNT WRITE-RECORD-COUNT

Change in spelling of variable

Not mentioned--possible omission from manual

Not implemented

Note: Current documentation equivocates on when some variables are reset--most are initialized with each control area break.

Though implemented as reserved identifiers, due to poor documentation of (old) CONCOR, exact values and utility were unknown.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN-, OUT-).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Execution Statistics printout; the column layout is illegible in this copy. Legible messages: "DIVISION BY ZERO LINE NUMBER 118" (repeated), followed by "RUN ABORTED DUE TO DIVISION BY ZERO ERRORS". The superimposed program text (source lines 111-125) sets R001 and R003 to zero shortly before the LET statement at line 118, which assigns R003 from R002 and R001.]

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason--ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.
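The behaviour described in these comments, reporting the source line of each data exception and aborting once too many have occurred, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not CONCOR's implementation: the `run` function, the `error_limit` parameter and its value, and the use of integer division are all hypothetical; only the message wording and line number 118 come from the printout.

```python
def run(statements, error_limit=6):
    """Execute (line_number, action) pairs, reporting data exceptions by line."""
    messages = []
    errors = 0
    for line_number, action in statements:
        try:
            action()
        except ZeroDivisionError:
            errors += 1
            messages.append(f"DIVISION BY ZERO  LINE NUMBER {line_number}")
            if errors >= error_limit:
                # Assumed policy: stop the run once the limit is reached.
                messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                break
    return messages


# Hypothetical registers mirroring the test program: R001 was set to zero.
reg = {"R001": 0, "R002": 5, "R003": 0}


def divide():
    reg["R003"] = reg["R002"] // reg["R001"]


# The same source line is executed once per record processed.
log = run([(118, divide)] * 3, error_limit=2)
```

Because every message carries the line number of the offending statement, the user can locate the fault in the source program directly instead of facing an unexplained ABEND.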

[Second Execution Statistics printout; illegible in this copy apart from the superimposed program text. Source lines 175-187 contain two nested UNTIL ... VARY ... FROM 1 BY 1 ... DO ... END-DO loops on R001 and R002, an ALLOCATE from TA1 (R001 R002), an UPDATE of TA2 (R001 R002), and several LIST commands.]

Comments: In this example TA1, a two-dimensional array with 10 rows and 10 columns, is used in an attempt to update the smaller (5 X 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.

[Third Execution Statistics printout; illegible in this copy apart from the messages "JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215" and "EDITOR -- NORMAL END OF JOB". Superimposed source lines 214-218 test IN-RECORD-COUNT in IF ... THEN ... END-THEN constructions; line 215 issues STOP-RUN.]

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator (.) in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR (.) IN STRING

EXPLANATION: The comment indicator character (.) was found embedded in a string.
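The first-character test behind message DD-002 can be sketched as a simple set membership check. The Python below is a hedged illustration only: the delimiter, command terminator, and literal string delimiter characters are not legible in this copy of the guide, so the function takes the punctuation as a parameter, and `ASSUMED_PUNCTUATION` contains only the two characters (comma and equals sign) that can be read. The function name is hypothetical.

```python
import string


def first_char_is_legal(token, punctuation):
    """Return True if the token's first character is in the legal set."""
    legal = (
        set(string.ascii_uppercase)  # A-Z
        | set(string.digits)         # 0-9
        | set("-+")
        | set(punctuation)           # caller-supplied special characters
    )
    return bool(token) and token[0] in legal


# Stand-in value: the remaining special characters are unknown here.
ASSUMED_PUNCTUATION = ",="
```

A parser built this way would raise the DD-002 condition whenever `first_char_is_legal` returns False and then resume with the next string, as the ACTION TAKEN entry describes.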


APPENDIX C

WORKSHOP ITINERARY

CONCOR Workshop Schedule, January 7-18, 1980

U.S. Bureau of the Census, International Statistical Programs Center

Scuderi Building, Room 209, 4235 28th Avenue, Marlow Heights, Maryland

Monday, January 7

9:30 am - 10:00 Welcoming Remarks; Overview of Workshop

10:00 - 10:30 Introduction to CONCOR
- Purpose and function
- History of development
- General computer requirements

10:30 - 10:45 Break

10:45 - 12:00 Editing Concepts
- Ways to interrogate data
- Ways to correct data
- Editing housing and population data
- POPSTAN
- Advantages of CONCOR

12:00 - 1:15 pm Break

1:15 - 2:00 System Description
- Constraints in design of CONCOR
- Basic subsystems of CONCOR
- User interactions with system
- Examples of outputs produced

2:00 - 2:30 User Program Organization
- Divisions
- Sections
- Routines
- Commands

2:30 - 2:45 Break

2:45 - 3:25 Command Language Description
- Types of statements
- Format
- Syntax

Tuesday, January 8

9:30 am - 10:30 Dictionary Division Command Statements
- Punctuation
- Input data referencing
- Working data declarations
- Internal data representation and storage

10:30 - 10:45 Break

10:45 - 12:00 Dictionary-Attributes-Section
- Dictionary-Name
File-Section
- Input-File
- Output-File
- Write-File
- Error-File

12:00 - 1:15 pm Break

1:15 pm - 2:15 Input-Record-Section
- Define-Record
Working-Data-Section
- New-Data
- Array-Data

2:15 - 2:30 Break

2:30 - 3:25 Dictionary Examples
- Minimum dictionary structure
- Maximum dictionary structure
- Hand out dictionary problem

Wednesday, January 9

9:30 am - 10:30 Discussion of Participants' Dictionary problems; Free work time

10:30 - 10:45 Break

10:45 - 12:00 Execution Division Command Statements
- Punctuation
- Subscripting
- Internal Identifiers
- Report-Control-Section
  - Generate-Edit-Statistics
  - Count-Imputes
  - Examples

12:00 - 1:15 pm Break

1:15 pm - 2:15 Execution Division Command Statements (continued)
- Routines of Edit-Specifications-Section
  - Prolog-Routine
  - Filter-Routine
  - Epilog-Routine
- Types and functions of edit specification commands
- Range
- Assert

2:15 - 2:30 Break

2:30 - 3:25
- Pass/Fail clauses
- List

Thursday, January 10

9:30 am - 10:30 Discussion of Problems; Free work time

10:30 - 10:45 Break

10:45 - 12:00 Edit Specification Command Statements (continued)
- Allocate
- Update
- Let

12:00 - 1:15 pm Break

1:15 pm - 2:15
- If
- Until/Exit
- Stop

2:15 - 2:30 Break

2:30 - 3:25
- Drecode
- Grecode

Friday, January 11

9:30 am - 10:30 Edit Specification Command Statements (continued)
- Output
- Write

10:30 - 10:45 Break

10:45 - 12:00 Report Division Command Statements
- Display-Control-Section
  - Display-Edit-Statistics
- Tolerance-Control-Section
  - Error-Rate-Check
  - Reject-File
  - Report Examples

12:00 - 1:15 pm Break

1:15 pm - 3:25 Hand out Edit Specifications as problem; Free work time

Monday, January 14

9:30 am - 10:30 Discuss procedures for running problems on computer

10:30 - 10:45 Break

10:45 - 12:00 Component Programs of the CONCOR system

12:00 - 1:15 pm Break

Tuesday, January 15

9:30 am - 3:25 pm Free work time

Wednesday, January 16

9:30 am - 12:00 Free work time

12:00 - 1:15 pm Break

1:15 pm - 2:15 How to Install CONCOR on IBM 360/370 OS

2:15 - 2:30 Break

2:30 - 3:25 Free work time

Thursday, January 17

9:30 am - 3:25 Free work time

1:15 pm - 2:15 Future CONCOR Design Considerations
- for census processing
- for survey processing
- manual correction system

2:15 - 2:30 Break

2:30 - 2:45 Evaluation Guidelines
- Hand out evaluation forms

2:45 - 3:25 Free work time

Friday, January 18

9:30 am - 10:30 Free work time

10:30 - 10:45 Break

10:45 - 12:00 Group Discussion on CONCOR
- Submission of written evaluations
- Distribution of IBM installation tapes
- Presentation of Certificates to Participants

12:00 - 1:15 pm Break

1:15 - 3:25 Free work time

APPENDIX D

PARTICIPANTS

CONCOR WORKSHOP, January 7-18, 1980, Marlow Heights, MD

KENYA

James Midianga, Government Computer Centre, Central Bureau of Statistics, P.O. Box 30266, Nairobi, Kenya

PANAMA

Omar Rivera Rios, Data Processing Specialist, Contraloria General de la Republica de Panama, Apartado Postal 5213, Panama 5, Panama

PHILIPPINES

Guida T. Capellan, OIC, Computer Programming Division, National Census and Statistics Office, Solicarel Bldg. 1, Magsaysay Blvd., Sta. Mesa, Manila, Philippines

THAILAND

Angsumal Sunalai, National Statistical Office, Bangkok 1, Thailand

EGYPT

Farag Sedky Mourad Ghaleb, CAPMAS (Central Agency for Public Mobilization and Statistics), NASR City, Cairo, Egypt

SAUDI ARABIA

Shargi R. Al-Shargi, National Computer Center, P.O. Box 2534, Riyadh, Saudi Arabia

Ahmed S. Al-Ghamdi, Dept. Director of National Computer Center, P.O. Box 2534, Riyadh, Saudi Arabia

OTHER

Robert W. O'Neal, USREP/JECOR, APO New York, NY 09038

John N. Adams, Data Processing Advisor, DACCA, Department of State, Washington, DC 20520, or c/o UNDP, P.O. Box 224, Dacca, Bangladesh

Joe Quasney, U.S. Census Bureau, Washington, DC 20233

Howard Brunsman, 5715 N. Ninth St., Arlington, VA 22205

Larry Shiller and Julio Ortuzar, Delta Systems Consultants, Inc., 264 Alhambra Circle, Coral Gables, FL 33134

Nicholas Ourusoff, Burpee Hill Road, New London, NH 03257 (United Nations)

Frederic J. Grant IV, Systems Development, Georgia World Congress Institute, 580 North Omni International, Atlanta, Georgia 30303

Rafael Samper, Mary Jo Keenan, Catherine B. Gleason, LACDR, Room 2246 NS, Washington, DC 20523

John Marshall, AIDSERDMPSE, Rm. 706C, SA-12, Washington, DC 20523

Susanne Bacon and Mary K. Friday, ISPC/Data Processing, U.S. Census Bureau, Washington, DC 20233

Leo Dougherty, ISPC/World Census Staff, U.S. Bureau of the Census, Washington, DC 20233

CONCOR WORKSHOP

January 7-18, 1980, Washington, DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM

CONCOR WORKSHOP, January 16, 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1. What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries?

2. If you are familiar with previous versions of CONCOR, how does this current version compare to them?

3. How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data with which you are familiar?

4. How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software?

5. How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys?

6. If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?

7. In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8. Other comments:

APPENDIX F

NEW/OLD COMMAND COMPARISONS

OLD CONCOR VERSION 1, DECEMBER 1978

DATA-DICTIONARY
  DEFINE INPUT-FILE
  DEFINE ERROR-FILE
  DEFINE COMMON-DATA (contains the questionnaire-ID and record type location information)
  DEFINE RECORD-TYPE=
  DEFINE NEW-DATA
  DEFINE ARRAY-DATA

NEW CONCOR VERSION 2, DECEMBER 1979

DICTIONARY-DIVISION
  DICTIONARY-ATTRIBUTES-SECTION
    DICTIONARY-NAME
  FILE-SECTION
    INPUT-FILE, OUTPUT-FILE, WRITE-FILE, ERROR-FILE
  IDENTIFICATION-CONTROL-SECTION
    AREA-CONTROL
    QUESTIONNAIRE-CONTROL
    RECORD-CONTROL
  INPUT-RECORD-SECTION
    DEFINE-RECORD
  WORKING-DATA-SECTION
    NEW-DATA, ARRAY-DATA
END-DIVISION

COMMENTS

In Version 1, all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1, DECEMBER 1978

EDIT-SPECIFICATION
  PROLOG
  FILTER TYPE ( )
  EPILOG

Commands: ALLOCATE (ALL), ASSERT (AST), END, ERROR, FILTER TYPE ( ) WITH, IF, LET, RANGE (RNG), RECODE (REC), STOP --keyword, UPDATE (UPD), WRITE (WRT), XRECODE (XREC)

NEW CONCOR VERSION 2, DECEMBER 1979

EXECUTION-DIVISION
  REPORT-CONTROL-SECTION
    COUNT-IMPUTES
    GENERATE-EDIT-STATISTICS
  EDIT-SPECIFICATION-SECTION
    PROLOG-ROUTINE
    FILTER-ROUTINE (name-of-record-type)
    EPILOG-ROUTINE
END-DIVISION

Commands: ALLOCATE (ALLOC), ASSERT (ASRT), DRECODE (DRCD), END-DIVISION, EXIT, GRECODE (GRCD), IF, LET, LIST, OUTPUT, RANGE (RNG), STOP, UNTIL, UPDATE (UPD), WRITE (WRT)

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR.

END-DIVISION signifies the termination of the CONCOR division organization and must always be present, even though division identifiers have not been implemented.

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

Version 1 (December 1978): ALLOCATE (ALL)
Version 2 (December 1979): ALLOCATE (ALLOC)
Comments: Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

Version 1 (December 1978): ASSERT (AST)
Version 2 (December 1979): ASSERT (ASRT)
Comments: The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and MESSAGE option. Additionally, utilizing the DUMPR and DUMPO keywords provides the user with the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS, END-PASS, FAIL, END-FAIL constructions for the purpose of maintaining structured logic.

Version 1 (December 1978): (see RECODE)
Version 2 (December 1979): DRECODE (DRCD)
Comments: Provides a method for recoding data items having a starting value of zero and continuing in ascending sequence. Replaces the previous XRECODE command.

Version 1 (December 1978): (none)
Version 2 (December 1979): EXIT
Comments: Provides the means to terminate execution (ESCAPE) of command statements within the UNTIL command.

Version 1 (December 1978): END
Version 2 (December 1979): END-IF, END-THEN, END-ELSE
Comments: A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

Version 1 (December 1978): ERROR
Version 2 (December 1979): not implemented
Comments: No longer supported in the language.

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

Version 1 (December 1978): FILTER TYPE ( ) WITH
Version 2 (December 1979): not implemented
Comments: Option statements are no longer supported in the language. This command once specified essentially a GOTO, depending on construction.

Version 1 (December 1978): (see XRECODE)
Version 2 (December 1979): GRECODE (GRCD)
Comments: This new command provides a method for the group recoding of input values.

Version 1 (December 1978): IF
Version 2 (December 1979): IF
Comments: The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. Similar to pseudocode, a CONCOR programmer must utilize the structured IF, THEN, END-THEN, ELSE, END-ELSE construction in place of the IF, THEN, END, ELSE, END statements acceptable to the 1978 version.

Version 1 (December 1978): LET
Version 2 (December 1979): LET
Comments: Virtually unchanged.

Version 1 (December 1978): (none)
Version 2 (December 1979): LIST
Comments: New command which allows for the generation of specific messages, as well as specific individual identifiers, to be displayed in the report of edit statistics by questionnaire.

Version 1 (December 1978): (none)
Version 2 (December 1979): OUTPUT
Comments: New command which provides the means by which records will be written to the output file.

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

Version 1 (December 1978): RANGE (RNG)
Version 2 (December 1979): RANGE (RNG)
Comments: RANGE is a basic editing command of CONCOR. The December 1979 version permits the listing of the out-of-range values, as well as the entire record or questionnaire, through the DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS, END-PASS, FAIL, END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

Version 1 (December 1978): RECODE (REC)
Version 2 (December 1979): name changed
Comments: Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

Version 1 (December 1978): STOP --keyword
Version 2 (December 1979): STOP --keyword
Comments: Unchanged.

Version 1 (December 1978): (none)
Version 2 (December 1979): UNTIL, DO, END-DO
Comments: DO-END-DO is one of the most significant and powerful enhancements to the CONCOR language. This command provides a looping capability similar to that of most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

Version 1 (December 1978): UPDATE (UPD)
Version 2 (December 1979): UPDATE (UPD)
Comments: Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

Version 1 (December 1978): WRITE (WRT)
Version 2 (December 1979): WRITE (WRT)
Comments: The WRITE language statement, while once utilized as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of DICTIONARY-DIVISION.)

Version 1 (December 1978): XRECODE (XREC)
Version 2 (December 1979): name changed
Comments: Old command which has become the GRECODE command in the December 1979 version.
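The comments on ALLOCATE and UPDATE describe them as jointly providing the imputation mechanism, with UPDATE maintaining the arrays involved. One common technique that fits this description is hot-deck imputation; whether CONCOR users applied exactly this scheme is an assumption. The Python sketch below is illustrative only: the array layout, the field names (`sex`, `age_group`, `marital_status`), the valid codes, and the `imputed` marker are all hypothetical.

```python
# Hypothetical sketch; CONCOR expresses this with ALLOCATE and UPDATE commands.
# hot_deck[sex][age_group] holds the last valid value seen for that cell.
hot_deck = [[1, 1, 1], [1, 1, 1]]  # initial (cold) values

VALID_CODES = (1, 2, 3, 4)


def edit_record(record):
    """Impute an invalid value from the deck, or refresh the deck."""
    sex, age_group = record["sex"], record["age_group"]
    if record["marital_status"] in VALID_CODES:
        # Counterpart of UPDATE: store the valid value for later use.
        hot_deck[sex][age_group] = record["marital_status"]
    else:
        # Counterpart of ALLOCATE: assign the stored value to the field.
        record["marital_status"] = hot_deck[sex][age_group]
        record["imputed"] = True
    return record


records = [
    {"sex": 0, "age_group": 2, "marital_status": 3},
    {"sex": 0, "age_group": 2, "marital_status": 9},  # invalid code
]
edited = [edit_record(r) for r in records]
```

The second record receives the value 3 donated by the first, which is why the order of records and the upkeep of the array both matter to the result.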


Version 1 (December 1978): Not implemented

Version 2 (December 1979):

REPORT-DIVISION
  DISPLAY-CONTROL-SECTION
    DISPLAY-EDIT-STATISTICS
  TOLERANCE-CONTROL-SECTION
    ERROR-RATE-CHECK
    REJECT-FILE
END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
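The presorting requirement noted above follows from control-break processing, in which a new report group starts whenever the control field changes. A minimal Python sketch of that behaviour is given below; it illustrates the general technique and is not CONCOR's code, and the `area` key and sample records are hypothetical.

```python
from itertools import groupby


def area_totals(records):
    """Count records per control area, breaking whenever the area changes."""
    return [
        (area, sum(1 for _ in group))
        for area, group in groupby(records, key=lambda r: r["area"])
    ]


unsorted_file = [{"area": "01"}, {"area": "02"}, {"area": "01"}]
presorted_file = sorted(unsorted_file, key=lambda r: r["area"])

split = area_totals(unsorted_file)    # area "01" is reported twice
merged = area_totals(presorted_file)  # each area is reported once
```

With the unsorted file, area "01" produces two separate groups because its records are not contiguous; sorting on the control field beforehand yields one group per area.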

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST

Future Versions of CONCOR: Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications needed to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1 Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics, such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.

(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is the determination of which value is to be moved to the output record when the record is written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Permitting only one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement in the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18-digit numbers. Signed numeric and packed decimal data formats could also be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.

(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. Many users now include such values in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma characters as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation, or provide a means by which the user may specify a title to appear in the listings.
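Item (13) proposes that valid ranges declared with each user-identifier drive an automatic range check before the user's own edits run. A minimal Python sketch of that idea follows. The dictionary layout, the identifier names `AGE` and `SEX`, their ranges, and the function name are hypothetical; the report does not define how ranges would be declared in DEFINE-RECORD.

```python
# Hypothetical dictionary entries: identifier -> (low, high) valid range.
DICTIONARY = {"AGE": (0, 98), "SEX": (1, 2)}


def automatic_range_check(record):
    """Return the identifiers whose values fall outside their declared range."""
    failures = []
    for name, (low, high) in DICTIONARY.items():
        value = record.get(name)
        if value is None or not (low <= value <= high):
            failures.append(name)
    return failures


failures = automatic_range_check({"AGE": 130, "SEX": 1})
```

Keeping the ranges in the dictionary rather than in comment statements is also what would make them available to tabulation software, as the item notes.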

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features, such as division by zero and out-of-bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE commnand This command would allow the user to create a record that Was not found on the input file This implementation would require special care as it gives the user the ability to change the valme of a CONCOR internal identifier (TYPE-CGUNT) and modify the contents of the store Another consequence of this command is what action is to be taken in the calculation of tolerance in terms of this new records inclusion into the universe of observations and number of changes

iK

48 G-4

(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implement a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run-total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
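The lookup table approach suggested in item (4) can be illustrated with a short sketch. It is written in Python rather than in the COBOL that the EDITOR program actually generates, and the function names and the recode pairs are hypothetical. It only contrasts a pair-by-pair search with a direct-index table for codes that start at zero and ascend, as DRECODE values do.

```python
def recode_linear(value, pairs):
    """Linear search: compare the value against each (old, new) pair in turn."""
    for old, new in pairs:
        if old == value:
            return new
    return None  # no matching pair


def build_lookup_table(pairs, size):
    """Build a direct-index table once, before any record is processed."""
    table = [None] * size
    for old, new in pairs:
        table[old] = new
    return table


def recode_lookup(value, table):
    """Table lookup: a single index operation, whatever the table size."""
    if 0 <= value < len(table):
        return table[value]
    return None  # value falls outside the table


# Hypothetical recode of a one-digit code (0-4) into broader groups.
PAIRS = [(0, 9), (1, 1), (2, 1), (3, 2), (4, 2)]
TABLE = build_lookup_table(PAIRS, size=5)
```

Both functions return the same result for every code; the table version trades a one-time construction step for constant-time recoding of each record.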

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.

1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program, to be used on IBM 360/370 series computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification of the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not imply that the editing process itself should be made interactive; rather, only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual

(2) The development of a CONCOR Users Guide


2 Survey or other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of the sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has been neither finished nor tested.
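The weighting scheme proposed in item (5) can be sketched as follows. This is an illustration in Python, not part of CONCOR; the field names, the sample records, and the assumption that each record carries a single expansion weight are all hypothetical.

```python
def weighted_total(records, field, weight_field="weight"):
    """Sum a field over all records, multiplying each value by its sampling weight."""
    return sum(r[field] * r[weight_field] for r in records)


def weighted_mean(records, field, weight_field="weight"):
    """Weighted mean: the weighted total divided by the sum of the weights."""
    total_weight = sum(r[weight_field] for r in records)
    if total_weight == 0:
        raise ValueError("sum of weights is zero")
    return weighted_total(records, field, weight_field) / total_weight


# Hypothetical sample households, each standing for `weight` households
# in the sampling frame.
SAMPLE = [
    {"persons": 4, "weight": 10.0},
    {"persons": 2, "weight": 25.0},
    {"persons": 6, "weight": 5.0},
]
```

Without the weights, the three sample records would yield a total of 12 persons; with them, the estimate refers to the 40 households the sample represents.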


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES


1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT OUTPUT-QUEST-COUNT WRITE-RECORD-COUNT

Change in spelling of variable

Not mentioned--possible omission from manual

Not implemented

Note: Current documentation equivocates on when some variables are reset; most are initialized with each control area break.

Though these were implemented as reserved identifiers, the poor documentation of (old) CONCOR left their exact values and utility unknown.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN-, OUT-).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Reproduction of a CONCOR-EDITOR EXECUTION STATISTICS page, largely illegible in this copy. The legible portion shows the message DIVISION BY ZERO LINE NUMBER 118 printed several times, followed by RUN ABORTED DUE TO DIVISION BY ZERO ERRORS.]

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason--ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.

[Superimposed program text, lines 112-125, is only partly legible. Lines 115-116 set registers R001 and R003 to zero, and line 118 is a LET statement assigning to R003 an expression in R002 and R001, which appears to be the division that raised the error.]
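The behaviour praised in the comments, trapping a data exception and reporting the source line that caused it instead of abending silently, can be sketched in a few lines. This is a Python illustration only; the statement format, the register dictionary, and the function name are hypothetical, and it says nothing about how the generated COBOL EDITOR program actually implements the check.

```python
def run(statements, registers):
    """Execute divide statements given as (line_number, target, numerator, denominator).

    A zero denominator is not allowed to crash the run; the statement is
    skipped and a diagnostic naming the offending line is recorded.
    """
    diagnostics = []
    for line_no, target, num, den in statements:
        if registers[den] == 0:
            diagnostics.append(f"DIVISION BY ZERO LINE NUMBER {line_no}")
            continue  # leave the target register unchanged
        registers[target] = registers[num] // registers[den]
    return diagnostics


# Hypothetical registers mirroring the test program: R001 has been set to zero.
REGISTERS = {"R001": 0, "R002": 8, "R003": 0}
MESSAGES = run([(118, "R003", "R002", "R001")], REGISTERS)
```

Here MESSAGES holds one diagnostic for line 118 and R003 keeps its previous value, which is the kind of traceable outcome the test program was written to confirm.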

[Reproduction of a second CONCOR-EDITOR EXECUTION STATISTICS page, illegible in this copy apart from fragments of repeated diagnostic lines.]

Comments: In this example R001, a 2-dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 x 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.

[Superimposed program text as far as it can be read; punctuation and literal strings are lost in this copy.]

175 UNTIL R001 > 10 VARY R001 FROM 1 BY 1

177 DO

178 UNTIL R002 > 10 VARY R002 FROM 1 BY 1

180 ALLOCATE N002 = TA1 (R001 R002)

181 UPDATE TA2 (R001 R002) R003

182 LIST

183 LIST

184 LIST

185 LIST TA2 (R001 R002)

186 END-DO

187 END-DO

[Reproduction of a third CONCOR-EDITOR EXECUTION STATISTICS page, largely illegible in this copy. The legible portion shows the messages JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215 and EDITOR -- NORMAL END OF JOB.]

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

[Superimposed program text, lines 214-218, is only partly legible. Line 214 begins IF IN-RECORD-COUNT >, line 215 reads THEN STOP-RUN END-THEN, and a second IF on IN-RECORD-COUNT with a THEN LIST clause follows.]

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

APPENDIX J DD-2

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.


APPENDIX C

CONCOR Workshop Schedule January 7-18 1980

U S Bureau of the Census International Statistical Programs Center

Scuderi Building Room 209 4235 28th Avenue Marlow Heights Maryland

Monday January 7

9:30 am - 10:00  Welcoming Remarks
  Overview of Workshop

10:00 - 10:30  Introduction to CONCOR
  - Purpose and function
  - History of development
  - General computer requirements

10:30 - 10:45  Break

10:45 - 12:00  Editing Concepts
  - Ways to interrogate data
  - Ways to correct data
  - Editing housing and population data
  - POPSTAN
  - Advantages of CONCOR

12:00 - 1:15 pm  Break

1:15 - 2:00  System Description
  - Constraints in design of CONCOR
  - Basic subsystems of CONCOR
  - User interactions with system
  - Examples of outputs produced

2:00 - 2:30  User Program Organization
  - Divisions
  - Sections
  - Routines
  - Commands

2:30 - 2:45  Break

2:45 - 3:25  Command Language Description
  - Types of statements
  - Format
  - Syntax

Tuesday January 8

9:30 am - 10:30  Dictionary Division Command Statements
  - Punctuation
  - Input data referencing
  - Working data declarations
  - Internal data representation and storage

10:30 - 10:45  Break

10:45 - 12:00  Dictionary-Attributes-Section
  - Dictionary-Name
  File-Section
  - Input-File
  - Output-File
  - Write-File
  - Error-File

12:00 - 1:15 pm  Break

1:15 pm - 2:15  Input-Record-Section
  - Define-Record
  Working-Data-Section
  - New-Data
  - Array-Data

2:15 - 2:30  Break

2:30 - 3:25  Dictionary Examples
  - Minimum dictionary structure
  - Maximum dictionary structure
  - Hand out dictionary problem

Wednesday January 9

9:30 am - 10:30  Discussion of Participants' Dictionary problems
  Free work time

10:30 - 10:45  Break

10:45 - 12:00  Execution Division Command Statements
  - Punctuation
  - Subscripting
  - Internal Identifiers
  - Report-Control-Section
    - Generate-Edit-Statistics
    - Count-Imputes
    - Examples

12:00 - 1:15 pm  Break

1:15 pm - 2:15  Execution Division Command Statements (continued)
  - Routines of Edit-Specifications-Section
    - Prolog-Routine
    - Filter-Routine
    - Epilog-Routine
  - Types and functions of edit specification commands
  - Range
  - Assert

2:15 - 2:30  Break

2:30 - 3:25
  - Pass/Fail clauses
  - List

Thursday January 10

9:30 am - 10:30  Discussion of Problems
  Free work time

10:30 - 10:45  Break

10:45 - 12:00  Edit Specification Command Statements (continued)
  - Allocate
  - Update
  - Let

12:00 - 1:15 pm  Break

1:15 pm - 2:15
  - If
  - Until/Exit
  - Stop

2:15 - 2:30  Break

2:30 - 3:25
  - Drecode
  - Grecode

Friday January 11

9:30 am - 10:30  Edit Specification Command Statements (continued)
  - Output
  - Write

10:30 - 10:45  Break

10:45 - 12:00  Report Division Command Statements
  - Display-Control-Section
    - Display-Edit-Statistics
  - Tolerance-Control-Section
    - Error-Rate-Check
    - Reject-File
  - Report Examples

12:00 - 1:15 pm  Break

1:15 pm - 3:25  Hand out Edit Specifications as problem
  Free work time

Monday January 14

9:30 am - 10:30  Discuss procedures for running problems on computer

10:30 - 10:45  Break

10:45 - 12:00  Component Programs of the CONCOR system

12:00 - 1:15 pm  Break

Tuesday January 15

9:30 am - 3:25 pm  Free work time

Wednesday January 16

9:30 am - 12:00  Free work time

12:00 - 1:15 pm  Break

1:15 pm - 2:15  How to Install CONCOR on IBM 360/370 OS

2:15 - 2:30  Break

2:30 - 3:25  Free work time

Thursday January 17

9:30 am - 3:25  Free work time

1:15 pm - 2:15  Future CONCOR Design Considerations
  - for census processing
  - for survey processing
  - manual correction system

2:15 - 2:30  Break

2:30 - 2:45  Evaluation Guidelines
  - Hand out evaluation forms

2:45 - 3:25  Free work time

Friday January 18

9:30 am - 10:30  Free work time

10:30 - 10:45  Break

10:45 - 12:00  Group Discussion on CONCOR
  - submission of written evaluations
  Distribution of IBM installation tapes
  Presentation of Certificates to Participants

12:00 - 1:15 pm  Break

1:15 - 3:25  Free work time

APPENDIX D

PARTICIPANTS

APPENDIX D

CONCOR WORKSHOP January 7 - 18 1980 Marlow Heights MD

PARTICIPANTS

KENYA

James Midianga Government Computer Centre Central Bureau of Statistics PO Box 30266 Nairobi Kenya

PANAMA

Omar Rivera Rios Data Processing Specialist Contraloria General de la Republica de Panama Apartado Postal 5213 Panama 5 Panama

PHILIPPINES

Guida T Capellan OIC Computer Programming Division National Census and Statistics Office Solicarel Bldg 1 Magsaysay Blvd Sta Mesa Manila Philippines

THAILAND

Angsumal Sunalai National Statistical Office Bangkok 1 Thailand

EGYPT

Farag Sedky Mourad Ghaleb CAPMAS (Central Agency for Public Mobilization and Statistics) NASR City Cairo Egypt

SAUDI ARABIA

Shargi R Al-Shargi National Computer Center PO Box 2534 Riyadh Saudi Arabia

Ahmed S Al-Ghamdi Dept Director of National Computer Center PO Box 2534 Riyadh Saudi Arabia

OTHER

Robert W ONeal USREPJECOR APO New York NY 09038

John N Adams Data Processing Advisor DACCA Department of State Washington DC 20520

OR co UNDP PO Box 224 Dacca Bangladesh

Joe Quasney US Census Bureau Washington DC 20233

Howard Brunsman 5715 N Ninth St Arlington VA 22205

Larry Shiller and Julio Ortuzar Delta Systems Consultants Inc 264 Alhambra Circle Coral Gables FL 33134

Nicholas Ourusoff Burpee Hill Road New London NH 03257 (United Nations)

Frederic J Grant IV Systems Development Georgia World Congress Institute 580 North Omni International Atlanta Georgia 30303

Rafael Samper Mary Jo Keenan Catherine B Gleason LACDR Room 2246 NS Washington DC 20523

John Marshall AIDSERDMPSE Rm 706C SA-12 Washington DC 20523

Susanne Bacon and Mary K Friday ISPCData Processing US Census Bureau Washington DC 20233

Leo Dougherty ISPCWorld Census Staff US Bureau of the Census Washington DC 20233

CONCOR WORKSHOP

January 7-18 1979 Washington DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM


APPENDIX E

CONCOR WORKSHOP January 16 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1. What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries?

2. If you are familiar with previous versions of CONCOR, how does this current version compare to them?

3. How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data with which you are familiar?

4. How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software?

5. How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys?

6. If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?

7. In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8. Other comments

APPENDIX F

NEWOLD COMMAND COMPARISONS


OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA

APPENDIX F

NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR.

END-DIVISION signifies the termination of the CONCOR division organization and must always be present even though division identifiers have not been implemented.

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations, and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

ALLOCATE (ALL) ALLOCATE (ALLOC) Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

ASSERT (AST) ASSERT (ASRT) The ASSERT command is the basic consistency editing command of CONCOR. The command now provides for a NOERROR and MESSAGE option. Additionally, by utilizing the DUMPR and DUMPO keywords, it provides the user with the option of using the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

(see RECODE) DRECODE (DRCD) Provides a method for recoding data items whose values start at zero and continue in ascending sequence. Replaces the previous XRECODE command.

EXIT Provides the means to terminate execution (escape) of command statements within the UNTIL command.

END END-IF END-THEN END-ELSE A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

ERROR not implemented No longer supported in the language.

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

FILTER TYPE ( ) WITH not implemented No longer are option statements supported in language This command once specified essentially a GOTO depending on construction

(see XRECODE) GRECODE (GRCD) This new command provides a method for the group recoding of input values

IF IF The IF statement has properties found virtually in all programming languages ie provides a means of evaluating a condition and controlling the execution of alternative logical statements Similar to pseudocode a CONCOR programmer must utilize the structured command IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version

LET LET Virtually unchanged

LIST New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire as well as specific individual identifiers

OUTPUT New command which provides the means by which records will be written to the output-file

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

RANGE (RNG) RANGE (RNG) As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through the DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

RECODE (REC) name changed Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

STOP --keyword STOP --keyword Unchanged.

UNTIL DO END-DO One of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to that of most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

UPDATE (UPD) UPDATE (UPD) Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

WRITE (WRT) WRITE (WRT) The WRITE language statement while once utilized as the primary output command is now used to create an auxiliary or derivative file Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION (See WRITE-FILE of DICTIONARY-DIVISION)

XRECODE (XREC) name changed Old command which has become the GRECODE command in the December 1979 version
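The interplay of ALLOCATE and UPDATE described in this comparison can be illustrated with a minimal sketch of an imputation array. It is written in Python, not in the CONCOR language; the class, the choice of sex as the array coordinate, the validity rule for age, and the starting value are all hypothetical. It shows only the general idea: valid values refresh the array (UPDATE), and failed values are replaced from it (ALLOCATE).

```python
class ImputationArray:
    """A one-dimensional array of donor values, one cell per category."""

    def __init__(self, n_cells, initial_value):
        self.cells = [initial_value] * n_cells

    def update(self, cell, value):
        """Store the latest valid value for this cell (the UPDATE role)."""
        self.cells[cell] = value

    def allocate(self, cell):
        """Supply a replacement value from this cell (the ALLOCATE role)."""
        return self.cells[cell]


def is_valid_age(value):
    """Hypothetical consistency rule: age must be an integer from 0 to 99."""
    return isinstance(value, int) and 0 <= value <= 99


def edit_ages(records, array):
    """Impute invalid ages from the array; return the number of imputations."""
    imputed = 0
    for rec in records:
        cell = rec["sex"]  # array coordinate: 0 or 1 in this sketch
        if is_valid_age(rec["age"]):
            array.update(cell, rec["age"])
        else:
            rec["age"] = array.allocate(cell)
            imputed += 1
    return imputed
```

Because the array is refreshed as records pass through, an imputed value comes from the most recent valid record in the same cell, which is why the order of the input file matters for this kind of edit.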


DECEMBER 1978 VERSION 1

DECEMBER 1979 VERSION 2

Not implemented REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note division and section names treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
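The need to presort the input on the AREA-CONTROL field follows from control-break processing, which starts a new area each time the control value changes between adjacent records. A minimal Python sketch makes the effect visible; the record layout and the function name are hypothetical, and `itertools.groupby` merely stands in for the break logic, which it resembles in that it groups only contiguous records.

```python
from itertools import groupby


def area_breaks(records, key="area"):
    """Return (area, record_count) for each run of contiguous records."""
    return [
        (area, len(list(group)))
        for area, group in groupby(records, key=lambda r: r[key])
    ]


# Hypothetical input files: the same three records, sorted and unsorted.
SORTED_FILE = [{"area": 1}, {"area": 1}, {"area": 2}]
UNSORTED_FILE = [{"area": 1}, {"area": 2}, {"area": 1}]
```

With the sorted file, area 1 is reported once with two records. With the unsorted file, area 1 is reported twice with one record each, so the area-level statistics would be split across two report sections.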

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST


Appendix G

Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1 Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods if applicable

(2) Addition of an optional input andor output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.


(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18-digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.


(12) Increase the number of comparison strings permitted for alphanumeric data items This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. Many users now record such values in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters The user would still be required to code two commas to receive the system default value for any parameter

(15) Provide a repetition factor in the initialization of arrays

(16) Print the data dictionary name in the titles of the dictionary documentation or provide a means by which the user may specify a title to appear in the listings
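The automatic range check proposed in item (13) can be sketched as a dictionary-driven validation pass. The example is in Python and is only an illustration under stated assumptions: the dictionary layout, the identifier names AGE and SEX, and their limits are hypothetical, and CONCOR's actual DEFINE-RECORD syntax is not reproduced here.

```python
# Hypothetical dictionary: each user-identifier is declared with its valid range.
DICTIONARY = {
    "AGE": (0, 99),
    "SEX": (1, 2),
}


def range_failures(record, dictionary):
    """Return the names of identifiers that are missing, non-numeric, or out of range."""
    failures = []
    for name, (low, high) in dictionary.items():
        value = record.get(name)
        if not isinstance(value, int) or not (low <= value <= high):
            failures.append(name)
    return failures
```

Declaring the limits once, alongside the field definitions, means the same information is available both to the edit and to any tabulation software that reads the dictionary, which is the second benefit the item mentions.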

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program by eliminating some of the system protection features, such as division-by-zero and out-of-bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is the question of what action is to be taken in the calculation of tolerance, in terms of the new record's inclusion in the universe of observations and the number of changes.

iK

48 G-4

(4) The code generated in the EDITOR program for the DRECODE

command should be modified to utilized a lookup table approach

This would decrease the execution time of the command but woul C

require additional communication between the GENED and GENDD

programs

(5) Permit the user to continue the message text field over

successive lines of code

(6) Implementation of a SET command command that would change

CONCORs default occurrence pointer for the duration of the SET

command This command would be ever helpful as the validation

of the existence of a record would only have to be done once

when the SET command was encounterd The major drawback to

this command is in the users ability to remember the action

taken during the last execution of the command when the

Execution Division command statements may cover many pages of

code

(7) Implement a method of specifying different data formats for

fields on the WRITE command statement This would give greater

flexibility to the user but would require major revision to the

routines now used to process the WRITE command

(8) Provide the user with two sets of internal identifiers The

first set currently exists in the system and is valid on the

basis of the current control area (if specified) being

executed The second set would be accumulated on a run total

basis and would require the creation of new CONCOR internal

ident ifiers

(9) Implement automatic array declarations for capturing allocation

frequency distributions This could be done by the system for

discrete values (especially if point number 13 above is

implemented) but a mechanism for handling continuous values

would need to be designed nother program would then be

required to format the information gathered into a readable

form
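The lookup table approach suggested for DRECODE in item (4) can be sketched in Python. This is an illustration only: it assumes the generated code currently tests codes one after another, which the list above does not state, and every function name and recode pair is hypothetical.

```python
def recode_sequential(value, pairs):
    """Assumed baseline: compare against each (old, new) pair in turn."""
    for old, new in pairs:
        if value == old:
            return new
    return None


def build_table(pairs):
    """Build a direct-index table; old codes are assumed to start at zero and ascend."""
    table = [None] * (max(old for old, _ in pairs) + 1)
    for old, new in pairs:
        table[old] = new
    return table


def recode_lookup(value, table):
    """One bounds test and one index operation, whatever the number of codes."""
    if 0 <= value < len(table):
        return table[value]
    return None


PAIRS = [(0, 1), (1, 1), (2, 2), (3, 9)]   # hypothetical recode specification
TABLE = build_table(PAIRS)
print([recode_lookup(v, TABLE) for v in range(5)])
```

The table must be built once from the recode specification before execution begins, which is one way to read the remark that GENED and GENDD would need additional communication.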

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.

1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program, to be used on IBM 360/370 series computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification of the system data dictionary to better arrange and make available the information needed most frequently. Also examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean that the editing process itself should be made interactive; only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.

2. Survey or other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of the sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that this approach would require two versions of the software. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system, because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has been neither finished nor tested.
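As a rough illustration of the weighting scheme in item (5), the Python sketch below computes a weighted total and mean. It assumes each record carries a sampling weight, for example the inverse of its selection probability; the list above does not specify a design, so the function name and data layout are hypothetical.

```python
def weighted_summary(records):
    """records: iterable of (value, weight) pairs; returns (weighted total, weighted mean)."""
    records = list(records)
    weight_sum = sum(weight for _, weight in records)
    total = sum(value * weight for value, weight in records)
    mean = total / weight_sum if weight_sum else None
    return total, mean


# Two sampled households standing for 10 and 30 households in the frame.
print(weighted_summary([(2, 10.0), (4, 30.0)]))
```

Without the weights the two cases would average 3.0; with them the estimate shifts toward the household that represents more of the population.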


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES

1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT OUTPUT-QUEST-COUNT WRITE-RECORD-COUNT

Change in spelling of variable

Not mentioned--possible omission from manual

Not implemented

Note: Current documentation equivocates on when some variables are reset--most are initialized with each control area break

Though implemented as reserved identifiers, their exact values and utility were unknown due to poor documentation of (old) CONCOR

These apparently new counters permit access to valuable totals within control areas

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN-, OUT-)

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Figure: CONCOR-EDITOR EXECUTION STATISTICS page. The listing shows repeated "DIVISION BY ZERO LINE NUMBER 118" messages followed by "RUN ABORTED DUE TO DIVISION BY ZERO ERRORS". The superimposed program text (lines 112-125) sets registers R001 and R003 to zero and then executes a LET statement at line 118 that assigns R003 from R002 and R001.]

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason--ABEND without generating any messages. The most common types of cessation are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.
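The behavior described in these comments, reporting the offending source line for each data exception and aborting once errors accumulate, can be mimicked with a small Python sketch. It is not the EDITOR program: the statement table, the line numbers, and the error limit of six are assumptions chosen to resemble the listing.

```python
def run(statements, registers, passes, max_errors=6):
    """Execute (line_no, statement) pairs repeatedly, logging each division by zero."""
    log = []
    errors = 0
    for _ in range(passes):
        for line_no, statement in statements:
            try:
                statement(registers)
            except ZeroDivisionError:
                errors += 1
                log.append(f"DIVISION BY ZERO LINE NUMBER {line_no}")
                if errors >= max_errors:          # assumed error limit
                    log.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                    return log
    return log


def set_zero(r):
    # Stands in for the LET statement that zeroes R001 and R003.
    r["R001"] = 0
    r["R003"] = 0


def divide(r):
    # Stands in for the LET statement at line 118.
    r["R003"] = r["R002"] // r["R001"]


PROGRAM = [(115, set_zero), (118, divide)]   # line numbers are illustrative
LOG = run(PROGRAM, {"R001": 0, "R002": 7, "R003": 0}, passes=10)
print("\n".join(LOG))
```

Processing continues past each exception so that later errors are also reported, and only the accumulated count ends the run.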

[Figure: CONCOR-EDITOR EXECUTION STATISTICS page showing array reference errors. The superimposed program text (lines 173-187) contains two nested UNTIL ... VARY ... DO loops over R001 and R002, within which ALLOCATE reads from TA1 (R001 R002), UPDATE writes to TA2 (R001 R002), and LIST statements display the values.]

Comments: In this example, a loop over a 2-dimensional array with 10 rows and 10 columns (indexed by R001 and R002) attempts to update the smaller (5 X 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.

[Figure: CONCOR-EDITOR EXECUTION STATISTICS page. The listing shows "JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215" followed by "EDITOR -- NORMAL END OF JOB". The superimposed program text (lines 214-218) contains an IF test on IN-RECORD-COUNT with a THEN STOP-RUN END-THEN clause.]

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.


Tuesday January 8

Dictionary Division Command Statements

9:30 am - 10:30
- Punctuation
- Input data referencing
- Working data declarations
- Internal data representation and storage

10:30 - 10:45 Break

10:45 - 12:00
Dictionary-Attributes-Section
- Dictionary-Name
File-Section
- Input-File
- Output-File
- Write-File
- Error-File

12:00 - 1:15 pm Break

1:15 pm - 2:15
Input-Record-Section
- Define-Record
Working-Data-Section
- New-Data
- Array-Data

2:15 - 2:30 Break

2:30 - 3:25 Dictionary Examples
- Minimum dictionary structure
- Maximum dictionary structure
- Hand out dictionary problem

Wednesday January 9

9:30 am - 10:30 Discussion of Participants' Dictionary problems
Free work time

10:30 - 10:45 Break

10:45 - 12:00 Execution Division Command Statements
- Punctuation
- Subscripting
- Internal Identifiers
- Report-Control-Section
  - Generate-Edit-Statistics
  - Count-Imputes
  - Examples

12:00 - 1:15 pm Break

1:15 pm - 2:15 Execution Division Command Statements (continued)
- Routines of Edit-Specifications-Section
  - Prolog-Routine
  - Filter-Routine
  - Epilog-Routine
- Types and functions of edit specification commands
- Range
- Assert

2:15 - 2:30 Break

2:30 - 3:25
- Pass/Fail clauses
- List

Thursday January 10

9:30 am - 10:30 Discussion of Problems
Free work time

10:30 - 10:45 Break

10:45 - 12:00 Edit Specification Command Statements (continued)
- Allocate
- Update
- Let

12:00 - 1:15 pm Break

1:15 pm - 2:15
- If
- Until/Exit
- Stop

2:15 - 2:30 Break

2:30 - 3:25
- Drecode
- Grecode

Friday January 11

9:30 am - 10:30 Edit Specification Command Statements (continued)
- Output
- Write

10:30 - 10:45 Break

10:45 - 12:00 Report Division Command Statements
- Display-Control-Section
  - Display-Edit-Statistics
- Tolerance-Control-Section
  - Error-Rate-Check
  - Reject-File
  - Report Examples

12:00 - 1:15 pm Break

1:15 pm - 3:25 Hand out Edit Specifications as problem
Free work time

Monday January 14

9:30 am - 10:30 Discuss procedures for running problems on computer

10:30 - 10:45 Break

10:45 - 12:00 Component Programs of the CONCOR system

12:00 - 1:15 pm Break

Tuesday January 15

9:30 am - 3:25 pm Free work time

Wednesday January 16

9:30 am - 12:00 Free work time

12:00 - 1:15 pm Break

1:15 pm - 2:15 How to Install CONCOR on IBM 360/370 OS

2:15 - 2:30 Break

2:30 - 3:25 Free work time

Thursday January 17

9:30 am - 3:25 Free work time

1:15 pm - 2:15 Future CONCOR Design Considerations
- for census processing
- for survey processing
- manual correction system

2:15 - 2:30 Break

2:30 - 2:45 Evaluation Guidelines
- Hand out evaluation forms

2:45 - 3:25 Free work time

Friday January 18

9:30 am - 10:30 Free work time

10:30 - 10:45 Break

10:45 - 12:00 Group Discussion on CONCOR
- Submission of written evaluations
- Distribution of IBM installation tapes
- Presentation of Certificates to Participants

12:00 - 1:15 pm Break

1:15 pm - 3:25 Free work time

APPENDIX D

PARTICIPANTS

CONCOR WORKSHOP, January 7-18, 1980, Marlow Heights, MD

KENYA

James Midianga Government Computer Centre Central Bureau of Statistics PO Box 30266 Nairobi Kenya

PANAMA

Omar Rivera Rios Data Processing Specialist Contraloria General de la Republica de Panama Apartado Postal 5213 Panama 5 Panama

PHILIPPINES

Guida T Capellan OIC Computer Programming Division National Census and Statistics Office Solicarel Bldg 1 Magsaysay Blvd Sta Mesa Manila Philippines

THAILAND

Angsumal Sunalai National Statistical Office Bangkok 1 Thailand

EGYPT

Farag Sedky Mourad Ghaleb CAPMAS (Central Agency for Public Mobilization and Statistics) NASR City Cairo Egypt

SAUDI ARABIA

Shargi R Al-Shargi National Computer Center PO Box 2534 Riyadh Saudi Arabia

Ahmed S Al-Ghamdi Dept Director of National Computer Center PO Box 2534 Riyadh Saudi Arabia

OTHER

Robert W ONeal USREPJECOR APO New York NY 09038

John N Adams Data Processing Advisor DACCA Department of State Washington DC 20520

OR c/o UNDP PO Box 224 Dacca Bangladesh

Joe Quasney US Census Bureau Washington DC 20233

Howard Brunsman 5715 N Ninth St Arlington VA 22205

Larry Shiller and Julio Ortuzar Delta Systems Consultants Inc 264 Alhambra Circle Coral Gables FL 33134

Nicholas Ourusoff Burpee Hill Road New London NH 03257 (United Nations)

Frederic J Grant IV Systems Development Georgia World Congress Institute 580 North Omni International Atlanta Georgia 30303

Rafael Samper Mary Jo Keenan Catherine B Gleason LACDR Room 2246 NS Washington DC 20523

John Marshall AIDSERDMPSE Rm 706C SA-12 Washington DC 20523

Susanne Bacon and Mary K Friday ISPCData Processing US Census Bureau Washington DC 20233

Leo Dougherty ISPCWorld Census Staff US Bureau of the Census Washington DC 20233

CONCOR WORKSHOP

January 7-18 1979 Washington DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM

CONCOR WORKSHOP January 16 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1. What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries?

2. If you are familiar with previous versions of CONCOR, how does this current version compare to them?

3. How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data with which you are familiar?

4. How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software?

5. How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys?

6. If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?

7. In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8. Other comments

APPENDIX F

NEW/OLD COMMAND COMPARISONS

OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA

NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR

END-DIVISION signifies the termination of the CONCOR division organization and must always be present even though division identifiers have not been implemented

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations, and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

ALLOCATE (ALL) ALLOCATE (ALLOC) Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

ASSERT (AST) ASSERT (ASRT) The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and a MESSAGE option. Additionally, the DUMPR and DUMPO keywords give the user the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

(see RECODE) DRECODE (DRCD) Provides a method for recoding data items having a starting value of zero and continuing in ascending sequence. Replaces the previous XRECODE command.

EXIT Provides the means to terminate EXECUTION (ESCAPE) of command statements within the UNTIL command.

END END-IF END-THEN END-ELSE A solitary END command is no longer permitted in a conditional construction. Note the similarity to COBOL pseudocode.

ERROR not implemented No longer supported in language

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

FILTER TYPE ( ) WITH not implemented No longer are option statements supported in language This command once specified essentially a GOTO depending on construction

(see XRECODE) GRECODE (GRCD) This new command provides a method for the group recoding of input values

IF IF The IF statement has properties found virtually in all programming languages ie provides a means of evaluating a condition and controlling the execution of alternative logical statements Similar to pseudocode a CONCOR programmer must utilize the structured command IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version

LET LET Virtually unchanged

LIST New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire as well as specific individual identifiers

OUTPUT New command which provides the means by which records will be written to the output-file

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

RANGE (RNG) RANGE (RNG) As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

RECODE (REC) name changed Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

STOP keyword STOP keyword Unchanged.

UNTIL DO END-DO One of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

UPDATE (UPD) UPDATE (UPD) Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

WRITE (WRT) WRITE (WRT) The WRITE language statement while once utilized as the primary output command is now used to create an auxiliary or derivative file Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION (See WRITE-FILE of DICTIONARY-DIVISION)

XRECODE (XREC) name changed Old command which has become the GRECODE command in the December 1979 version
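The comments above describe ALLOCATE, together with UPDATE, as the imputation mechanism, with UPDATE maintaining the arrays involved. A common technique matching that description is hot-deck imputation, sketched below in Python. This is an interpretation rather than CONCOR semantics: the record layout, the deck keys, and the starting values are all hypothetical.

```python
def impute_occupation(records, deck):
    """Fill missing occupation codes from a deck keyed by (sex, age group).

    A record that passes the edit refreshes the deck (an UPDATE-like step);
    a record that fails takes the stored value (an ALLOCATE-like step).
    """
    imputes = 0
    for rec in records:
        key = (rec["sex"], rec["age"] // 10)
        if rec["occupation"] is None:
            rec["occupation"] = deck[key]
            imputes += 1
        else:
            deck[key] = rec["occupation"]
    return imputes


# The deck is pre-loaded with starting values for every cell that can occur here.
DECK = {(1, 3): 9, (2, 2): 4}
RECORDS = [
    {"sex": 1, "age": 34, "occupation": 5},
    {"sex": 1, "age": 37, "occupation": None},
    {"sex": 2, "age": 20, "occupation": None},
]
IMPUTES = impute_occupation(RECORDS, DECK)
print(IMPUTES, [r["occupation"] for r in RECORDS])
```

The second record receives the value just stored by the first, while the third falls back on the pre-loaded starting value because no donor for its cell has been seen yet.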


DECEMBER 1978 VERSION 1

DECEMBER 1979 VERSION 2

Not implemented REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
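The presorting requirement noted above can be demonstrated with Python's itertools.groupby, which, like the control-break processing described, only combines records that are adjacent. The record layout and area codes are illustrative.

```python
from itertools import groupby


def area_counts(records):
    """Count records per control area, breaking whenever the area code changes."""
    return [(area, sum(1 for _ in group))
            for area, group in groupby(records, key=lambda rec: rec[0])]


unsorted_file = [("01", "a"), ("02", "b"), ("01", "c")]
print(area_counts(unsorted_file))                               # area 01 is reported twice
print(area_counts(sorted(unsorted_file, key=lambda rec: rec[0])))
```

On the unsorted file, area 01 produces two separate breaks of one record each; after sorting on the control field it is reported once with both records.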

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST

Future Versions of CONCOR: Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications needed to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and to control other file characteristics, such as volume specification and retention periods, if applicable.

(2) Addition of optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.

(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is the determination of which value is to be moved to the output record when the record is written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Permitting only one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement in the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18-digit numbers. Signed numeric and packed decimal data formats could also be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.

(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed before CONCOR gives the user access to the data items on the questionnaire. Many users now record such values in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and the comma character as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation, or provide a means by which the user may specify a title to appear in the listings.
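The signed numeric format described in item (11), with leading blanks and an optional sign as found in right-justified numeric output, could be decoded along the lines of the Python sketch below. The function name and the decision to reject embedded or trailing blanks are assumptions made for illustration.

```python
import re

# Leading blanks, an optional sign, then at least one digit; nothing else allowed.
_SIGNED_FIELD = re.compile(r" *([+-]?)([0-9]+)")


def parse_signed_field(raw):
    """Return the integer value of a fixed-width signed field, or None if invalid."""
    match = _SIGNED_FIELD.fullmatch(raw)
    if match is None:
        return None
    value = int(match.group(2))
    return -value if match.group(1) == "-" else value


print(parse_signed_field("  -12"), parse_signed_field("   7"), parse_signed_field("1 2"))
```

Returning None for an all-blank or malformed field leaves the decision about blanks and non-numeric data to the edit logic rather than silently converting them to zero.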

112 Execution Division

(M) Implement the Run Control Section This would allow the user to increase execution speed of the EDITOR program This would be accomplished by eliminating some of the system protection features such as division by zero and out of bounds checking In this secion the user could also specify new maximum values for the EDITOR error limits

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered The only problem to resolve is how to treat this id-ntifier outside of the Filter Routines

(3) Implement a CREATE commnand This command would allow the user to create a record that Was not found on the input file This implementation would require special care as it gives the user the ability to change the valme of a CONCOR internal identifier (TYPE-CGUNT) and modify the contents of the store Another consequence of this command is what action is to be taken in the calculation of tolerance in terms of this new records inclusion into the universe of observations and number of changes


(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command, a command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.

1.1.3 Report Division

(1) Provide a means by which the user may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.


1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on IBM 360/370 series computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification of the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.
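Two of the items above, the DRECODE change under the Execution Division and the GENSRC change under System Level, propose replacing a linear search with a lookup table. The following Python sketch illustrates the general trade-off only; it is not the generated COBOL, and the recode pairs and function names are invented for the illustration.

```python
# Hypothetical recode pairs (input value, recoded value); not taken from any CONCOR run.
RECODE_PAIRS = [(0, 9), (1, 1), (2, 1), (3, 2), (4, 2), (5, 3)]


def recode_linear(value, pairs, default=None):
    """Linear search: compare the value against each pair until one matches."""
    for source, target in pairs:
        if source == value:
            return target
    return default


def build_lookup(pairs, default=None):
    """Build a table once, indexed directly by the input value."""
    table = [default] * (max(source for source, _ in pairs) + 1)
    for source, target in pairs:
        table[source] = target
    return table


def recode_lookup(value, table, default=None):
    """Lookup table: a single bounds check and one indexed fetch per value."""
    if 0 <= value < len(table):
        return table[value]
    return default


TABLE = build_lookup(RECODE_PAIRS)
print([recode_lookup(v, TABLE) for v in range(6)])
```

The table costs one pass to build and storage proportional to the largest input value, which is presumably why the list notes that extra information would have to pass between the generating programs.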

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.


2 Survey or other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of the sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of the questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has not been finished or tested.
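The check-in procedure suggested in item (2) amounts to comparing a master list of expected sample cases against the questionnaire identifiers actually read. A minimal Python sketch follows; the function name, the report keys, and the sample identifiers are all hypothetical and serve only to show the comparison.

```python
def check_in(expected_ids, received_ids):
    """Compare the master list of selected cases with the identifiers found on the input file."""
    expected = set(expected_ids)
    seen = set()
    duplicates = []
    for qid in received_ids:
        # A second sighting of the same identifier is recorded once as a duplicate.
        if qid in seen and qid not in duplicates:
            duplicates.append(qid)
        seen.add(qid)
    return {
        "missing": sorted(expected - seen),      # selected but never received
        "unexpected": sorted(seen - expected),   # received but not on the master list
        "duplicates": duplicates,                # received more than once
    }


report = check_in(["A01", "A02", "A03"], ["A01", "A03", "A03", "B07"])
print(report)
```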


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES


1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT OUTPUT-QUEST-COUNT WRITE-RECORD-COUNT

Change in spelling of variable

Not mentioned--possible omission from manual

Not implemented

Note: Current documentation equivocates on when some variables are reset; most are initialized with each control area break.

Though implemented as reserved identifiers, their exact values and utility were unknown because of the poor documentation of (old) CONCOR.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN-, OUT-).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Printout: CONCOR-EDITOR EXECUTION STATISTICS page. The message DIVISION BY ZERO LINE NUMBER 118 is printed repeatedly, followed by RUN ABORTED DUE TO DIVISION BY ZERO ERRORS. The superimposed program text (source lines 112-125) is only partly legible; line 118 is a LET statement that computes R003 from R002 and R001 after the registers have been set to zero.]

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason--ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.
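The behavior shown on this page, a diagnostic carrying the source line number for each division by zero and an abort once too many have occurred, can be sketched as follows. This is an illustration in Python, not CONCOR's generated code: the class and function names are invented, the error limit of 3 is an arbitrary assumption, and returning zero for a failed division is likewise assumed. Only the two message texts are modeled on the printout.

```python
class RunAborted(Exception):
    """Raised when the (assumed) error limit is reached."""


class Diagnostics:
    def __init__(self, max_errors):
        self.max_errors = max_errors   # hypothetical limit; the real value is not stated
        self.messages = []

    def report(self, kind, line_number):
        """Record one diagnostic and abort the run once the limit is reached."""
        self.messages.append(f"{kind} LINE NUMBER {line_number}")
        if len(self.messages) >= self.max_errors:
            self.messages.append(f"RUN ABORTED DUE TO {kind} ERRORS")
            raise RunAborted(kind)


def safe_divide(numerator, denominator, line_number, diag):
    """Integer division that reports the offending source line instead of failing silently."""
    if denominator == 0:
        diag.report("DIVISION BY ZERO", line_number)
        return 0   # assumed fallback value
    return numerator // denominator


diag = Diagnostics(max_errors=3)
try:
    for divisor in (2, 0, 0, 0):
        safe_divide(7, divisor, 118, diag)
except RunAborted:
    pass
print("\n".join(diag.messages))
```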

[Printout: a second CONCOR-EDITOR execution statistics page; the error message text is illegible in this copy. The superimposed program text (source lines 173-187) shows nested UNTIL ... VARY loops over R001 and R002, each varied from 1, enclosing an ALLOCATE that reads TA1 (R001 R002), an UPDATE of TA2 (R001 R002), and several LIST statements.]

Comments: In this example R001, a 2-dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 x 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.

[Printout: a third execution statistics page, ending with the messages JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215 and EDITOR -- NORMAL END OF JOB. The superimposed program text (source lines 214-218) tests IN-RECORD-COUNT in an IF ... THEN STOP-RUN END-THEN construction.]

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.


Thursday, January 10

9:30 am-10:30   Discussion of Problems - If - Until - Exit - Stop
10:30-10:45     Break
10:45-12:00     Edit Specification Command Statements (continued)
12:00-1:15 pm   Break
1:15 pm-2:15    Free work time
2:15-2:30       Break
2:30-3:25       - Drecode - Grecode - Allocate - Update - Let

Friday, January 11

9:30 am-10:30   Edit Specification Command Statements (continued)
                - Output
                - Write
10:30-10:45     Break
10:45-12:00     Report Division Command Statements
                - Display-Control-Section
                  - Display-Edit-Statistics
                - Tolerance-Control-Section
                  - Error-Rate-Check
                  - Reject-File
                - Report Examples
12:00-1:15 pm   Break
1:15 pm-3:25    Hand out Edit Specifications as problem; Free work time

Monday, January 14

9:30 am-10:30   Discuss procedures for running problems on computer
10:30-10:45     Break
10:45-12:00     Component Programs of the CONCOR system
12:00-1:15 pm   Break

Tuesday, January 15

9:30 am-3:25 pm Free work time

Wednesday, January 16

9:30 am-12:00   Free work time
12:00-1:15 pm   Break
1:15 pm-2:15    How to Install CONCOR on IBM 360/370 OS
2:15-2:30       Break
2:30-3:25       Free work time

Thursday, January 17

9:30 am-3:25    Free work time
1:15 pm-2:15    Future CONCOR Design Considerations
                - for census processing
                - for survey processing
                - manual correction system
2:15-2:30       Break
2:30-2:45       Evaluation Guidelines
                - Hand out evaluation forms
2:45-3:25       Free work time

Friday, January 18

9:30 am-10:30   Free work time
10:30-10:45     Break
10:45-12:00     Group Discussion on CONCOR - submission of written evaluations
                Distribution of IBM installation tapes
                Presentation of Certificates to Participants
12:00-1:15 pm   Break
1:15-3:25       Free work time

APPENDIX D

PARTICIPANTS

CONCOR WORKSHOP, January 7-18, 1980, Marlow Heights, MD

KENYA

James Midianga, Government Computer Centre, Central Bureau of Statistics, P.O. Box 30266, Nairobi, Kenya

PANAMA

Omar Rivera Rios, Data Processing Specialist, Contraloria General de la Republica de Panama, Apartado Postal 5213, Panama 5, Panama

PHILIPPINES

Guida T. Capellan, OIC, Computer Programming Division, National Census and Statistics Office, Solicarel Bldg. 1, Magsaysay Blvd., Sta. Mesa, Manila, Philippines

THAILAND

Angsumal Sunalai, National Statistical Office, Bangkok 1, Thailand

EGYPT

Farag Sedky Mourad Ghaleb, CAPMAS (Central Agency for Public Mobilization and Statistics), Nasr City, Cairo, Egypt

SAUDI ARABIA

Shargi R. Al-Shargi, National Computer Center, P.O. Box 2534, Riyadh, Saudi Arabia

Ahmed S. Al-Ghamdi, Dept. Director of National Computer Center, P.O. Box 2534, Riyadh, Saudi Arabia

OTHER

Robert W. O'Neal, USREPJECOR, APO New York, NY 09038

John N. Adams, Data Processing Advisor, DACCA, Department of State, Washington, DC 20520, or c/o UNDP, P.O. Box 224, Dacca, Bangladesh

Joe Quasney, US Census Bureau, Washington, DC 20233

Howard Brunsman, 5715 N. Ninth St., Arlington, VA 22205

Larry Shiller and Julio Ortuzar, Delta Systems Consultants, Inc., 264 Alhambra Circle, Coral Gables, FL 33134

Nicholas Ourusoff, Burpee Hill Road, New London, NH 03257 (United Nations)

Frederic J. Grant IV, Systems Development, Georgia World Congress Institute, 580 North Omni International, Atlanta, Georgia 30303

Rafael Samper, Mary Jo Keenan, Catherine B. Gleason, LACDR, Room 2246 NS, Washington, DC 20523

John Marshall, AIDSERDMPSE, Rm. 706C, SA-12, Washington, DC 20523

Susanne Bacon and Mary K. Friday, ISPC/Data Processing, US Census Bureau, Washington, DC 20233

Leo Dougherty, ISPC/World Census Staff, US Bureau of the Census, Washington, DC 20233

CONCOR WORKSHOP

January 7-18, 1980, Washington, DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM

CONCOR WORKSHOP, January 16, 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1. What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries?

2. If you are familiar with previous versions of CONCOR, how does this current version compare to them?

3. How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data with which you are familiar?

4. How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software?

5. How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys?

6. If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?

7. In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8. Other comments

APPENDIX F

NEW/OLD COMMAND COMPARISONS

OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA


NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1, all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin anywhere in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR.

END-DIVISION signifies the termination of the CONCOR division organization and must always be present, even though division identifiers have not been implemented.

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations, and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

Version 1 (December 1978): ALLOCATE (ALL)
Version 2 (December 1979): ALLOCATE (ALLOC)
Comments: Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

Version 1 (December 1978): ASSERT (AST)
Version 2 (December 1979): ASSERT (ASRT)
Comments: The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and MESSAGE option. Additionally, the DUMPR and DUMPO keywords give the user the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

Version 1 (December 1978): (see RECODE)
Version 2 (December 1979): DRECODE (DRCD)
Comments: Provides a method for recoding data items having a starting value of zero and continuing in ascending sequence. Replaces the previous RECODE command.

Version 1 (December 1978): --
Version 2 (December 1979): EXIT
Comments: Provides the means to terminate execution (escape) of command statements within the UNTIL command.

Version 1 (December 1978): END
Version 2 (December 1979): END-IF, END-THEN, END-ELSE
Comments: A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

Version 1 (December 1978): ERROR
Version 2 (December 1979): not implemented
Comments: No longer supported in the language.

Version 1 (December 1978): FILTER TYPE ( ) WITH
Version 2 (December 1979): not implemented
Comments: Option statements are no longer supported in the language. This command once specified essentially a GOTO, depending on construction.

Version 1 (December 1978): (see XRECODE)
Version 2 (December 1979): GRECODE (GRCD)
Comments: This new command provides a method for the group recoding of input values.

Version 1 (December 1978): IF
Version 2 (December 1979): IF
Comments: The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. Similar to pseudocode, a CONCOR programmer must utilize the structured IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version.

Version 1 (December 1978): LET
Version 2 (December 1979): LET
Comments: Virtually unchanged.

Version 1 (December 1978): --
Version 2 (December 1979): LIST
Comments: New command which allows for the generation of specific messages, as well as specific individual identifiers, to be displayed in the report of edit statistics by questionnaire.

Version 1 (December 1978): --
Version 2 (December 1979): OUTPUT
Comments: New command which provides the means by which records will be written to the output-file.

Version 1 (December 1978): RANGE (RNG)
Version 2 (December 1979): RANGE (RNG)
Comments: As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through the DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

Version 1 (December 1978): RECODE (REC)
Version 2 (December 1979): name changed
Comments: Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

Version 1 (December 1978): STOP --keyword
Version 2 (December 1979): STOP --keyword
Comments: Unchanged.

Version 1 (December 1978): --
Version 2 (December 1979): UNTIL DO END-DO
Comments: DO-END-DO: one of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to that of most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

Version 1 (December 1978): UPDATE (UPD)
Version 2 (December 1979): UPDATE (UPD)
Comments: Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

Version 1 (December 1978): WRITE (WRT)
Version 2 (December 1979): WRITE (WRT)
Comments: The WRITE language statement, while once utilized as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of DICTIONARY-DIVISION.)

Version 1 (December 1978): XRECODE (XREC)
Version 2 (December 1979): name changed
Comments: Old command which has become the GRECODE command in the December 1979 version.
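The comparison above says only that ALLOCATE, together with UPDATE, provides the imputation mechanism and that UPDATE maintains the arrays involved. One common way such a pair is used in census editing is hot-deck style imputation, and the Python sketch below illustrates that reading. It is an assumption for illustration, not CONCOR syntax or a description of CONCOR's internals; the field names, the 0-98 age range, and the starting deck values are invented.

```python
def edit_ages(records, deck):
    """Hot-deck style pass: valid ages refresh the deck, invalid ages are replaced from it."""
    imputed = 0
    for rec in records:
        key = rec["sex"]
        age = rec["age"]
        if age is not None and 0 <= age <= 98:
            deck[key] = age          # UPDATE-like step: remember the latest valid value
        else:
            rec["age"] = deck[key]   # ALLOCATE-like step: impute from the array
            imputed += 1
    return imputed


# Starting values for the array, one per sex code (hypothetical).
deck = {1: 30, 2: 28}
records = [
    {"sex": 1, "age": 40},
    {"sex": 1, "age": None},
    {"sex": 2, "age": 150},
    {"sex": 2, "age": 12},
    {"sex": 2, "age": None},
]
count = edit_ages(records, deck)
print(count, [r["age"] for r in records])
```

The third record is imputed from the starting value because no valid donor for that sex code has been seen yet, which is why the initial contents of such arrays matter.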


DECEMBER 1978 VERSION 1

Not implemented

DECEMBER 1979 VERSION 2

REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
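The presorting requirement follows from control-break processing: a new area is assumed to begin whenever the control field changes, so records for one area that are not adjacent are reported as separate areas. The Python sketch below shows the effect with the standard library's itertools.groupby, which groups only adjacent items; the record layout and the function name are hypothetical.

```python
from itertools import groupby


def area_breaks(records):
    """Count records per control area, starting a new area each time the area value changes."""
    return [
        (area, sum(1 for _ in group))
        for area, group in groupby(records, key=lambda rec: rec["area"])
    ]


unsorted_file = [{"area": "01"}, {"area": "02"}, {"area": "01"}]
presorted_file = sorted(unsorted_file, key=lambda rec: rec["area"])

print(area_breaks(unsorted_file))    # area 01 is reported twice
print(area_breaks(presorted_file))   # one entry per area
```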

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST


Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1 Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics, such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.


(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18-digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.


(12) Increase the number of comparison strings permitted for alphanumeric data items This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire The inclusion of such values are now done by many users in the form of comment statements This enhancement would also facilitate the use of the dictionary by tabulation software

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters The user would still be required to code two commas to receive the system default value for any parameter

(15) Provide a repetition factor in the initialization of arrays

(16) Print the data dictionary name in the titles of the dictionary documentation or provide a means by which the user may specify a title to appear in the listings

112 Execution Division

(M) Implement the Run Control Section This would allow the user to increase execution speed of the EDITOR program This would be accomplished by eliminating some of the system protection features such as division by zero and out of bounds checking In this secion the user could also specify new maximum values for the EDITOR error limits

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered The only problem to resolve is how to treat this id-ntifier outside of the Filter Routines

(3) Implement a CREATE commnand This command would allow the user to create a record that Was not found on the input file This implementation would require special care as it gives the user the ability to change the valme of a CONCOR internal identifier (TYPE-CGUNT) and modify the contents of the store Another consequence of this command is what action is to be taken in the calculation of tolerance in terms of this new records inclusion into the universe of observations and number of changes

iK

48 G-4

(4) The code generated in the EDITOR program for the DRECODE command should be modified to use a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implement a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the existence of a record would have to be validated only once, when the SET command was encountered. The major drawback of this command is that the user must remember the action taken during its last execution, when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision of the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run-total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 of the Dictionary Division list is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
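The lookup table approach proposed in point (4) can be illustrated with a minimal sketch. It is written in Python rather than in the COBOL generated for the EDITOR program, and the recode pairs and function names are hypothetical, chosen only for illustration. It assumes, as the description of DRECODE in Appendix F suggests, that the input codes start at zero and ascend without gaps.

```python
# Hypothetical (input code, recoded value) pairs; not taken from the report.
RECODE_PAIRS = [(0, 9), (1, 1), (2, 1), (3, 2), (4, 2), (5, 3)]


def recode_linear(value, pairs=RECODE_PAIRS):
    """Linear search: compare the value against each pair in turn."""
    for source, target in pairs:
        if source == value:
            return target
    return None  # value not covered by the recode list


# Lookup table: position i holds the recoded value for input code i.
RECODE_TABLE = [target for _, target in sorted(RECODE_PAIRS)]


def recode_lookup(value, table=RECODE_TABLE):
    """Direct indexing: one bounds test and one table access per value."""
    if 0 <= value < len(table):
        return table[value]
    return None
```

Both functions return the same result for every input; the lookup version simply replaces a scan whose cost grows with the length of the recode list by a single indexed access.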

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.


1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program, to be used on IBM 360/370 series computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification of the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not imply that the editing process itself should be made interactive; rather, only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.


2. Survey or Other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of the sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has been neither finished nor tested.
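The weighting scheme mentioned in point (5) can be sketched as follows. This is an illustration only, in Python, under the assumption that each edited record carries a sampling weight; the field names, weights, and function names are hypothetical and are not part of CONCOR.

```python
def weighted_total(records, field, weight_field="weight"):
    """Sum of field values, each multiplied by its record's sampling weight."""
    return sum(r[field] * r[weight_field] for r in records)


def weighted_mean(records, field, weight_field="weight"):
    """Weighted total divided by the sum of weights; None if no weight."""
    total_weight = sum(r[weight_field] for r in records)
    if total_weight == 0:
        return None
    return weighted_total(records, field, weight_field) / total_weight


# Hypothetical sample: two households standing in for 100 and 300 households.
SAMPLE = [
    {"household_size": 4, "weight": 100},
    {"household_size": 2, "weight": 300},
]
```

With these figures the unweighted mean household size would be 3, while the weighted mean reflects the fact that the smaller household represents three times as many households in the sampling frame.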


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES


1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT, OUTPUT-QUEST-COUNT, WRITE-RECORD-COUNT

Change in spelling of variable.

Not mentioned; possible omission from manual.

Not implemented.

Note: Current documentation equivocates on when some variables are reset; most are initialized with each control area break.

Though implemented as reserved identifiers, due to poor documentation of (old) CONCOR their exact values and utility were unknown.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN-, OUT-).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Execution Statistics printout, first test. Legible output: the message DIVISION BY ZERO, LINE NUMBER 118, printed six times, followed by RUN ABORTED DUE TO DIVISION BY ZERO ERRORS. The remainder of the page, including the run date and the control area counts, is illegible in this reproduction.]

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason, i.e., ABEND without generating any messages. The most common types of cessation are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.

[Superimposed program text, lines 112-125, largely illegible. Legible fragments: LIST statements announcing that a division by zero error follows and that R001 and R003 are set to zeros; line 118, a LET statement assigning to R003 a value computed from R002 and R001; and a closing comment on the registers being reset to zero.]
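The behavior reported in this test (a diagnostic naming the source line for each division by zero, and an abort once an error limit is reached) can be sketched as follows. This is a Python illustration, not the EDITOR's generated COBOL. The class and method names are hypothetical, the default limit of six is an assumption suggested by the six messages on the printout, and returning zero after a caught division is likewise an assumption.

```python
class RunAborted(Exception):
    """Raised when the division-by-zero error limit is reached."""


class ProtectedRun:
    def __init__(self, max_errors=6):  # limit of six is an assumption
        self.max_errors = max_errors
        self.errors = 0
        self.messages = []

    def divide(self, numerator, denominator, line_number):
        """Integer division that reports, rather than crashes on, a zero divisor."""
        if denominator == 0:
            self.errors += 1
            self.messages.append(f"DIVISION BY ZERO   LINE NUMBER {line_number}")
            if self.errors >= self.max_errors:
                self.messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                raise RunAborted(self.messages[-1])
            return 0  # assumed placeholder result so processing can continue
        return numerator // denominator
```

The point of the design is that every cessation leaves a message identifying the offending source line, which is what the old package failed to do.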

[Execution Statistics printout, second test. The diagnostic messages on this page are illegible in this reproduction.]

Comments: In this example R001, a 2-dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 x 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.

Superimposed program text (partially legible):

175 UNTIL R001 > 10 VARY R001 FROM 1 BY [illegible]

177 DO

178 UNTIL R002 [operator illegible] 10 VARY R002 FROM 1 BY 1

180 ALLOCATE [illegible] = TA1 (R001 R002)

181 UPDATE TA2 (R001 R002) R003

182-185 LIST statements displaying the values of TA1 and TA2

186 END-DO

187 END-DO
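The array coordinate check exercised by this test can be sketched in the same spirit. The following Python fragment is an illustration under stated assumptions: subscripts are taken to be 1-based, as the loops in the test program suggest, and the class name, method names, and message wording are hypothetical (the actual CONCOR message is illegible on the printout).

```python
class ArrayCoordinateError(Exception):
    """Raised when a subscript refers to a nonexistent array element."""


class Array2D:
    def __init__(self, name, rows, cols, fill=0):
        self.name, self.rows, self.cols = name, rows, cols
        self.cells = [[fill] * cols for _ in range(rows)]

    def _check(self, row, col, line_number):
        # Subscripts are assumed to be 1-based.
        if not (1 <= row <= self.rows and 1 <= col <= self.cols):
            raise ArrayCoordinateError(
                f"{self.name} ({row} {col}) DOES NOT EXIST   LINE NUMBER {line_number}"
            )

    def get(self, row, col, line_number):
        self._check(row, col, line_number)
        return self.cells[row - 1][col - 1]

    def update(self, row, col, value, line_number):
        self._check(row, col, line_number)
        self.cells[row - 1][col - 1] = value
```

Driving a 5 x 5 array with loop counters that run to 10, as in the test program, would raise the error on the first reference outside the declared bounds and name the line that made it.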

[Execution Statistics printout, third test. Legible output: JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215, followed by EDITOR -- NORMAL END OF JOB.]

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

Superimposed program text (partially legible):

214 IF IN-RECORD-COUNT > [illegible]

215 THEN STOP-RUN END-THEN

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE


WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.
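The first two checks in this excerpt can be sketched as follows. The fragment is a Python illustration only and is not CONCOR's parser. The function names are hypothetical, and only the legal characters that are legible in this reproduction (A-Z, 0-9, -, +, the comma, and =) are built in; the remaining special characters did not survive reproduction, so the sketch takes them as a parameter rather than guessing them.

```python
import string

# Characters the guide names legibly; the remaining special characters are
# missing from the source and must be supplied by the caller.
KNOWN_LEGAL = set(string.ascii_uppercase + string.digits + "-+,=")


def check_line(line):
    """Return 'DD-001' for a blank command statement line, else None."""
    return "DD-001" if line.strip() == "" else None


def check_string(text, extra_legal=""):
    """Return 'DD-002' if the first character of a non-empty string is illegal."""
    legal = KNOWN_LEGAL | set(extra_legal)
    if text and text[0] not in legal:
        return "DD-002"
    return None
```

Returning the message code rather than raising an exception mirrors the guide's ACTION TAKEN entries, in which parsing continues with the next string or line after a diagnostic is issued.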


Monday, January 14

9:30 am - 10:30 Discuss procedures for running problems on computer

10:30 - 10:45 Break

10:45 - 12:00 Component Programs of the CONCOR system

12:00 - 1:15 pm Break

Tuesday, January 15

9:30 am - 3:25 pm Free work time

Wednesday, January 16

9:30 am - 12:00 Free work time

12:00 - 1:15 pm Break

1:15 pm - 2:15 How to Install CONCOR on IBM 360/370 OS

2:15 - 2:30 Break

2:30 - 3:25 Free work time

Thursday, January 17

9:30 am - 3:25 Free work time

1:15 pm - 2:15 Future CONCOR Design Considerations
- for census processing
- for survey processing
- manual correction system

2:15 - 2:30 Break

2:30 - 2:45 Evaluation Guidelines
- Hand out evaluation forms

2:45 - 3:25 Free work time

Friday, January 18

9:30 am - 10:30 Free work time

10:30 - 10:45 Break

10:45 - 12:00 Group Discussion on CONCOR
- submission of written evaluations
Distribution of IBM installation tapes
Presentation of Certificates to Participants

12:00 - 1:15 pm Break

1:15 - 3:25 Free work time

APPENDIX D

PARTICIPANTS

CONCOR WORKSHOP, January 7-18, 1980, Marlow Heights, MD

KENYA

James Midianga Government Computer Centre Central Bureau of Statistics PO Box 30266 Nairobi Kenya

PANAMA

Omar Rivera Rios Data Processing Specialist Contraloria General de la Republica de Panama Apartado Postal 5213 Panama 5 Panama

PHILIPPINES Guida T Capellan OIC Computer Programming Division National Census and Statistics Office Solicarel Bldg 1 Magsaysay Blvd

Sta Mesa Manila Philippines

THAILAND

Angsumal Sunalai National Statistical Office Bangkok 1 Thailand

EGYPT

Farag Sedky Mourad Ghaleb CAPMAS (Central Agency for Public Mobilization and Statistics) Nasr City Cairo Egypt

SAUDI ARABIA

Shargi R Al-Shargi National Computer Center PO Box 2534 Riyadh Saudi Arabia

Ahmed S Al-Ghamdi Dept Director of National Computer Center PO Box 2534 Riyadh Saudi Arabia

OTHER

Robert W ONeal USREPJECOR APO New York NY 09038

John N Adams Data Processing Advisor DACCA Department of State Washington DC 20520

OR co UNDP PO Box 224 Dacca Bangladesh

Joe Quasney US Census Bureau Washington DC 20233

Howard Brunsman 5715 N Ninth St Arlington VA 22205

Larry Shiller and Julio Ortuzar Delta Systems Consultants Inc 264 Alhambra Circle Coral Gables FL 33134

Nicholas Ourusoff Burpee Hill Road New London NH 03257 (United Nations)

Frederic J Grant IV Systems Development Georgia World Congress Institute 580 North Omni International Atlanta Georgia 30303

Rafael Samper Mary Jo Keenan Catherine B Gleason LACDR Room 2246 NS Washington DC 20523

John Marshall AIDSERDMPSE Rm 706C SA-12 Washington DC 20523

Susanne Bacon and Mary K Friday ISPCData Processing US Census Bureau Washington DC 20233

Leo Dougherty ISPCWorld Census Staff US Bureau of the Census Washington DC 20233

CONCOR WORKSHOP

January 7-18 1979 Washington DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM


CONCOR WORKSHOP January 16 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1. What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries?

2. If you are familiar with previous versions of CONCOR, how does this current version compare to them?

3. How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data with which you are familiar?

4. How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software?

5. How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys?

6. If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?

7. In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8. Other comments

APPENDIX F

NEWOLD COMMAND COMPARISONS


OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA


NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin anywhere in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR

END-DIVISION signifies the termination of the CONCOR division organization and must always be present even though division identifiers have not been implemented

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations, and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

VERSION 1 (DECEMBER 1978) | VERSION 2 (DECEMBER 1979) | COMMENTS

ALLOCATE (ALL) | ALLOCATE (ALLOC) | Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

ASSERT (AST) | ASSERT (ASRT) | The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and MESSAGE option. Additionally, by utilizing the DUMPR and DUMPO keywords, it provides the user with the option of using the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

(see RECODE) | DRECODE (DRCD) | Provides a method for recoding data items that have a starting value of zero and continue in ascending sequence. Replaces the previous XRECODE command.

(none) | EXIT | Provides the means to terminate execution (ESCAPE) of command statements within the UNTIL command.

END | END-IF, END-THEN, END-ELSE | A solitary END command is no longer permitted in a conditional construction. Note the similarity to COBOL pseudocode.

ERROR | not implemented | No longer supported in the language.

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

VERSION 1 (DECEMBER 1978) | VERSION 2 (DECEMBER 1979) | COMMENTS

FILTER TYPE ( ) WITH | not implemented | Option statements are no longer supported in the language. This command once specified what was essentially a GOTO, depending on construction.

(see XRECODE) | GRECODE (GRCD) | This new command provides a method for the group recoding of input values.

IF | IF | The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. As in pseudocode, a CONCOR programmer must utilize the structured IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version.

LET | LET | Virtually unchanged.

(none) | LIST | New command which allows for the generation of specific messages, as well as specific individual identifiers, to be displayed in the report of edit statistics by questionnaire.

(none) | OUTPUT | New command which provides the means by which records will be written to the output file.

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

VERSION 1 (DECEMBER 1978) | VERSION 2 (DECEMBER 1979) | COMMENTS

RANGE (RNG) | RANGE (RNG) | A basic editing command of CONCOR. The December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through the DUMPR and DUMPO options within the command. Another new option of this command is a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

RECODE (REC) | name changed | Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

STOP keyword | STOP keyword | Unchanged.

(none) | UNTIL DO END-DO | One of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to that of most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

UPDATE (UPD) | UPDATE (UPD) | Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

VERSION 1 (DECEMBER 1978) | VERSION 2 (DECEMBER 1979) | COMMENTS

WRITE (WRT) | WRITE (WRT) | The WRITE language statement, while once utilized as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of DICTIONARY-DIVISION.)

XRECODE (XREC) | name changed | Old command which has become the GRECODE command in the December 1979 version.
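The interplay of ALLOCATE and UPDATE described in these tables (UPDATE maintains an array of recently seen valid values, and ALLOCATE assigns a value from that array when a field fails its edit) resembles what is commonly called hot-deck imputation. The sketch below is a Python illustration under that assumption, not CONCOR code; the field names, valid range, sex codes, and starting value are hypothetical.

```python
def edit_ages(records, low=0, high=99, start_value=30):
    """Impute missing or out-of-range ages from the last valid age seen
    for a person of the same sex. Returns the number of imputations."""
    # One array cell per sex code (1, 2), primed with an assumed start value.
    last_valid_age = {1: start_value, 2: start_value}
    imputed = 0
    for rec in records:
        age = rec["age"]
        if age is not None and low <= age <= high:
            last_valid_age[rec["sex"]] = age      # analogous to UPDATE
        else:
            rec["age"] = last_valid_age[rec["sex"]]  # analogous to ALLOCATE
            imputed += 1
    return imputed
```

Because the array is refreshed with each valid record, the value assigned to a failed record comes from a recently processed record with similar characteristics, which is why the order of the input file matters to the result.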


DECEMBER 1978 VERSION 1

Not implemented

DECEMBER 1979 VERSION 2

REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
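The presorting requirement follows from control-break processing: statistics are accumulated for as long as the control field stays the same and are closed out when it changes. The Python sketch below illustrates the consequence with hypothetical field names; it is not a description of the Report Division's internals.

```python
from itertools import groupby


def area_totals(records):
    """Total errors per run of consecutive records sharing an area code.

    groupby only merges adjacent records, so unsorted input produces
    more than one total for the same area.
    """
    return [
        (area, sum(r["errors"] for r in group))
        for area, group in groupby(records, key=lambda r: r["area"])
    ]
```

Applied to records arriving in the area order 1, 2, 1, the function reports area 1 twice; sorting the file on the control field beforehand yields a single total per area.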

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST


Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.


(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers), to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Permitting only one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement in the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18-digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.


(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed before CONCOR gives the user access to the data items on the questionnaire. Many users now include such values in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma characters as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation, or provide a means by which the user may specify a title to appear in the listings.
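The automatic range check suggested in point (13) amounts to storing the valid range with each user-identifier and testing every record against it before any user-written edit runs. A minimal Python sketch follows; the dictionary contents, identifier names, and function name are hypothetical, and the structure is an assumption rather than a description of the CONCOR data dictionary.

```python
# Hypothetical dictionary entries: user-identifier -> (lowest, highest valid value).
DICTIONARY = {"AGE": (0, 99), "SEX": (1, 2)}


def out_of_range(record, dictionary=DICTIONARY):
    """Return the names of the fields whose values fall outside their declared range."""
    return [
        name
        for name, (low, high) in dictionary.items()
        if not (low <= record[name] <= high)
    ]
```

Keeping the ranges in the dictionary rather than in comment statements is also what would let other software, such as a tabulation package, read them.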

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program by eliminating some of the system protection features, such as division-by-zero and out-of-bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is the question of what action is to be taken in the calculation of tolerance with respect to the new record's inclusion in the universe of observations and the number of changes.


(4) The code generated in the EDITOR program for the DRECODE command should be modified to use a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implement a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the existence of a record would have to be validated only once, when the SET command was encountered. The major drawback of this command is that the user must remember the action taken during its last execution, when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision of the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run-total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 of the Dictionary Division list above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
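The allocation frequency distributions of point (9) can be sketched as a tally of the values assigned to each identifier, with an optional fixed bin width standing in for the mechanism that continuous values would need. This Python fragment is an illustration only; the class name, method names, and the choice of fixed-width bins are assumptions and not a design taken from the report.

```python
from collections import Counter, defaultdict


class AllocationTally:
    def __init__(self, bin_width=None):
        # bin_width=None tallies discrete values exactly; otherwise each value
        # is counted under the lower edge of its fixed-width bin.
        self.bin_width = bin_width
        self.counts = defaultdict(Counter)

    def record(self, identifier, value):
        """Count one allocation of `value` to the named identifier."""
        if self.bin_width is None:
            key = value
        else:
            key = (value // self.bin_width) * self.bin_width
        self.counts[identifier][key] += 1

    def distribution(self, identifier):
        """Return (value or bin edge, count) pairs in ascending order."""
        return sorted(self.counts[identifier].items())
```

The sorted pairs returned by `distribution` are the raw material that the separate formatting program mentioned in point (9) would turn into a readable report.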

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.


1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program, to be used on IBM 360/370 series computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification of the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not imply that the editing process itself should be made interactive; rather, only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.


2 Survey or other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of the sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of the questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has been neither finished nor tested.


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES


1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT OUTPUT-QUEST-COUNT WRITE-RECORD-COUNT

Change in spelling of variable.

Not mentioned; possible omission from manual.

Not implemented.

Note: Current documentation equivocates on when some variables are reset; most are initialized with each control area break.

Though implemented as reserved identifiers, their exact values and utility were unknown due to poor documentation of (old) CONCOR.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN-, OUT-).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Execution statistics printout. Column headings: CONTROL AREA SEQUENCE NUMBER; INPUTS READ (QUESTIONNAIRES, RECORDS); OUTPUTS BY OUTPUT COMMAND; OUTPUTS BY WRITE COMMAND; NO VALID RECTYPES. The printout carries the following messages:]

DIVISION BY ZERO LINE NUMBER 118
DIVISION BY ZERO LINE NUMBER 118
DIVISION BY ZERO LINE NUMBER 118
DIVISION BY ZERO LINE NUMBER 118
DIVISION BY ZERO LINE NUMBER 118
DIVISION BY ZERO LINE NUMBER 118
RUN ABORTED DUE TO DIVISION BY ZERO ERRORS

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason, that is, ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.

112 LIST
114 LIST DIVISION BY ZERO ERROR FOLLOWS
115 LET R001 R003 = 0
116 LIST R001 R003 SET TO ZEROS R001 R003
117 LIST R002 =
118 LET R003 = R002 / R001
119 LIST R003 = R002/R001 = R003
122 END TEST OF OPERANDS AND MSG
REGISTERS RESET TO ZERO
125 LET R001 R002 R003 = 0
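The behavior described in the comments above, reporting each exception with its source line number and stopping the run once too many have accumulated, can be sketched in Python. This is a minimal illustration only, not CONCOR's implementation: the `Diagnostics` class, the `RunAborted` exception, the zero fallback value, and the error limit of six are all assumptions made for the example. Only the message wording and line number 118 come from the printout.

```python
from dataclasses import dataclass, field


class RunAborted(Exception):
    """Raised when the number of reported errors reaches the limit."""


@dataclass
class Diagnostics:
    error_limit: int = 6  # hypothetical limit, not taken from the report
    messages: list = field(default_factory=list)

    def divide(self, numerator: int, denominator: int, line_number: int) -> int:
        """Integer division that reports a zero divisor instead of crashing."""
        if denominator == 0:
            self.messages.append(f"DIVISION BY ZERO LINE NUMBER {line_number}")
            if len(self.messages) >= self.error_limit:
                self.messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                raise RunAborted(self.messages[-1])
            return 0  # assumed fallback so that processing can continue
        return numerator // denominator


def run(records, error_limit=6):
    """Divide R002 by R001 for each record and return the diagnostics produced."""
    diag = Diagnostics(error_limit=error_limit)
    try:
        for r002, r001 in records:
            diag.divide(r002, r001, line_number=118)
    except RunAborted:
        pass  # the abort message has already been recorded
    return diag.messages
```

Calling `run([(5, 0)] * 10)` yields six division messages followed by the abort message, which mirrors the shape of the printout; the remaining four records are never processed.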

[Execution statistics printout. The error messages on this page did not survive reproduction.]

Comments: In this example R001, a 2 dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 X 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.

175 UNTIL R001 > 10 VARY R001 FROM 1 BY 1
177 DO
178 UNTIL R002 > 10 VARY R002 FROM 1 BY 1
180 ALLOCATE N002 = TA1 (R001 R002)
181 UPDATE TA2 (R001 R002) R003
182 LIST N002 = R003 N002 R003
183 LIST VALUES OF TA1 = R003 TA1 (R001 R002)
184 LIST
185 LIST TA2 = TA2 (R001 R002)
186 END-DO LIST
187 END-DO
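The array coordinate error in this example can be illustrated with a short Python sketch. It assumes, from the program text, that TA1 is the 10 by 10 array and TA2 the 5 by 5 array, and that both loop counters run from 1 to 10. The function names and the message wording are hypothetical, since the messages on the original printout are not legible. Unlike CONCOR, the sketch only collects the diagnostics and does not abort the run.

```python
def update_checked(table, row, col, value, line_number, errors):
    """Store value at 1-based (row, col) if that element exists; otherwise log it."""
    rows, cols = len(table), len(table[0])
    if 1 <= row <= rows and 1 <= col <= cols:
        table[row - 1][col - 1] = value
        return True
    # Hypothetical message text; the real CONCOR wording is not known here.
    errors.append(f"ARRAY COORDINATE ERROR ({row},{col}) LINE NUMBER {line_number}")
    return False


def copy_ta1_into_ta2():
    """Walk a 10 x 10 index space while updating a 5 x 5 array."""
    ta1 = [[r * 10 + c for c in range(1, 11)] for r in range(1, 11)]  # 10 x 10
    ta2 = [[0] * 5 for _ in range(5)]                                  # 5 x 5
    errors = []
    for r001 in range(1, 11):
        for r002 in range(1, 11):
            update_checked(ta2, r001, r002, ta1[r001 - 1][r002 - 1], 181, errors)
    return ta2, errors
```

Of the 100 coordinate pairs visited, only the 25 with both indices between 1 and 5 exist in the smaller array, so the remaining 75 references are reported.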

[Execution statistics printout. Column headings: CONTROL AREA SEQUENCE NUMBER; INPUTS READ (QUESTIONNAIRES, RECORDS); OUTPUTS BY OUTPUT COMMAND; OUTPUTS BY WRITE COMMAND. The printout ends with the following messages:]

JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215
EDITOR -- NORMAL END OF JOB

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

214 IF IN-RECORD-COUNT > 500
215 THEN STOP-RUN END-THEN
216
217 IF IN-RECORD-COUNT < 2
218 THEN LIST <(THAN 20 RECS IN R

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

APPENDIX J DD-2

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.


APPENDIX D

PARTICIPANTS

CONCOR WORKSHOP
January 7-18, 1980, Marlow Heights, MD

PARTICIPANTS

KENYA

James Midianga, Government Computer Centre, Central Bureau of Statistics, P.O. Box 30266, Nairobi, Kenya

PANAMA

Omar Rivera Rios, Data Processing Specialist, Contraloria General de la Republica de Panama, Apartado Postal 5213, Panama 5, Panama

PHILIPPINES

Guida T. Capellan, OIC, Computer Programming Division, National Census and Statistics Office, Solicarel Bldg. 1, Magsaysay Blvd., Sta. Mesa, Manila, Philippines

THAILAND

Angsumal Sunalai, National Statistical Office, Bangkok 1, Thailand

EGYPT

Farag Sedky Mourad Ghaleb, CAPMAS (Central Agency for Public Mobilization and Statistics), Nasr City, Cairo, Egypt

SAUDI ARABIA

Shargi R. Al-Shargi, National Computer Center, P.O. Box 2534, Riyadh, Saudi Arabia

Ahmed S. Al-Ghamdi, Dept. Director of National Computer Center, P.O. Box 2534, Riyadh, Saudi Arabia

OTHER

Robert W. O'Neal, USREP/JECOR, APO New York, NY 09038

John N. Adams, Data Processing Advisor, DACCA, Department of State, Washington, DC 20520, or c/o UNDP, P.O. Box 224, Dacca, Bangladesh

Joe Quasney, US Census Bureau, Washington, DC 20233

Howard Brunsman, 5715 N. Ninth St., Arlington, VA 22205

Larry Shiller and Julio Ortuzar, Delta Systems Consultants, Inc., 264 Alhambra Circle, Coral Gables, FL 33134

Nicholas Ourusoff, Burpee Hill Road, New London, NH 03257 (United Nations)

Frederic J. Grant IV, Systems Development, Georgia World Congress Institute, 580 North Omni International, Atlanta, Georgia 30303

Rafael Samper, Mary Jo Keenan, and Catherine B. Gleason, LACDR, Room 2246 NS, Washington, DC 20523

John Marshall, AIDSERDMPSE, Rm. 706C, SA-12, Washington, DC 20523

Susanne Bacon and Mary K. Friday, ISPC/Data Processing, US Census Bureau, Washington, DC 20233

Leo Dougherty, ISPC/World Census Staff, US Bureau of the Census, Washington, DC 20233

CONCOR WORKSHOP

January 7-18, 1980, Washington, DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM


CONCOR WORKSHOP, January 16, 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1. What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries?

2. If you are familiar with previous versions of CONCOR, how does this current version compare to them?

3. How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data with which you are familiar?

4. How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software?

5. How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys?

6. If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?

7. In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8. Other comments

APPENDIX F

NEW/OLD COMMAND COMPARISONS

OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA


NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR.

END-DIVISION signifies the termination of the CONCOR division organization and must always be present, even though division identifiers have not been implemented.

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations, and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

VERSION 1 (DECEMBER 1978) | VERSION 2 (DECEMBER 1979) | COMMENTS

ALLOCATE (ALL) | ALLOCATE (ALLOC) | Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

ASSERT (AST) | ASSERT (ASRT) | The ASSERT command is the basic consistency editing command of CONCOR. The command now provides NOERROR and MESSAGE options. Additionally, the DUMPR and DUMPO keywords give the user the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS, END-PASS, FAIL, END-FAIL constructions for the purpose of maintaining structured logic.

(see RECODE) | DRECODE (DRCD) | Provides a method for recoding data items having a starting value of zero and continuing in ascending sequence. Replaces the previous RECODE command.

(none) | EXIT | Provides the means to terminate execution (escape) of command statements within the UNTIL command.

END | END-IF, END-THEN, END-ELSE | A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

ERROR | not implemented | No longer supported in the language.


FILTER TYPE ( ) WITH | not implemented | Option statements are no longer supported in the language. This command once specified what was essentially a GOTO, depending on construction.

(see XRECODE) | GRECODE (GRCD) | This new command provides a method for the group recoding of input values.

IF | IF | The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. As in pseudocode, a CONCOR programmer must use the structured IF, THEN, END-THEN, ELSE, END-ELSE construction in place of the IF, THEN, END, ELSE, END statements acceptable to the 1978 version.

LET | LET | Virtually unchanged.

(none) | LIST | New command which allows for the generation of specific messages, as well as specific individual identifiers, to be displayed in the report of edit statistics by questionnaire.

(none) | OUTPUT | New command which provides the means by which records will be written to the output-file.

RANGE (RNG) | RANGE (RNG) | As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through the DUMPR and DUMPO options within the command. Another new option of this command is a PASS, END-PASS, FAIL, END-FAIL construction permitting the specification of logical command paths depending upon the outcome of the range test.

RECODE (REC) | name changed | Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

STOP keyword | STOP keyword | Unchanged.

(none) | UNTIL DO END-DO | One of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to that of most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

UPDATE (UPD) | UPDATE (UPD) | Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.


WRITE (WRT) | WRITE (WRT) | The WRITE language statement, while once utilized as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of DICTIONARY-DIVISION.)

XRECODE (XREC) | name changed | Old command which has become the GRECODE command in the December 1979 version.
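The table describes ALLOCATE and UPDATE as working together to provide the imputation mechanism, with UPDATE maintaining the arrays involved. The report does not spell out the array handling, so the following Python sketch assumes a conventional hot-deck arrangement: valid values refresh an array cell, and invalid values are replaced from that cell. The field names (`sex`, `age`), the valid range, and the starting deck values are all hypothetical.

```python
def edit_ages(records, deck, valid=range(0, 100)):
    """Impute invalid ages from a deck keyed by sex; refresh the deck with valid ages.

    Returns the number of imputations made. Records are modified in place.
    """
    imputes = 0
    for rec in records:
        if rec["age"] in valid:
            deck[rec["sex"]] = rec["age"]   # plays the role of UPDATE
        else:
            rec["age"] = deck[rec["sex"]]   # plays the role of ALLOCATE
            imputes += 1
    return imputes
```

For example, with a deck of `{1: 30, 2: 28}`, a record with sex 2 and age 150 receives age 28, and a later invalid record for sex 2 receives whatever valid age was most recently stored for that sex. The returned count corresponds loosely to the kind of total that an imputation statistic would report.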


DECEMBER 1978 VERSION 1

DECEMBER 1979 VERSION 2

Not implemented REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
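The presorting requirement noted above can be illustrated with a short Python sketch. It assumes only that a sequential reader starts a new control area whenever the area field changes, which is how `itertools.groupby` behaves; the function name and the `area` field are hypothetical and do not reflect CONCOR's internals.

```python
from itertools import groupby


def area_breaks(records, field="area"):
    """Return (area, record_count) for each contiguous run of the control field."""
    return [
        (area, sum(1 for _ in run))
        for area, run in groupby(records, key=lambda rec: rec[field])
    ]
```

On an unsorted file such as areas 1, 2, 1, the reader sees three separate breaks, so area 1 is reported twice with partial totals. After sorting on the control field, the same records produce one break per area.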

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST


Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1 Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.


(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Permitting only one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18 digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.


(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation, or provide a means by which the user may specify a title to appear in the listings.
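The automatic range check proposed in item (13) could work roughly as sketched below in Python. This is an illustration under stated assumptions, not a proposed CONCOR design: the dictionary layout, the item names SEX and AGE, and their ranges are hypothetical, and the check simply reports which declared items fail before any user edit logic would run.

```python
# Hypothetical dictionary entries: item name -> (lowest valid, highest valid).
DICTIONARY = {
    "SEX": (1, 2),
    "AGE": (0, 99),
}


def range_failures(record, dictionary=DICTIONARY):
    """Return the names of items that are missing, non-numeric, or out of range."""
    failures = []
    for name, (low, high) in dictionary.items():
        value = record.get(name)
        if not isinstance(value, int) or not (low <= value <= high):
            failures.append(name)
    return failures
```

Keeping the valid ranges alongside the item declarations, rather than in comment statements, is also what would let other software such as a tabulation package read them from the same dictionary.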

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features, such as division by zero and out of bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is the question of what action is to be taken in the calculation of tolerance in terms of this new record's inclusion in the universe of observations and number of changes.


(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command lies in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
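The lookup table approach suggested in item (4) can be contrasted with sequential comparison in a short Python sketch. The report does not describe the code currently generated for DRECODE, so the sequential version below is only a stand-in for comparison. The recode pairs are hypothetical, and the table version relies on the property stated elsewhere in this report that DRECODE input values start at zero and continue in ascending sequence.

```python
# Hypothetical (input value, recoded value) pairs; inputs run 0, 1, 2, ...
RECODE_PAIRS = [(0, 9), (1, 1), (2, 1), (3, 2), (4, 3)]


def recode_linear(value, pairs=RECODE_PAIRS):
    """Compare against each pair in turn, as a chain of IF tests would."""
    for source, target in pairs:
        if source == value:
            return target
    return None


# Because the inputs are 0..n-1, the input value can serve directly as an index.
LOOKUP = [target for _, target in sorted(RECODE_PAIRS)]


def recode_lookup(value, table=LOOKUP):
    """Use the input value itself as the index into the table."""
    if 0 <= value < len(table):
        return table[value]
    return None
```

Both functions return the same results, but the table version does a single bounds test and one indexed fetch regardless of how many recode values there are, which is the source of the execution time saving the item anticipates.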

1.1.3 Report Division

(1) Provide a means by which the user may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.


1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on IBM 360/370 series computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification of the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.


2 Survey or other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of the sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of the questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has been neither finished nor tested.
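The check-in procedure proposed in item (2) amounts to comparing the master list of selected cases with the cases actually found in the data file. A minimal Python sketch follows; the function name, the use of plain questionnaire identifiers, and the two result categories are assumptions made for illustration rather than anything specified in the enhancement list.

```python
def check_in(master_ids, received_ids):
    """Compare the cases expected against the cases actually present in the file."""
    expected, received = set(master_ids), set(received_ids)
    return {
        "missing": sorted(expected - received),      # selected but not received
        "unexpected": sorted(received - expected),   # received but not selected
    }
```

For a master list of 101, 102, 103 and a file containing 103, 101, 104, the result reports 102 as missing and 104 as unexpected.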


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES


1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT, OUTPUT-QUEST-COUNT, WRITE-RECORD-COUNT

Change in spelling of variable.

Not mentioned--possible omission from manual.

Not implemented.

Note: Current documentation equivocates on when some variables are reset; most are initialized with each control area break.

Though implemented as reserved identifiers, due to poor documentation of (old) CONCOR their exact values and utility were unknown.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN-, OUT-).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Execution Statistics listing; the run date is illegible in the scan. Legible column headings include CONTROL AREA, SEQUENCE NUMBER, INPUTS READ (QUESTIONNAIRES, RECORDS), OUTPUTS BY OUTPUT COMMAND, and OUTPUTS BY WRITE COMMAND. The body of the listing shows six "DIVISION BY ZERO, LINE NUMBER 118" messages followed by "RUN ABORTED DUE TO DIVISION BY ZERO ERRORS". The superimposed program text (source lines 112-125) is only partly legible: a series of LIST and LET statements on registers R001, R002, and R003, with the LET statement at line 118 assigning R003 from R002 and R001.]

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason--ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.
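The diagnostic behavior described in the comments can be sketched as follows. This is a minimal illustration in Python, not the generated EDITOR code: the program representation (a list of line number, target register, expression), the function name `run_editor`, and the error limit of six are all assumptions, the limit being chosen only because six messages are visible on the listing.

```python
# Hypothetical sketch: report the source line of each division by zero and
# abort the run once an assumed error limit is reached.


def run_editor(program, records, max_errors=6):
    """Run `program` once per record, collecting diagnostic messages.

    program: list of (line_no, target_register, expression) tuples, where
             expression is a function of the register dictionary.
    records: list of dictionaries giving initial register values.
    """
    messages = []
    errors = 0
    for record in records:
        registers = dict(record)  # work on a copy of the input values
        for line_no, target, expression in program:
            try:
                registers[target] = expression(registers)
            except ZeroDivisionError:
                errors += 1
                messages.append(f"DIVISION BY ZERO LINE NUMBER {line_no}")
                if errors >= max_errors:
                    messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                    return messages
    return messages


# Line 118 divides R002 by R001; the operator is an assumption, since it is
# not legible in the scanned listing.
program = [(118, "R003", lambda r: r["R002"] // r["R001"])]

for message in run_editor(program, [{"R001": 0, "R002": 7}] * 6):
    print(message)
```

The point of the design is that the failing statement is identified by its source line number, so the user can go straight to the offending command instead of facing an unexplained ABEND.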

[Execution Statistics listing, largely illegible in the scan. The superimposed program text (source lines 173-187) is only partly legible: two nested UNTIL ... VARY ... DO loops step the counters R001 and R002 from 1 by 1, and the loop body contains an ALLOCATE from TA1 (R001 R002), an UPDATE of TA2 (R001 R002), and several LIST statements, closed by two END-DO statements.]

Comments: In this example TA1, a 2-dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 x 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.
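The array coordinate check illustrated above can be sketched in Python. This is an assumption-laden illustration rather than CONCOR's implementation: the `Table` class, the function name, the message wording, and the use of line number 181 for the UPDATE statement are all hypothetical.

```python
# Hypothetical sketch: bounds-checked update of a small array from a larger one.


class Table:
    """A 1-indexed two-dimensional array with bounds checking."""

    def __init__(self, rows, cols, fill=0):
        self.rows = rows
        self.cols = cols
        self.cells = [[fill] * cols for _ in range(rows)]

    def in_bounds(self, i, j):
        return 1 <= i <= self.rows and 1 <= j <= self.cols

    def get(self, i, j):
        if not self.in_bounds(i, j):
            raise IndexError((i, j))
        return self.cells[i - 1][j - 1]

    def update(self, i, j, value):
        if not self.in_bounds(i, j):
            raise IndexError((i, j))
        self.cells[i - 1][j - 1] = value


def copy_with_diagnostics(source, target, line_no):
    """Copy every element of source into target, reporting bad coordinates."""
    messages = []
    for i in range(1, source.rows + 1):
        for j in range(1, source.cols + 1):
            try:
                target.update(i, j, source.get(i, j))
            except IndexError:
                messages.append(
                    f"NONEXISTENT ELEMENT ({i}, {j}) LINE NUMBER {line_no}"
                )
    return messages


ta1 = Table(10, 10, fill=1)  # 10 x 10 source array
ta2 = Table(5, 5)            # smaller 5 x 5 target array
messages = copy_with_diagnostics(ta1, ta2, line_no=181)
print(len(messages), messages[0])
```

Of the 100 coordinate pairs visited, only the 25 that exist in the 5 x 5 target succeed; every other reference is reported with its coordinates and line number instead of silently corrupting storage.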

[Execution Statistics listing. Legible column headings include CONTROL AREA, SEQUENCE NUMBER, INPUTS READ (QUESTIONNAIRES, RECORDS), and OUTPUTS BY OUTPUT COMMAND. The body shows "JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215" followed by "EDITOR -- NORMAL END OF JOB". The superimposed program text (source lines 214-218) is only partly legible: an IF test on IN-RECORD-COUNT whose THEN branch at line 215 is STOP-RUN END-THEN, followed by a second IF test on IN-RECORD-COUNT with a LIST statement in its THEN branch.]

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()).
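The checks behind messages DD-001 and DD-002 can be sketched as below. This is only an illustration under stated assumptions: the base character set contains just the characters that are legible in this copy (A-Z, 0-9, - and +), the remaining punctuation must be supplied through the hypothetical `extra_legal` parameter, and strings are assumed to be separated by blanks, which the guide does not state.

```python
# Hypothetical sketch of the DD-001 and DD-002 checks on one command line.
import string

# Only the characters legible in this copy; the punctuation characters of the
# legal set are not recoverable from the scan and must be passed in.
BASE_LEGAL = set(string.ascii_uppercase + string.digits + "-+")


def diagnose_line(line, extra_legal=""):
    """Return the diagnostic messages raised by one command statement line."""
    legal = BASE_LEGAL | set(extra_legal)
    if not line.strip():
        return ["WARNING (DD-001) BLANK COMMAND STATEMENT LINE"]
    messages = []
    for token in line.split():  # assumption: blanks separate strings
        if token[0] not in legal:
            messages.append("ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING")
    return messages


print(diagnose_line("    "))
print(diagnose_line("DEFINE-RECORD $X"))
print(diagnose_line("INPUT-FILE"))
```

As in the guide, an offending string produces a message and parsing simply moves on to the next string, so one keying error does not hide the others on the same line.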

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.


APPENDIX D

CONCOR WORKSHOP, January 7-18, 1980, Marlow Heights, MD

PARTICIPANTS

KENYA

James Midianga, Government Computer Centre, Central Bureau of Statistics, PO Box 30266, Nairobi, Kenya

PANAMA

Omar Rivera Rios, Data Processing Specialist, Contraloria General de la Republica de Panama, Apartado Postal 5213, Panama 5, Panama

PHILIPPINES

Guida T. Capellan, OIC, Computer Programming Division, National Census and Statistics Office, Solicarel Bldg. 1, Magsaysay Blvd., Sta. Mesa, Manila, Philippines

THAILAND

Angsumal Sunalai, National Statistical Office, Bangkok 1, Thailand

EGYPT

Farag Sedky Mourad Ghaleb, CAPMAS (Central Agency for Public Mobilization and Statistics), Nasr City, Cairo, Egypt

SAUDI ARABIA

Shargi R. Al-Shargi, National Computer Center, PO Box 2534, Riyadh, Saudi Arabia

Ahmed S. Al-Ghamdi, Dept Director of National Computer Center, PO Box 2534, Riyadh, Saudi Arabia

OTHER

Robert W. ONeal, USREPJECOR, APO New York, NY 09038

John N. Adams, Data Processing Advisor, DACCA, Department of State, Washington, DC 20520, or c/o UNDP, PO Box 224, Dacca, Bangladesh

Joe Quasney, US Census Bureau, Washington, DC 20233

Howard Brunsman, 5715 N. Ninth St., Arlington, VA 22205

Larry Shiller and Julio Ortuzar, Delta Systems Consultants, Inc., 264 Alhambra Circle, Coral Gables, FL 33134

Nicholas Ourusoff, Burpee Hill Road, New London, NH 03257 (United Nations)

Frederic J. Grant IV, Systems Development, Georgia World Congress Institute, 580 North Omni International, Atlanta, Georgia 30303

Rafael Samper, Mary Jo Keenan, Catherine B. Gleason, LACDR, Room 2246 NS, Washington, DC 20523

John Marshall, AIDSERDMPSE, Rm 706C SA-12, Washington, DC 20523

Susanne Bacon and Mary K. Friday, ISPC Data Processing, US Census Bureau, Washington, DC 20233

Leo Dougherty, ISPC World Census Staff, US Bureau of the Census, Washington, DC 20233

CONCOR WORKSHOP

January 7-18 1979 Washington DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM

CONCOR WORKSHOP, January 16, 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1. What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries?

2. If you are familiar with previous versions of CONCOR, how does this current version compare to them?

3. How would you compare the utility of this version of CONCOR for less developed countries with that of other systems for editing data with which you are familiar?

4. How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software?

5. How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys?

6. If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?

7. In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8. Other comments

APPENDIX F

NEWOLD COMMAND COMPARISONS


OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA


NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR.

END-DIVISION signifies the termination of the CONCOR division organization and must always be present, even though division identifiers have not been implemented.

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations, and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

VERSION 1 (DECEMBER 1978): ALLOCATE (ALL)
VERSION 2 (DECEMBER 1979): ALLOCATE (ALLOC)
COMMENTS: Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

VERSION 1 (DECEMBER 1978): ASSERT (AST)
VERSION 2 (DECEMBER 1979): ASSERT (ASRT)
COMMENTS: The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and MESSAGE option. Additionally, by utilizing the DUMPR and DUMPO keywords, it provides the user with the option of using the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

VERSION 1 (DECEMBER 1978): (see RECODE)
VERSION 2 (DECEMBER 1979): DRECODE (DRCD)
COMMENTS: Provides a method for recoding data items having a starting value of zero and continuing in ascending sequence. Replaces the previous XRECODE command.

VERSION 1 (DECEMBER 1978): (none)
VERSION 2 (DECEMBER 1979): EXIT
COMMENTS: Provides the means to terminate execution (escape) of command statements within the UNTIL command.

VERSION 1 (DECEMBER 1978): END
VERSION 2 (DECEMBER 1979): END-IF, END-THEN, END-ELSE
COMMENTS: A solitary END command is no longer permitted in conditional construction. Note the similarity to COBOL pseudocode.

VERSION 1 (DECEMBER 1978): ERROR
VERSION 2 (DECEMBER 1979): not implemented
COMMENTS: No longer supported in language.

VERSION 1 (DECEMBER 1978): FILTER TYPE ( ) WITH
VERSION 2 (DECEMBER 1979): not implemented
COMMENTS: Option statements are no longer supported in the language. This command once specified essentially a GOTO, depending on construction.

VERSION 1 (DECEMBER 1978): (see XRECODE)
VERSION 2 (DECEMBER 1979): GRECODE (GRCD)
COMMENTS: This new command provides a method for the group recoding of input values.

VERSION 1 (DECEMBER 1978): IF
VERSION 2 (DECEMBER 1979): IF
COMMENTS: The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. Similar to pseudocode, a CONCOR programmer must utilize the structured IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version.

VERSION 1 (DECEMBER 1978): LET
VERSION 2 (DECEMBER 1979): LET
COMMENTS: Virtually unchanged.

VERSION 1 (DECEMBER 1978): (none)
VERSION 2 (DECEMBER 1979): LIST
COMMENTS: New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire, as well as specific individual identifiers.

VERSION 1 (DECEMBER 1978): (none)
VERSION 2 (DECEMBER 1979): OUTPUT
COMMENTS: New command which provides the means by which records will be written to the output-file.

VERSION 1 (DECEMBER 1978): RANGE (RNG)
VERSION 2 (DECEMBER 1979): RANGE (RNG)
COMMENTS: As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through the DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

VERSION 1 (DECEMBER 1978): RECODE (REC)
VERSION 2 (DECEMBER 1979): name changed
COMMENTS: Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

VERSION 1 (DECEMBER 1978): STOP keyword
VERSION 2 (DECEMBER 1979): STOP keyword
COMMENTS: Unchanged.

VERSION 1 (DECEMBER 1978): (none)
VERSION 2 (DECEMBER 1979): UNTIL DO END-DO
COMMENTS: DO-END-DO: one of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user-identifier (counter), which is incremented with each successive execution of the command statements.

VERSION 1 (DECEMBER 1978): UPDATE (UPD)
VERSION 2 (DECEMBER 1979): UPDATE (UPD)
COMMENTS: Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

VERSION 1 (DECEMBER 1978): WRITE (WRT)
VERSION 2 (DECEMBER 1979): WRITE (WRT)
COMMENTS: The WRITE language statement, while once utilized as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of DICTIONARY-DIVISION.)

VERSION 1 (DECEMBER 1978): XRECODE (XREC)
VERSION 2 (DECEMBER 1979): name changed
COMMENTS: Old command which has become the GRECODE command in the December 1979 version.
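The way ALLOCATE and UPDATE work together for imputation can be sketched in Python. The sketch assumes a hot-deck style scheme, in which an array holds the most recent valid value for each class of record; the report does not name the technique, and the function and field names below are hypothetical.

```python
# Hypothetical sketch of array-based imputation in the spirit of the
# ALLOCATE and UPDATE commands. A hot-deck scheme is assumed.


def hot_deck_edit(records, key, field, is_valid, initial):
    """Impute invalid values of `field` from the last valid value seen
    for the same value of `key`. Returns the number of imputations."""
    deck = dict(initial)  # starting values for the imputation array
    imputed = 0
    for record in records:
        cell = record[key]
        if is_valid(record[field]):
            deck[cell] = record[field]   # analogous to UPDATE
        else:
            record[field] = deck[cell]   # analogous to ALLOCATE
            imputed += 1
    return imputed


people = [
    {"sex": 1, "age": 30},
    {"sex": 1, "age": None},  # missing: takes the last valid age for sex 1
    {"sex": 2, "age": 999},   # out of range: takes the initial value for sex 2
]
count = hot_deck_edit(
    people,
    key="sex",
    field="age",
    is_valid=lambda v: v is not None and 0 <= v <= 120,
    initial={1: 25, 2: 27},
)
print(count, people)
```

Keeping the array maintenance (UPDATE) separate from the assignment (ALLOCATE) lets the user decide which records are trustworthy enough to feed the array, and the imputation count is the kind of figure the GENERATE-EDIT-STATISTICS option is described as accumulating.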


DECEMBER 1978 VERSION 1

Not implemented

DECEMBER 1979 VERSION 2

REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note: division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
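The presorting requirement follows from control-break processing, which a short Python sketch can show. The record layout and the function name are hypothetical; `itertools.groupby` is used because, like the behavior described above, it groups only adjacent records with the same key.

```python
# Hypothetical sketch of control-break reporting on an area control field.
from itertools import groupby


def area_break_counts(records):
    """Count records per control area, starting a new area at every change
    of the control field (records are NOT merged across the file)."""
    return [
        (area, sum(1 for _ in group))
        for area, group in groupby(records, key=lambda r: r["area"])
    ]


unsorted_file = [{"area": 1}, {"area": 2}, {"area": 1}]
print(area_break_counts(unsorted_file))
print(area_break_counts(sorted(unsorted_file, key=lambda r: r["area"])))
```

On the unsorted file, area 1 is reported twice with one record each; after sorting it is reported once with two records, which is why the file must be sorted on the AREA-CONTROL field before editing.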

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST


Future Versions of CONCOR: Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.


(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18-digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.


(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation, or provide a means by which the user may specify a title to appear in the listings.
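The delimiter rule proposed in item (14) can be sketched in Python under one reading of the text: a blank or a single comma separates parameters, and an empty field between two commas requests the system default. The function name and the use of `None` to stand for the default are hypothetical, and how the proposal would treat a trailing comma is not stated in the report.

```python
# Hypothetical sketch of item (14): blanks and commas both delimit
# parameters, and two adjacent commas request the system default.


def split_params(text):
    """Split a parameter string; None marks a defaulted parameter."""
    params = []
    for field in text.split(","):
        words = field.split()  # blanks delimit parameters within a field
        if words:
            params.extend(words)
        else:
            params.append(None)  # empty field between two commas
    return params


print(split_params("A B,,C"))
print(split_params("A, B"))
```

The first call yields four parameters, the third of which is defaulted; the second shows that a comma followed by a blank still counts as a single delimiter.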

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features such as division by zero and out-of-bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is what action is to be taken in the calculation of tolerance in terms of this new record's inclusion into the universe of observations and number of changes.


(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
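The lookup table approach proposed in item (4) can be sketched in Python. The report does not say how the generated EDITOR code currently evaluates DRECODE, so the comparison chain below is only an assumed stand-in for a non-table implementation; the function names and the treatment of values outside the table (raising an error) are likewise hypothetical.

```python
# Hypothetical sketch: recoding values 0, 1, 2, ... by a chain of
# comparisons versus by direct table lookup.


def recode_by_comparison(value, codes):
    """Test each source value in turn; cost grows with the table length."""
    for source, target in enumerate(codes):
        if value == source:
            return target
    raise ValueError(f"value {value} has no recode")


def recode_by_lookup(value, codes):
    """Use the source value as a subscript; cost is independent of length."""
    if 0 <= value < len(codes):
        return codes[value]
    raise ValueError(f"value {value} has no recode")


# Source values 0..3 recode to these targets, in ascending source order.
codes = [10, 20, 30, 40]
print(recode_by_comparison(2, codes), recode_by_lookup(2, codes))
```

Because DRECODE applies to items that start at zero and continue in ascending sequence, the source value can serve directly as the table subscript, which is what makes the lookup form possible.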

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.


1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Hlodification to the system data dictionary to better arrange and make available information needed most frequently Also to examine the possibility of making the section dealing with the message text and report generation a separate file Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth

(5) Make the parsing and syntacial routines interactive so as to increase the product ity of the CONCOR user This does not mean to imply that the editing process itself should be made interactive but rather only the development of the user code should be put online

12 Documentation

(1)The development of a self-teaching CONCOR manual

(2) The development of a CONCOR Users Guide

G-6

0

50

2 Survey or other Census Processing

21 Software

(1) Make optional the specification of the record type and questionnaire identification information as some files are not

hierarchical in nature

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected This new input file would contain the master list of those oberservations selected to be

examined

(3) Allow floating point calculations

(4) Provide a mechanism for referencing repetitive sections of questionnaire This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented Note that by using this approach two versions of the software would be required One version would be as CONCOR is now implemented and the second would accept flat data files where the subscript indicated a repetitive section on the survey document

(5) Add a weighting scheme to allow meaningful totals and summary

statistics based upon the sampling frame used to gather the data being edited

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms

(7) Provide a means of manual correction of erroneous data items This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifer name and

questionnaire identification CELADE has a COBOL version of

this program but it has not been finished nor tested

42

APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES

0 APPENO H

CONCOR SYSTEM INTERNAL VARIABLES

1 EOF-FLAG

2 ERROR-TN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT OUTPUT-QUEST-COUNT WRITE-RECORD-COUNT

Change in spelling of variable

Not mentioned--possible ommission from manual

Not implemented

Note Current documentation equivocates when some variables

are reset--most are initialized with each control area break

Though implemented as reserved identifiers due to poor documentation of (old) CONCOR exact values and utility were unknown

These apparently new counters permit access to valuable totals within control areas

Note These are not cumulative counters Also outputshyquest-count violates naming conversion eg in out

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Execution Statistics page: division by zero. The printout is only partly legible in the source.]

Report heading: CONCOR-EDITOR EXECUTION STATISTICS, RUN DATE (illegible)

Column headings (partly legible): CONTROL AREA, SEQUENCE NUMBER, INPUTS READ, OUTPUTS BY OUTPUT COMMAND, OUTPUTS BY WRITE COMMAND, QUESTIONNAIRES, RECORDS

Messages printed:

DIVISION BY ZERO LINE NUMBER 118 (printed six times)

RUN ABORTED DUE TO DIVISION BY ZERO ERRORS

Program text (superimposed on the page; line numbers and statements shown only where legible):

112 LIST

114 LIST 'DIVISION BY ZERO ERROR NOW FOLLOWS'

115 LET R001 R003 = 0

116 LIST 'R001 R003 SET TO ZEROS' R001 R003

117 LIST (text illegible)

118 LET R003 = R002 / R001

119 LIST 'R003 = R002/R001 =' R003

122 END TEST OF OPERANDS AND MSG

125 LET R001 R002 R003 = 0 (registers reset to zero)

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason--ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.
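The behavior described in these comments can be illustrated with a short Python sketch. It is not CONCOR code: the `Editor` class, the `run` helper, the integer division, and the limit of six errors before the abort are assumptions made for illustration (the limit is inferred only from the six messages on the printout).

```python
from dataclasses import dataclass, field

# Assumed error limit: the printout shows six messages before the abort.
MAX_DIVIDE_ERRORS = 6


class RunAborted(Exception):
    """Raised when the division-by-zero error limit is reached."""


@dataclass
class Editor:
    registers: dict = field(
        default_factory=lambda: {"R001": 0, "R002": 0, "R003": 0}
    )
    messages: list = field(default_factory=list)
    divide_errors: int = 0

    def divide(self, line_no, target, numerator, denominator):
        """Store numerator / denominator in target; log the source line on a zero divisor."""
        if self.registers[denominator] == 0:
            self.divide_errors += 1
            self.messages.append(f"DIVISION BY ZERO LINE NUMBER {line_no}")
            if self.divide_errors >= MAX_DIVIDE_ERRORS:
                self.messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                raise RunAborted(f"aborted at line {line_no}")
            return
        # Integer division is an assumption of this sketch.
        self.registers[target] = (
            self.registers[numerator] // self.registers[denominator]
        )


def run(editor, passes):
    """Execute the statement from source line 118 once per pass."""
    try:
        for _ in range(passes):
            editor.divide(118, "R003", "R002", "R001")
    except RunAborted:
        pass
    return editor.messages
```

The point of the design is that every diagnostic carries the source line number, and the run stops with an explicit message instead of ending silently.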

[Execution Statistics page: array coordinate error. The report body is illegible in the source; it appears to be a series of error messages, each giving questionnaire, coordinate, value, limit and line number fields, followed by a run-aborted message.]

Program text (superimposed on the page; shown only where legible):

175 UNTIL R001 > 10 VARY R001 FROM 1 BY 1

177 DO

178 UNTIL R002 > 10 VARY R002 FROM 1 BY 1

180 ALLOCATE N002 = TA1 (R001 R002)

181 UPDATE TA2 (R001 R002) R003

182 LIST 'N002 = R003' N002 R003

183 LIST 'VALUES OF TA1 = R003' TA1 (R001 R002)

184 LIST

185 LIST 'TA2=' TA2 (R001 R002)

186 END-DO

187 END-DO

Comments: In this example R001, a 2 dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 X 5) TA2. Non-existent element references caused the error. The program text has been superimposed on this page for illustration purposes.
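A minimal Python sketch of the kind of bounds checking that produces such a diagnostic follows. The `Table` class, the `copy_tables` helper and the message wording are hypothetical; only the array sizes (10 X 10 and 5 X 5) and the source line numbers 180 and 181 come from the example above.

```python
class CoordinateError(Exception):
    """Raised when a subscript falls outside the declared array bounds."""


class Table:
    def __init__(self, name, rows, cols):
        self.name, self.rows, self.cols = name, rows, cols
        self.cells = [[0] * cols for _ in range(rows)]

    def _check(self, row, col, line_no):
        # Subscripts are 1-based, as in the loops of the example.
        if not (1 <= row <= self.rows and 1 <= col <= self.cols):
            raise CoordinateError(
                f"{self.name} ({row} {col}) OUTSIDE {self.rows} X {self.cols} "
                f"LINE NUMBER {line_no}"
            )

    def get(self, row, col, line_no):
        self._check(row, col, line_no)
        return self.cells[row - 1][col - 1]

    def update(self, row, col, value, line_no):
        self._check(row, col, line_no)
        self.cells[row - 1][col - 1] = value


def copy_tables():
    """Walk a 10 X 10 table and try to store each value in a 5 X 5 table."""
    ta1, ta2 = Table("TA1", 10, 10), Table("TA2", 5, 5)
    try:
        for r in range(1, 11):
            for c in range(1, 11):
                value = ta1.get(r, c, line_no=180)
                ta2.update(r, c, value, line_no=181)
    except CoordinateError as err:
        return str(err)
    return None
```

In this sketch the first failing reference is row 1, column 6 of the smaller table, reported against the updating statement rather than the reading one.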

[Execution Statistics page: user-specified termination. The printout is only partly legible in the source.]

Column headings (partly legible): CONTROL AREA, SEQUENCE NUMBER, INPUTS READ, OUTPUTS BY OUTPUT COMMAND, OUTPUTS BY WRITE COMMAND, QUESTIONNAIRES, RECORDS

Messages printed:

JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215

EDITOR -- NORMAL END OF JOB

Program text (superimposed on the page; shown only where legible):

214 IF IN-RECORD-COUNT > 500

215 THEN STOP-RUN END-THEN

217 IF IN-RECORD-COUNT < 20

218 THEN LIST (message text partly illegible)

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z 0-9 - + the comma character (,) the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z 0-9 - + the comma character (,) the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z 0-9 - + the comma character (,) the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.


SAUDI ARABIA

Shargi R. Al-Shargi, National Computer Center, P.O. Box 2534, Riyadh, Saudi Arabia

Ahmed S. Al-Ghamdi, Dept. Director of National Computer Center, P.O. Box 2534, Riyadh, Saudi Arabia

OTHER

Robert W. O'Neal, USREP/JECOR, APO New York, NY 09038

John N. Adams, Data Processing Advisor, DACCA, Department of State, Washington, DC 20520, or c/o UNDP, P.O. Box 224, Dacca, Bangladesh

Joe Quasney, U.S. Census Bureau, Washington, DC 20233

Howard Brunsman, 5715 N. Ninth St., Arlington, VA 22205

Larry Shiller and Julio Ortuzar, Delta Systems Consultants, Inc., 264 Alhambra Circle, Coral Gables, FL 33134

Nicholas Ourusoff, Burpee Hill Road, New London, NH 03257 (United Nations)

Frederic J. Grant IV, Systems Development, Georgia World Congress Institute, 580 North Omni International, Atlanta, Georgia 30303

Rafael Samper, Mary Jo Keenan, Catherine B. Gleason, LAC/DR, Room 2246 NS, Washington, DC 20523

John Marshall, AIDSERDMPSE, Rm. 706C, SA-12, Washington, DC 20523

Susanne Bacon and Mary K. Friday, ISPC/Data Processing, U.S. Census Bureau, Washington, DC 20233

Leo Dougherty, ISPC/World Census Staff, U.S. Bureau of the Census, Washington, DC 20233

CONCOR WORKSHOP

January 7-18, 1980, Washington, DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM

CONCOR WORKSHOP, January 16, 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1. What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries?

2. If you are familiar with previous versions of CONCOR, how does this current version compare to them?

3. How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data with which you are familiar?

4. How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software?

5. How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys?

6. If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?

7. In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8. Other comments

APPENDIX F

NEW/OLD COMMAND COMPARISONS

OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA

NEW CONCOR VERSION 2 DECEMBER 1979

.DICTIONARY-DIVISION

.DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

.FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

.IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

.INPUT-RECORD-SECTION

DEFINE-RECORD

.WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1, all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

.EXECUTION-DIVISION

.REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

.EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR.

END-DIVISION signifies the termination of the CONCOR division organization and must always be present, even though division identifiers have not been implemented.

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations, and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

Each entry gives the Version 1 (December 1978) command, the Version 2 (December 1979) command, and comments.

Version 1: ALLOCATE (ALL)
Version 2: ALLOCATE (ALLOC)
Comments: Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

Version 1: ASSERT (AST)
Version 2: ASSERT (ASRT)
Comments: The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and MESSAGE option. Additionally, by utilizing the DUMPR and DUMPO keywords, it provides the user with the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

Version 1: (see RECODE)
Version 2: DRECODE (DRCD)
Comments: Provides a method for recoding data items that have a starting value of zero and continue in ascending sequence. Replaces the previous RECODE command.

Version 1: (none)
Version 2: EXIT
Comments: Provides the means to terminate execution (escape) of command statements within the UNTIL command.

Version 1: END
Version 2: END-IF, END-THEN, END-ELSE
Comments: A solitary END command is no longer permitted in conditional construction. Note the similarity to COBOL pseudocode.

Version 1: ERROR
Version 2: not implemented
Comments: No longer supported in the language.

Version 1: FILTER TYPE ( ) WITH
Version 2: not implemented
Comments: Option statements are no longer supported in the language. This command once specified essentially a GOTO, depending on construction.

Version 1: (see XRECODE)
Version 2: GRECODE (GRCD)
Comments: This new command provides a method for the group recoding of input values.

Version 1: IF
Version 2: IF
Comments: The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. Similar to pseudocode, a CONCOR programmer must utilize the structured IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version.

Version 1: LET
Version 2: LET
Comments: Virtually unchanged.

Version 1: (none)
Version 2: LIST
Comments: New command which allows for the generation of specific messages, as well as specific individual identifiers, to be displayed in the report of edit statistics by questionnaire.

Version 1: (none)
Version 2: OUTPUT
Comments: New command which provides the means by which records will be written to the output-file.

Version 1: RANGE (RNG)
Version 2: RANGE (RNG)
Comments: As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values, as well as the entire record or questionnaire, through the DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

Version 1: RECODE (REC)
Version 2: name changed
Comments: Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

Version 1: STOP- keyword
Version 2: STOP- keyword
Comments: Unchanged.

Version 1: (none)
Version 2: UNTIL DO END-DO
Comments: DO ... END-DO: one of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to that of most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

Version 1: UPDATE (UPD)
Version 2: UPDATE (UPD)
Comments: Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.
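The interplay of ALLOCATE and UPDATE in imputation can be pictured with a small Python sketch, assuming a hot-deck style scheme in which an array of "last valid values" is maintained and drawn from when an item fails its edit. The field names, the valid range and the starting values are hypothetical, and the functions are not part of CONCOR.

```python
def impute_age(records, low=0, high=99, start_values=None):
    """Hot-deck imputation of 'age', keyed on 'sex'.

    A valid age refreshes the donor array (the role UPDATE plays);
    an invalid or missing age is replaced from it (the role ALLOCATE plays).
    """
    # Starting ("cold deck") values are an assumption of this sketch.
    hot_deck = dict(start_values or {"M": 30, "F": 30})
    imputed = 0
    edited = []
    for rec in records:
        rec = dict(rec)  # leave the caller's records untouched
        age = rec.get("age")
        if age is not None and low <= age <= high:
            hot_deck[rec["sex"]] = age
        else:
            rec["age"] = hot_deck[rec["sex"]]
            imputed += 1
        edited.append(rec)
    return edited, imputed
```

The returned count of imputations corresponds loosely to the statistics that the report says are accumulated when edit statistics are requested.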

Version 1: WRITE (WRT)
Version 2: WRITE (WRT)
Comments: The WRITE language statement, while once utilized as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of DICTIONARY-DIVISION.)

Version 1: XRECODE (XREC)
Version 2: name changed
Comments: Old command which has become the GRECODE command in the December 1979 version.

DECEMBER 1978 VERSION 1

Not implemented

DECEMBER 1979 VERSION 2

.REPORT-DIVISION

.DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

.TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
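Why the presort matters can be shown with a short Python sketch of control-break processing. It assumes, purely for illustration, that a new control area starts whenever the area field changes from one record to the next; the `area` field name and the function are hypothetical.

```python
from itertools import groupby


def area_totals(records):
    """Return (area, record_count) for each control-area break, in input order."""
    return [
        (area, sum(1 for _ in group))
        for area, group in groupby(records, key=lambda rec: rec["area"])
    ]


unsorted_file = [{"area": "01"}, {"area": "02"}, {"area": "01"}]
presorted_file = sorted(unsorted_file, key=lambda rec: rec["area"])
```

With the unsorted file, area 01 is reported twice with partial counts, because a sequential pass cannot merge noncontiguous records; after the external sort each area is reported once.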

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST

Future Versions of CONCOR: Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.

(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18 digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.

(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation, or provide a means by which the user may specify a title to appear in the listings.
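Item (13) above, an automatic range check driven by valid ranges declared with each user-identifier, might look roughly like the following Python sketch. The dictionary layout, the identifier names SEX and AGE and their ranges are all hypothetical; nothing here reflects the actual CONCOR dictionary format.

```python
# Hypothetical declarations: identifier name -> (lowest, highest) valid value.
DICTIONARY = {"SEX": (1, 2), "AGE": (0, 99)}


def out_of_range_items(record, dictionary=DICTIONARY):
    """Return the names of declared items that are missing or outside their range."""
    failures = []
    for name, (low, high) in dictionary.items():
        value = record.get(name)
        if value is None or not (low <= value <= high):
            failures.append(name)
    return sorted(failures)
```

Because the ranges live in the dictionary rather than in comment statements, the same declarations could also be read by other software, which is the benefit the item mentions for tabulation.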

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features, such as division by zero and out of bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is what action is to be taken in the calculation of tolerance in terms of this new record's inclusion into the universe of observations and number of changes.

(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
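The lookup table approach proposed in item (4) can be contrasted with a linear search in a brief Python sketch. The recode groups are hypothetical, and the functions only illustrate the general technique; they say nothing about how the generated COBOL actually performs the recode.

```python
# Hypothetical recode groups: (lowest value, highest value, new code).
RECODES = [(0, 4, 1), (5, 14, 2), (15, 64, 3), (65, 99, 4)]


def recode_linear(value, recodes=RECODES):
    """Scan the groups in order until one contains the value."""
    for low, high, code in recodes:
        if low <= value <= high:
            return code
    return None


def build_lookup(recodes=RECODES):
    """Expand the groups once into a table indexed directly by the input value."""
    size = max(high for _, high, _ in recodes) + 1
    table = [None] * size
    for low, high, code in recodes:
        for value in range(low, high + 1):
            table[value] = code
    return table


LOOKUP = build_lookup()


def recode_lookup(value, table=LOOKUP):
    """Recode with a single index operation; values outside the table give None."""
    return table[value] if 0 <= value < len(table) else None
```

The table costs memory and a one-time build, but each recode then takes the same time regardless of how many groups were declared, which suits input values that start at zero and ascend.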

1.1.3 Report Division

(1) Provide a means by which the user may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.

1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean to imply that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.

2. Survey or Other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of the questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach, two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has been neither finished nor tested.
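The check-in procedure of item (2) amounts to comparing the master list of selected cases against the questionnaire identifiers actually received. A minimal Python sketch, with hypothetical identifiers and a hypothetical function name, is:

```python
def check_in(expected_ids, received_ids):
    """Compare the master list of sample cases with the cases found on the data file."""
    expected, received = set(expected_ids), set(received_ids)
    return {
        "missing": sorted(expected - received),      # selected but not received
        "unexpected": sorted(received - expected),   # received but not selected
    }
```

For example, `check_in(["A1", "A2", "A3"], ["A2", "A4"])` reports A1 and A3 as missing and A4 as unexpected; duplicate identifiers on the data file are ignored by this sketch and would need a separate check.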


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES


1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters)

OUT-RECORD-COUNT

OUTPUT-QUEST-COUNT

WRITE-RECORD-COUNT



Rafael Samper, Mary Jo Keenan, Catherine B. Gleason, LACDR, Room 2246 NS, Washington, DC 20523

John Marshall, AIDSERDMPSE, Rm. 706C, SA-12, Washington, DC 20523

Susanne Bacon and Mary K. Friday, ISPC/Data Processing, US Census Bureau, Washington, DC 20233

Leo Dougherty, ISPC/World Census Staff, US Bureau of the Census, Washington, DC 20233

CONCOR WORKSHOP

January 7-18, 1980, Washington, DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM

CONCOR WORKSHOP, January 16, 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1. What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries?

2. If you are familiar with previous versions of CONCOR, how does this current version compare to them?

3. How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data with which you are familiar?

4. How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software?

5. How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys?

6. If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?

7. In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8. Other comments

APPENDIX F

NEW/OLD COMMAND COMPARISONS

OLD CONCOR VERSION 1, DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA (contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA

NEW CONCOR VERSION 2, DECEMBER 1979

.DICTIONARY-DIVISION

.DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

.FILE-SECTION

INPUT-FILE, OUTPUT-FILE, WRITE-FILE, ERROR-FILE

.IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

.INPUT-RECORD-SECTION

DEFINE-RECORD

.WORKING-DATA-SECTION

NEW-DATA, ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin anywhere in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1, DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

Commands: ALLOCATE (ALL), ASSERT (AST), END, ERROR, FILTER TYPE ( ) WITH, IF, LET, RANGE (RNG), RECODE (REC), STOP-keyword, UPDATE (UPD), WRITE (WRT), XRECODE (XREC)

NEW CONCOR VERSION 2, DECEMBER 1979

.EXECUTION-DIVISION

.REPORT-CONTROL-SECTION

COUNT-IMPUTES, GENERATE-EDIT-STATISTICS

.EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

Commands: ALLOCATE (ALLOC), ASSERT (ASRT), DRECODE (DRCD), END-DIVISION, EXIT, GRECODE (GRCD), IF, LET, LIST, OUTPUT, RANGE (RNG), STOP, UNTIL, UPDATE (UPD), WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR.

END-DIVISION signifies the termination of the CONCOR division organization and must always be present, even though division identifiers have not been implemented.

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations, and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

Version 1 (December 1978): ALLOCATE (ALL)
Version 2 (December 1979): ALLOCATE (ALLOC)
Comments: Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

Version 1 (December 1978): ASSERT (AST)
Version 2 (December 1979): ASSERT (ASRT)
Comments: The ASSERT command is the basic consistency editing command of CONCOR. The command now provides NOERROR and MESSAGE options. Additionally, the DUMPR and DUMPO keywords give the user the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS and FAIL END-FAIL constructions for the purpose of maintaining structured logic.

Version 1 (December 1978): (see RECODE)
Version 2 (December 1979): DRECODE (DRCD)
Comments: Provides a method for recoding data items whose values start at zero and continue in ascending sequence. Replaces the previous RECODE command.

Version 1 (December 1978): (none)
Version 2 (December 1979): EXIT
Comments: Provides the means to terminate execution of (escape from) the command statements within the UNTIL command.

Version 1 (December 1978): END
Version 2 (December 1979): END-IF, END-THEN, END-ELSE
Comments: A solitary END command is no longer permitted in a conditional construction. Note the similarity to COBOL pseudocode.

Version 1 (December 1978): ERROR
Version 2 (December 1979): not implemented
Comments: No longer supported in the language.

Version 1 (December 1978): FILTER TYPE ( ) WITH
Version 2 (December 1979): not implemented
Comments: Option statements are no longer supported in the language. This command once specified what was essentially a GOTO, depending on construction.

Version 1 (December 1978): (see XRECODE)
Version 2 (December 1979): GRECODE (GRCD)
Comments: This new command provides a method for the group recoding of input values.

Version 1 (December 1978): IF
Version 2 (December 1979): IF
Comments: The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. As in pseudocode, a CONCOR programmer must utilize the structured IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version.

Version 1 (December 1978): LET
Version 2 (December 1979): LET
Comments: Virtually unchanged.

Version 1 (December 1978): (none)
Version 2 (December 1979): LIST
Comments: New command which allows specific messages, as well as specific individual identifiers, to be displayed in the report of edit statistics by questionnaire.

Version 1 (December 1978): (none)
Version 2 (December 1979): OUTPUT
Comments: New command which provides the means by which records will be written to the output-file.

Version 1 (December 1978): RANGE (RNG)
Version 2 (December 1979): RANGE (RNG)
Comments: RANGE is a basic editing command of CONCOR. The December 1979 version permits the listing of the out-of-range values, as well as the entire record or questionnaire, through the DUMPR and DUMPO options within the command. Another new option of this command is the PASS END-PASS FAIL END-FAIL construction, which permits the specification of logical command paths depending upon the outcome of the range test.

Version 1 (December 1978): RECODE (REC)
Version 2 (December 1979): name changed
Comments: Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

Version 1 (December 1978): STOP-keyword
Version 2 (December 1979): STOP-keyword
Comments: Unchanged.

Version 1 (December 1978): (none)
Version 2 (December 1979): UNTIL DO END-DO
Comments: One of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to that of most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user-identifier (counter), which is incremented with each successive execution of the command statements.

Version 1 (December 1978): UPDATE (UPD)
Version 2 (December 1979): UPDATE (UPD)
Comments: Virtually unchanged. It is the basic mechanism for maintaining the arrays involved in imputation processes.

Version 1 (December 1978): WRITE (WRT)
Version 2 (December 1979): WRITE (WRT)
Comments: The WRITE language statement, while once utilized as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of the DICTIONARY-DIVISION.)

Version 1 (December 1978): XRECODE (XREC)
Version 2 (December 1979): name changed
Comments: Old command which has become the GRECODE command in the December 1979 version.
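The table above states that ALLOCATE, together with UPDATE, provides the imputation mechanism, with UPDATE maintaining the arrays involved. The following is a minimal Python sketch of that pairing, assuming a conventional arrangement in which valid values are stored in an array as records pass and invalid values are replaced from the same array. It is not CONCOR syntax; the field names, the array dimensions, the valid range and the starting value are all hypothetical.

```python
# Hypothetical sketch of the UPDATE / ALLOCATE pairing; not CONCOR syntax.
# The array is indexed by two illustrative stratifiers (sex code, age group).

def make_deck(rows, cols, initial):
    """Build an imputation array pre-filled with an assumed starting value."""
    return [[initial for _ in range(cols)] for _ in range(rows)]

def update(deck, row, col, value):
    """UPDATE analogue: remember the latest valid value seen for this cell."""
    deck[row][col] = value

def allocate(deck, row, col):
    """ALLOCATE analogue: return the stored value for use as an imputed value."""
    return deck[row][col]

def edit_records(records, deck, valid_range=(0, 20)):
    """Impute out-of-range 'years_school' values from the deck; count imputations."""
    low, high = valid_range
    imputed = 0
    for rec in records:
        row, col = rec["sex"], rec["age_group"]
        if low <= rec["years_school"] <= high:
            update(deck, row, col, rec["years_school"])
        else:
            rec["years_school"] = allocate(deck, row, col)
            imputed += 1
    return imputed

records = [
    {"sex": 0, "age_group": 1, "years_school": 8},
    {"sex": 0, "age_group": 1, "years_school": 99},  # invalid: takes the stored 8
    {"sex": 1, "age_group": 0, "years_school": 99},  # invalid: cell still holds 6
]
deck = make_deck(rows=2, cols=3, initial=6)
count = edit_records(records, deck)
print(count, [r["years_school"] for r in records])  # 2 [8, 8, 6]
```

The imputation count returned here is only a stand-in for the kind of statistic the GENERATE-EDIT-STATISTICS option is described as producing.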

DECEMBER 1978, VERSION 1

Not implemented

DECEMBER 1979, VERSION 2

.REPORT-DIVISION

.DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

.TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK, REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
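The presorting requirement noted above can be illustrated with a short Python sketch. It assumes that area reports are produced by a control break, that is, by comparing each record's control field only with that of the preceding record. The record layout and the "area" field name are hypothetical.

```python
from itertools import groupby

def area_totals(records):
    """Yield (area, record_count) at each change of the control field.

    Like a control-break report, this compares adjacent records only,
    so records of one area that are not contiguous are never merged.
    """
    for area, group in groupby(records, key=lambda rec: rec["area"]):
        yield area, sum(1 for _ in group)

unsorted = [{"area": "02"}, {"area": "01"}, {"area": "02"}]
presorted = sorted(unsorted, key=lambda rec: rec["area"])

print(list(area_totals(unsorted)))   # [('02', 1), ('01', 1), ('02', 1)]
print(list(area_totals(presorted)))  # [('01', 1), ('02', 2)]
```

With the unsorted file, area 02 is reported twice with partial counts; sorting on the control field beforehand gives one line per area.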

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST

Appendix G

Future Versions of CONCOR: Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.

(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Permitting only one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18 digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.

(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation, or provide a means by which the user may specify a title to appear in the listings.
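Item (13) above can be sketched as follows. This is a minimal Python illustration, assuming a dictionary that stores a column position, a length and a valid range for each user-identifier; the entry layout, the identifier names and the ranges are hypothetical and are not taken from the CONCOR dictionary format.

```python
# Hypothetical dictionary entries: name -> (start column, length, low, high).
DICTIONARY = {
    "AGE": (0, 2, 0, 98),
    "SEX": (2, 1, 1, 2),
}

def automatic_range_check(record, dictionary=DICTIONARY):
    """Return the names of items that are non-numeric or outside their declared range."""
    failures = []
    for name, (start, length, low, high) in dictionary.items():
        text = record[start:start + length]
        if not text.isdigit() or not (low <= int(text) <= high):
            failures.append(name)
    return failures

print(automatic_range_check("251"))  # []
print(automatic_range_check("993"))  # ['AGE', 'SEX']
```

Because the ranges live in the dictionary rather than in comment statements, the same entries could in principle be read by tabulation software, which is the secondary benefit the item mentions.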

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features such as division by zero and out of bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is the question of what action is to be taken in the calculation of tolerance, in terms of this new record's inclusion in the universe of observations and number of changes.

(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
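The lookup table approach proposed in item (4) can be contrasted with a sequential search in a few lines of Python. This is a sketch only: it does not reproduce the code generated by the EDITOR program, which the report does not show, and the recode values are made up. It relies on the property stated for DRECODE elsewhere in this report, namely that the input values start at zero and ascend in sequence.

```python
# Illustrative recode of codes 0..5 into three groups; the values are made up.
RECODE_PAIRS = [(0, 1), (1, 1), (2, 2), (3, 2), (4, 3), (5, 3)]

def recode_linear(value, pairs=RECODE_PAIRS):
    """Scan the pairs one by one until the input value is found."""
    for old, new in pairs:
        if old == value:
            return new
    raise ValueError(f"no recode defined for {value}")

# Because the input codes start at zero and ascend without gaps,
# each new code can be stored at the position equal to its old code.
RECODE_TABLE = [new for _, new in RECODE_PAIRS]

def recode_lookup(value, table=RECODE_TABLE):
    """Fetch the new code directly by position, with a bounds check."""
    if not 0 <= value < len(table):
        raise ValueError(f"no recode defined for {value}")
    return table[value]

print([recode_lookup(v) for v in range(6)])  # [1, 1, 2, 2, 3, 3]
```

The sequential version makes up to one comparison per table entry for every record, while the table version makes a single indexed access regardless of table size.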

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.

1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available the information needed most frequently. Also, to examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean to imply that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.

2. Survey or Other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of the sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has not been finished nor tested.
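The weighting scheme suggested in item (5) can be illustrated with a small Python sketch. It assumes that each sample record carries its own expansion weight, which is one common arrangement but is not specified in the list above; the field names and the figures are hypothetical. Fractional weights of this kind would also depend on the floating point calculations requested in item (3).

```python
def weighted_total(records, item, weight="weight"):
    """Sum an item over the sample, multiplying each value by its record weight."""
    return sum(rec[item] * rec[weight] for rec in records)

# Two sampled households; each weight is the assumed number of
# households in the population that the sampled household represents.
sample = [
    {"persons": 4, "weight": 10.0},
    {"persons": 2, "weight": 25.0},
]

print(weighted_total(sample, "persons"))  # 90.0
```

An unweighted count of the same two records would give 6 persons, which describes the sample rather than the population it was drawn from.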


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES

1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT OUTPUT-QUEST-COUNT WRITE-RECORD-COUNT

Change in spelling of variable

Not mentioned--possible omission from manual

Not implemented

Note: Current documentation equivocates on when some variables are reset--most are initialized with each control area break

Though these were implemented as reserved identifiers, their exact values and utility were unknown due to the poor documentation of (old) CONCOR

These apparently new counters permit access to valuable totals within control areas

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN-, OUT-)

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

CONCOR-EDITOR EXECUTION STATISTICS

[Execution statistics printout, largely illegible in this copy. The column headings include CONTROL AREA, SEQUENCE NUMBER, INPUTS READ (QUESTIONNAIRES, RECORDS), OUTPUTS BY OUTPUT COMMAND and OUTPUTS BY WRITE COMMAND. The legible messages are:]

DIVISION BY ZERO, LINE NUMBER 118 (repeated several times)

RUN ABORTED DUE TO DIVISION BY ZERO ERRORS

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason--ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors and user specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.

Program text (lines 111 to 125; only the following are legible):

114 LIST 'DIVISION BY ZERO ERROR NOW FOLLOWS'

115 LET R001 R003 = 0

118 LET R003 = R002 / R001
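The behavior reported on this page, a message naming the offending source line for each division by zero followed by a controlled abort, can be sketched in Python as follows. The class and method names are hypothetical, and so are two details the report does not state: the error limit that triggers the abort, and the result substituted when the divisor is zero.

```python
class RunAborted(Exception):
    """Raised when the assumed division-by-zero limit is reached."""

class Editor:
    def __init__(self, max_zero_divides=6):   # the limit is an assumption
        self.max_zero_divides = max_zero_divides
        self.zero_divides = 0
        self.messages = []

    def divide(self, numerator, denominator, line_number):
        """Integer division that logs the source line instead of crashing."""
        if denominator == 0:
            self.zero_divides += 1
            self.messages.append(f"DIVISION BY ZERO, LINE NUMBER {line_number}")
            if self.zero_divides >= self.max_zero_divides:
                self.messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                raise RunAborted(line_number)
            return 0                           # assumed substitute result
        return numerator // denominator

editor = Editor(max_zero_divides=3)
try:
    for _ in range(5):
        editor.divide(7, 0, line_number=118)
except RunAborted:
    pass
print("\n".join(editor.messages))
```

The point of the design is that every exception is attributed to a line of the user's source program, so a run never ends without an explanation.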

CONCOR-EDITOR EXECUTION STATISTICS

[Execution statistics printout, illegible in this copy. It lists a series of array error messages, each apparently giving the questionnaire, coordinate, value and line number, followed by a message that the run was aborted.]

Comments: In this example TA1, a 2 dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 X 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.

Program text (lines 182 and 183 are only partly legible):

175 UNTIL R001 > 10 VARY R001 FROM 1 BY 1

177 DO

178 UNTIL R002 > 10 VARY R002 FROM 1 BY 1

180 ALLOCATE [identifier illegible] = TA1 (R001 R002)

181 UPDATE TA2 (R001 R002) R003

183 LIST 'VALUES OF TA1 = ' R003 TA1 (R001 R002)

184 LIST

185 LIST 'TA2=' TA2 (R001 R002)

186 END-DO

187 END-DO
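The array test on this page can be mirrored in Python. The sketch below assumes 1-based coordinates and treats the two UNTIL ... VARY loops as counting from 1 to 10; the function name, the test values stored in TA1 and the use of line number 181 for the UPDATE statement are illustrative assumptions.

```python
def checked_update(table, row, col, value, line_number, errors):
    """Store value at 1-based (row, col); record an error when out of bounds."""
    rows, cols = len(table), len(table[0])
    if 1 <= row <= rows and 1 <= col <= cols:
        table[row - 1][col - 1] = value
        return True
    errors.append((line_number, row, col))
    return False

ta1 = [[r * 10 + c for c in range(1, 11)] for r in range(1, 11)]  # 10 x 10
ta2 = [[0] * 5 for _ in range(5)]                                 # 5 x 5
errors = []

for r in range(1, 11):        # stands in for: UNTIL R001 > 10 VARY R001 FROM 1 BY 1
    for c in range(1, 11):    # stands in for: UNTIL R002 > 10 VARY R002 FROM 1 BY 1
        checked_update(ta2, r, c, ta1[r - 1][c - 1], 181, errors)

print(len(errors), errors[0])  # 75 (181, 1, 6)
```

Of the 100 coordinate pairs generated by the loops, only 25 exist in the smaller table, so 75 references are rejected, the first at row 1, column 6.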

[Execution statistics printout, largely illegible in this copy. The legible messages are:]

JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215

EDITOR -- NORMAL END OF JOB

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

Program text (lines 216 to 218 are largely illegible; they appear to test IN-RECORD-COUNT again and issue a LIST command):

214 IF IN-RECORD-COUNT > 500

215 THEN STOP-RUN END-THEN

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character () and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character () and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character () and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.

Page 32: THE CONCOR (CONSISTENCY AND CORRECTION) ITS …

33

CON C OR WORKSHOP

January 7-18 1979 Washington DC

Staff

Robert R Bair

Luis Garcia

David Malkovsky

Sandra Mansfield

Selma Sawaya

Vivian Toro

APPENDIX E

CONCOR EVALUATION FORM

0

APPENDIX E

CONCOR WORKSHOP January 16 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1 What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries

2 If you are familiar with previous versions of CONCOR how does this current version compare to them

2D

35

2

3 How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data of which you are familiar

4 How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software

36

3

5 How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys

6 If your organization does statistical data processing by computer would you recommend to your superiors that this software be used to edit the data Do you feel assistance would be required in the installation of this software on your organizations computer andor in training userR on the packages applicability to their work

37

7 In what way(s) could improvements or enhancements be made to this version

of the CONCOR software andor its documentation

8 Other comments

1shy

APPENDIX F

NEWOLD COMMAND COMPARISONS

0 0

OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-IDand record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA

APPENDIX F

NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

DOENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DiVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 ofthe coding sheet The position notation of Version 2 permits thecommand to begin in columns 1-72

The period preceding a statement

signifies that it is but a commentcard Accordingly it should be noted that while the END-DIVISION command must be present the division name identifier DICTIONARY-DIVISION

has not been implemented

Note Some parameters for each commandhave been altered even where general correspondence exists

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

    PROLOG
    FILTER TYPE ( )
    EPILOG

    ALLOCATE (ALL)
    ASSERT (AST)
    END
    ERROR
    FILTER TYPE ( ) WITH
    IF
    LET
    RANGE (RNG)
    RECODE (REC)
    STOP-keyword
    UPDATE (UPD)
    WRITE (WRT)
    XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

    REPORT-CONTROL-SECTION
        COUNT-IMPUTES
        GENERATE-EDIT-STATISTICS
    EDIT-SPECIFICATION-SECTION
        PROLOG-ROUTINE
        FILTER-ROUTINE (name-of-record-type)
        EPILOG-ROUTINE

    ALLOCATE (ALLOC)
    ASSERT (ASRT)
    DRECODE (DRCD)
    END-DIVISION
    EXIT
    GRECODE (GRCD)
    IF
    LET
    LIST
    OUTPUT
    RANGE (RNG)
    STOP
    UNTIL
    UPDATE (UPD)
    WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR.

END-DIVISION signifies the termination of the CONCOR division organization and must always be present, even though division identifiers have not been implemented.

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations, and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

(VERSION 1 = December 1978; VERSION 2 = December 1979)

VERSION 1: ALLOCATE (ALL)
VERSION 2: ALLOCATE (ALLOC)
COMMENTS: Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

VERSION 1: ASSERT (AST)
VERSION 2: ASSERT (ASRT)
COMMENTS: The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and a MESSAGE option. Additionally, the DUMPR and DUMPO keywords give the user the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

VERSION 1: (see RECODE)
VERSION 2: DRECODE (DRCD)
COMMENTS: Provides a method for recoding data items that have a starting value of zero and continue in ascending sequence. Replaces the previous RECODE command.

VERSION 1: (none)
VERSION 2: EXIT
COMMENTS: Provides the means to terminate execution of (escape from) command statements within the UNTIL command.

VERSION 1: END
VERSION 2: END-IF, END-THEN, END-ELSE
COMMENTS: A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

VERSION 1: ERROR
VERSION 2: not implemented
COMMENTS: No longer supported in the language.

VERSION 1: FILTER TYPE ( ) WITH
VERSION 2: not implemented
COMMENTS: Option statements are no longer supported in the language. This command once specified essentially a GOTO, depending on construction.

VERSION 1: (see XRECODE)
VERSION 2: GRECODE (GRCD)
COMMENTS: This new command provides a method for the group recoding of input values.

VERSION 1: IF
VERSION 2: IF
COMMENTS: The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. Similar to pseudocode, a CONCOR programmer must utilize the structured IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version.

VERSION 1: LET
VERSION 2: LET
COMMENTS: Virtually unchanged.

VERSION 1: (none)
VERSION 2: LIST
COMMENTS: New command which allows for the generation of specific messages, as well as specific individual identifiers, to be displayed in the report of edit statistics by questionnaire.

VERSION 1: (none)
VERSION 2: OUTPUT
COMMENTS: New command which provides the means by which records will be written to the output-file.

VERSION 1: RANGE (RNG)
VERSION 2: RANGE (RNG)
COMMENTS: A basic editing command of CONCOR. The December 1979 version permits the listing of the out-of-range values, as well as the entire record or questionnaire, through the DUMPR and DUMPO options within the command. Another new option of this command is a PASS END-PASS FAIL END-FAIL construction, which permits the specification of logical command paths depending upon the outcome of the range test.

VERSION 1: RECODE (REC)
VERSION 2: name changed
COMMENTS: Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

VERSION 1: STOP-keyword
VERSION 2: STOP-keyword
COMMENTS: Unchanged.

VERSION 1: (none)
VERSION 2: UNTIL DO END-DO
COMMENTS: DO-END-DO is one of the most significant and powerful enhancements to the CONCOR language. This command provides a looping capability similar to that of most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

VERSION 1: UPDATE (UPD)
VERSION 2: UPDATE (UPD)
COMMENTS: Virtually unchanged. It is the basic mechanism for maintaining arrays involved in imputation processes.

VERSION 1: WRITE (WRT)
VERSION 2: WRITE (WRT)
COMMENTS: The WRITE language statement, while once utilized as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of DICTIONARY-DIVISION.)

VERSION 1: XRECODE (XREC)
VERSION 2: name changed
COMMENTS: Old command which has become the GRECODE command in the December 1979 version.
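The looping behaviour described for UNTIL DO END-DO and EXIT can be pictured with a short Python sketch. This is an analogy only, not CONCOR syntax and not the COBOL that CONCOR generates. The function name `until_loop` and its parameters are hypothetical, and the sketch assumes that the UNTIL condition is tested before each pass and that the counter is incremented after each pass.

```python
def until_loop(start, step, stop_when, body):
    """Run body(counter) until stop_when(counter) becomes true.

    Assumptions (not taken from the CONCOR manual): the condition is
    tested before each pass, and the counter grows by `step` after
    each completed pass. body may return "EXIT" to leave early,
    which mimics the EXIT command.
    """
    counter = start
    passes = 0
    while not stop_when(counter):
        if body(counter) == "EXIT":
            break
        passes += 1
        counter += step
    return counter, passes


# Counter starts at 1, steps by 1, loop ends once the counter exceeds 10.
visited = []
final, passes = until_loop(1, 1, lambda c: c > 10, visited.append)
```

The user controls the starting value, the increment and the terminating condition, which matches the three points the comment on this command singles out.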

DECEMBER 1978 VERSION 1: Not implemented

DECEMBER 1979 VERSION 2:

REPORT-DIVISION

    DISPLAY-CONTROL-SECTION
        DISPLAY-EDIT-STATISTICS
    TOLERANCE-CONTROL-SECTION
        ERROR-RATE-CHECK
        REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listings from this division.
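The presorting requirement can be illustrated with a small Python sketch. It assumes, hypothetically, that each record is a pair whose first element is the AREA-CONTROL value, and that a report is emitted whenever that value changes from one record to the next. The function name `area_reports` is invented for the illustration; it does not describe CONCOR internals.

```python
from itertools import groupby


def area_reports(records):
    """Yield (area, record_count) at every control-area break.

    A break is taken whenever the area code differs from that of the
    previous record, so only contiguous records are counted together.
    """
    for area, group in groupby(records, key=lambda rec: rec[0]):
        yield area, sum(1 for _ in group)


unsorted_file = [("02", "q1"), ("01", "q2"), ("02", "q3")]

# Unsorted input: area "02" produces two separate reports.
split_reports = list(area_reports(unsorted_file))

# Presorted input: each area produces exactly one report.
merged_reports = list(area_reports(sorted(unsorted_file)))
```

With the unsorted file the statistics for area "02" are split across two reports, which is the behaviour the presort is meant to avoid.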

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST

Future Versions of CONCOR: Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.

(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18 digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.

(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation, or provide a means by which the user may specify a title to appear in the listings.
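The automatic range check proposed in item (13) can be sketched in Python as follows. The sketch is illustrative only: the `DICTIONARY` layout, the identifier names SEX and AGE, their column positions and their valid ranges are all hypothetical, and nothing here reflects the actual CONCOR dictionary format.

```python
# Hypothetical dictionary: identifier -> (start column, length, low, high).
# In the proposal, the low/high values would be declared in DEFINE-RECORD.
DICTIONARY = {
    "SEX": (0, 1, 1, 2),
    "AGE": (1, 2, 0, 98),
}


def auto_range_check(record):
    """Return the identifiers whose field is non-numeric or out of range.

    This runs before any user-written edit sees the record, which is
    the ordering that item (13) proposes.
    """
    failures = []
    for name, (start, length, low, high) in DICTIONARY.items():
        field = record[start:start + length]
        if not field.isdigit() or not low <= int(field) <= high:
            failures.append(name)
    return failures
```

Because the valid ranges live in the dictionary rather than in comment statements, the same declarations could be read by tabulation software, as the item notes.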

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features, such as division by zero and out of bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is what action is to be taken in the calculation of tolerance in terms of this new record's inclusion into the universe of observations and number of changes.

(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
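The lookup table approach suggested in item (4) can be contrasted with a linear search in a brief Python sketch. This is not the COBOL that CONCOR generates; the function names and the sample recode pairs are hypothetical. It relies only on the property stated in Appendix F, that DRECODE input values start at zero and continue in ascending sequence, so the old value can serve directly as a table index.

```python
def recode_linear(value, pairs):
    """Linear search: compare against each (old, new) pair in turn."""
    for old, new in pairs:
        if value == old:
            return new
    return None  # no matching old value


def build_table(pairs):
    """Build a lookup table where position = old value, content = new value."""
    table = [None] * (max(old for old, _ in pairs) + 1)
    for old, new in pairs:
        table[old] = new
    return table


def recode_lookup(value, table):
    """A single index operation replaces the chain of comparisons."""
    return table[value] if 0 <= value < len(table) else None


# Hypothetical recode: old codes 0-3 collapse to new codes 10, 20, 30.
PAIRS = [(0, 10), (1, 10), (2, 20), (3, 30)]
TABLE = build_table(PAIRS)
```

The table is built once, so the per-record cost no longer grows with the number of recode pairs. The same trade applies to the GENSRC change proposed under System Level.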

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.

1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean to imply that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.

2. Survey or other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of the questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has been neither finished nor tested.
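The check-in procedure proposed in item (2) of the survey list amounts to comparing the questionnaire identifiers actually received against a master list of expected sample cases. A minimal Python sketch follows; the function name `check_in` and the identifier values are hypothetical, and the sketch says nothing about how CONCOR would read the second input file.

```python
def check_in(master_list, received):
    """Compare received questionnaire IDs against the expected sample.

    Returns the expected cases that never arrived and the arrived
    cases that were never selected, each in sorted order.
    """
    expected = set(master_list)
    seen = set(received)
    return {
        "missing": sorted(expected - seen),
        "unexpected": sorted(seen - expected),
    }


# Hypothetical master list of selected cases and the IDs found on the data file.
result = check_in(["A1", "A2", "A3"], ["A3", "A1", "B9"])
```

Here `result` reports case A2 as missing and case B9 as unexpected.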

APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES

1. EOF-FLAG
2. ERROR-IN-QUESTIONNAIRE
3. CURRENT-POINTER-VALUE
4. TYPE-COUNT ( )
5. RECORD-COUNT
6. QUESTIONNAIRE-COUNT
7. RECORDS-IN-STORE
8. INVALID-RECORD-TYPE-COUNT
9. INCOMPLETE-FLAG
10. CONTINUATION-FLAG
11. INVALID-RECORD-FLAG

EOF-FLAG
ERRORS-IN-QUESTIONNAIRE
TYPE-COUNT ( )
IN-RECORD-COUNT
IN-QUESTIONNAIRE-COUNT
RECORDS-IN-STORE
INVALID-RECORD-TYPE-COUNT
INCOMPLETE-FLAG
CONTINUATION-FLAG
OUT-OF-RANGE
NOT-NUMBER
BLANK
(new internal counters) OUT-RECORD-COUNT, OUTPUT-QUEST-COUNT, WRITE-RECORD-COUNT

Change in spelling of variable.

Not mentioned; possible omission from manual.

Not implemented.

Note: Current documentation equivocates on when some variables are reset; most are initialized with each control area break.

Though implemented as reserved identifiers, their exact values and utility were unknown owing to the poor documentation of (old) CONCOR.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN-, OUT-).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Execution statistics printout for the division by zero test. Legible column headings include CONTROL AREA, SEQUENCE NUMBER, INPUTS READ (QUESTIONNAIRES, RECORDS), OUTPUTS BY OUTPUT COMMAND, OUTPUTS BY WRITE COMMAND and NO VALID RECTYPES FOR QUEST. The run date and counts are illegible in this copy.]

DIVISION BY ZERO    LINE NUMBER 118
DIVISION BY ZERO    LINE NUMBER 118
DIVISION BY ZERO    LINE NUMBER 118
DIVISION BY ZERO    LINE NUMBER 118
DIVISION BY ZERO    LINE NUMBER 118
DIVISION BY ZERO    LINE NUMBER 118
RUN ABORTED DUE TO DIVISION BY ZERO ERRORS

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason (ABEND) without generating any messages. The most common types of cessations are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.

[Superimposed program text, source lines 111-125, largely illegible in this copy. Legible fragments: LIST statements announce that a division by zero error follows; line 115 is a LET statement setting R001 and R003 to zero; line 118, the line cited in the messages above, is a LET statement assigning R003 from R002 and R001.]
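The diagnostic behaviour shown in this printout (a message naming the source line for each division by zero, followed by an abort once too many have occurred) can be mimicked in a short Python sketch. It is not derived from the EDITOR program: the class and method names are hypothetical, the error limit is an assumed parameter, and returning zero after a reported error is an assumption made only so the run can continue.

```python
class RunAborted(Exception):
    """Raised when the assumed division-by-zero error limit is reached."""


class Editor:
    def __init__(self, error_limit):
        self.error_limit = error_limit  # assumed, user-settable maximum
        self.messages = []
        self.zero_divides = 0

    def divide(self, numerator, denominator, line_number):
        """Guarded division that reports the source line instead of ending abnormally."""
        if denominator == 0:
            self.zero_divides += 1
            self.messages.append(f"DIVISION BY ZERO LINE NUMBER {line_number}")
            if self.zero_divides >= self.error_limit:
                self.messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                raise RunAborted(line_number)
            return 0  # assumption: a zero result lets the run continue
        return numerator // denominator
```

Each failure is traceable to a statement in the user's source program, which is the improvement over the silent abnormal endings reported for the old package.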

[Execution statistics printout for the array coordinate test. The column headings and the diagnostic lines, which cite the questionnaire, coordinate, value, limit and line number for each error, are illegible in this copy.]

Comments: In this example TA1, a 2 dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 x 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.

[Superimposed program text, source lines 173-187, partly illegible in this copy. Legible statements:]

175 UNTIL R001 > 10 VARY R001 FROM 1 BY 1
177 DO
178 UNTIL R002 > 10 VARY R002 FROM 1 BY 1
180 ALLOCATE N002 = TA1 (R001 R002)
181 UPDATE TA2 (R001 R002) R003
186 END-DO
187 END-DO

[Lines 182-185 are LIST statements displaying the values of N002, R003, TA1 (R001 R002) and TA2 (R001 R002).]

[Execution statistics printout for the user termination test. The column headings (CONTROL AREA, SEQUENCE NUMBER, INPUTS READ, OUTPUTS BY OUTPUT COMMAND, OUTPUTS BY WRITE COMMAND) are partly legible; the counts are illegible in this copy.]

JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215

EDITOR -- NORMAL END OF JOB

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

[Superimposed program text, source lines 214-218, partly illegible in this copy. Legible fragments: lines 214-215 read IF IN-RECORD-COUNT > (value illegible) THEN STOP-RUN END-THEN; lines 217-218 hold a second IF on IN-RECORD-COUNT whose THEN branch is a LIST statement.]

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.


6 If your organization does statistical data processing by computer would you recommend to your superiors that this software be used to edit the data Do you feel assistance would be required in the installation of this software on your organizations computer andor in training userR on the packages applicability to their work

37

7 In what way(s) could improvements or enhancements be made to this version

of the CONCOR software andor its documentation

8 Other comments

1shy

APPENDIX F

NEWOLD COMMAND COMPARISONS

0 0

OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-IDand record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA

APPENDIX F

NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

DOENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DiVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 ofthe coding sheet The position notation of Version 2 permits thecommand to begin in columns 1-72

The period preceding a statement

signifies that it is but a commentcard Accordingly it should be noted that while the END-DIVISION command must be present the division name identifier DICTIONARY-DIVISION

has not been implemented

Note Some parameters for each commandhave been altered even where general correspondence exists

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR

END-DIVISION signifies the termination of the CONCOR division organization and must always be present even though division identifiers have not been implemented

This figure additionally permits a comparison of the EDIT-SPECI-FICATION commands of both CONCOR language versions Note the changes in abbreviations and that some keyword options (not shown) have been altered even where general corresponshydence exists Individual commands are discussed on the following pages

CONCOR LANGUAGE COMMAND STATEMENTS

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

ALLOCATE (ALL) ALLOCATE (ALLOC) Allows for the assignment of values and with the UPDATE command providesthe imputation mechanism Major difference is this statement now providesfor a receiving-identifier list and statistics are generated as coded inthe GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION

ASSERT fAST) ASSERT (ASRT) The ASSERT command is the basic consistency editor command of CONCOR Command now provides for a NOERROR and MESSAGE option Additionallyby utilizinq the DUMPR DUMPO keywords provides the user with the option of using all the current value of all user-identifiers defined for the record type being processed Other changes to this command include the pass end-pass fail end-fail constructions for the P purpose of maintaining structured logic

(see RECODE) DRECODE (DRCD) Provides a method for recoding data items having a starting value of zero and continue in ascending sequence Replaces t previous XRECODE command

EXIT Provides the means to terminate EXECUTION (ESCAPE) of command statements within the UNTIL command

END END-IF END-THEN

A solitary END command is no lonqer permitted in conditional construction Note the similarity to COBOL pseudocode

END-ELSE

ERROR not implemented No longer supported in language

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

FILTER TYPE ( ) WITH not implemented No longer are option statements supported in language This command once specified essentially a GOTO depending on construction

(see XRECODE) GRECODE (GRCD) This new command provides a method for the group recoding of input values

IF IF The IF statement has properties found virtually in all programming languages ie provides a means of evaluating a condition and controlling the execution of alternative logical statements Similar to pseudocode a CONCOR programmer must utilize the structured command IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version

LET LET Virtually unchanged

LIST New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire as well as specific individual identifiers

OUTPUT New command which provides the means by which records will be written to the output-file

(continuej)

4)bull

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 VERSION 1

RANGE (RNG)

RECODE (REC)

STOP- keyword

UPDATE (UPD)

(continued)

DECEMBER 1979 VERSION 2

RANGE (RNG)

name changed

STOP- keyword

UNTIL DO END-DO

UPDATE (UPD)

COMMENTS

As a basic editing command of CONCOR the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through DUMPR and DUMPO optionswithin the command Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction permitting the specification of logicalcommand paths depending upon the outcome of the range test

Name changed to DRECODE Virtually identical with the DRC command in the COCENTS system

Unchanged

DO-END-DO one of the most significant and powerful enhancements to the CONCOR language this command provides a loopingcapability similar to most other high-level pogramminglanguages The number of iterations is under user control as well as the value initially assigned to the user identifier (counter) which is incremented with each successive execution of the command statements

Virtually unchanged it is the basic mechanism for maintaining arrays involved in imputation processes

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

WRITE (WRT) WRITE (WRT) The WRITE language statement while once utilized as the primary output command is now used to create an auxiliary or derivative file Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION (See WRITE-FILE of DICTIONARY-DIVISION)

XRECODE (XREC) name changed Old command which has become the GRECODE command in the December 1979 version
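The comments above describe ALLOCATE and UPDATE as the pair of commands that together provide the imputation mechanism, with UPDATE maintaining the arrays involved. The following Python fragment is only a minimal sketch of that general idea (a "hot deck" of donor values); it is not CONCOR syntax, and the function name, the field names, the valid age range and the starting values are all assumptions made for the illustration.

```python
# Illustrative hot-deck sketch; every name here is hypothetical and
# nothing below is CONCOR syntax.

def edit_age(record, deck, low=0, high=98):
    """Edit one person record.

    A valid age refreshes the stored donor value (UPDATE-like step);
    an invalid or missing age is replaced from it (ALLOCATE-like step).
    Returns the edited record and a flag telling whether a value was imputed.
    """
    key = (record["sex"], record["relationship"])
    age = record.get("age")
    if age is not None and low <= age <= high:
        deck[key] = age            # remember the most recent valid donor
        return record, False
    fixed = dict(record)
    fixed["age"] = deck[key]       # impute from the stored donor value
    return fixed, True


# Starting ("cold deck") values, chosen arbitrarily for the example.
deck = {("M", "head"): 35, ("F", "head"): 33}

people = [
    {"sex": "F", "relationship": "head", "age": 41},
    {"sex": "F", "relationship": "head", "age": 150},   # out of range
    {"sex": "M", "relationship": "head", "age": None},  # missing
]
edited = [edit_age(person, deck) for person in people]
```

In this sketch the second record receives the age of the first (the most recent valid donor with the same characteristics), while the third falls back on the starting value because no valid male head has yet been seen.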


DECEMBER 1978 VERSION 1

DECEMBER 1979 VERSION 2

Not implemented REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listings from this division.

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST


Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics, such as volume specification and retention periods, if applicable.

(2) Addition of optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.


(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18 digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.


(12) Increase the number of comparison strings permitted for alphanumeric data items This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters The user would still be required to code two commas to receive the system default value for any parameter

(15) Provide a repetition factor in the initialization of arrays

(16) Print the data dictionary name in the titles of the dictionary documentation or provide a means by which the user may specify a title to appear in the listings
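Item (13) above suggests declaring the valid range together with the field itself so that a range check can run automatically before the user's own edit logic sees the data. The Python fragment below is a rough sketch of that arrangement under stated assumptions: the `FieldDef` layout, the column positions and the two failure labels are hypothetical (the labels merely borrow names that appear in the list of internal variables in Appendix H), and none of this reflects the actual dictionary format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    """Hypothetical dictionary entry: position plus declared valid range."""
    name: str
    start: int    # 1-based starting column
    length: int
    low: int
    high: int


def auto_range_check(line, fields):
    """Return (field name, raw text, reason) for every field failing its range."""
    failures = []
    for f in fields:
        raw = line[f.start - 1:f.start - 1 + f.length]
        if not raw.strip().isdigit():
            failures.append((f.name, raw, "NOT-NUMBER"))
        elif not f.low <= int(raw) <= f.high:
            failures.append((f.name, raw, "OUT-OF-RANGE"))
    return failures


fields = [
    FieldDef("SEX", start=1, length=1, low=1, high=2),
    FieldDef("AGE", start=2, length=2, low=0, high=98),
]

clean = auto_range_check("135", fields)      # sex 1, age 35
dirty = auto_range_check("399", fields)      # sex 3, age 99
garbled = auto_range_check("1 X", fields)    # age is not numeric
```

Because the ranges live in the field definitions rather than in comment statements, the same definitions could in principle be read by tabulation software, which is the second benefit the item mentions.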

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features, such as division by zero and out of bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is the question of what action is to be taken in the calculation of tolerance in terms of this new record's inclusion into the universe of observations and number of changes.


(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
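Item (4) above contrasts a lookup table with the code currently generated for DRECODE. As a hedged illustration of why a table is faster, the Python sketch below assumes that the existing code behaves like a chain of comparisons tried in order (the report does not show the generated COBOL, so this is an assumption) and compares it with direct indexing; the function names and the sample code pairs are hypothetical.

```python
def recode_linear(value, pairs):
    """Compare against each (old, new) pair in turn; cost grows with the list."""
    for old, new in pairs:
        if value == old:
            return new
    return None


def build_table(pairs):
    """Build a table indexed directly by the old code (codes start at zero)."""
    table = [None] * (max(old for old, _ in pairs) + 1)
    for old, new in pairs:
        table[old] = new
    return table


def recode_lookup(value, table):
    """A single bounds test and one indexing step, whatever the table size."""
    if 0 <= value < len(table):
        return table[value]
    return None


pairs = [(0, 10), (1, 11), (2, 12), (3, 19)]
table = build_table(pairs)
same = all(recode_linear(v, pairs) == recode_lookup(v, table) for v in range(-1, 6))
```

The table has to be built once before any record is processed, which is consistent with the item's remark that the approach would need extra communication between the programs that generate the dictionary and the edit code.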

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.


1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.


2. Survey or Other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of the sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of the questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has been neither finished nor tested.
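The check-in procedure proposed in item (2) of this list amounts to comparing the questionnaires actually received against a master list of the sample cases expected. A minimal Python sketch of that comparison follows; the function name, the three report categories and the identifier format are assumptions made for the example, since the report does not specify how the check-in would be reported.

```python
from collections import Counter


def check_in(master_ids, received_ids):
    """Compare received questionnaire identifiers with the expected sample."""
    counts = Counter(received_ids)
    master = set(master_ids)
    received = set(counts)
    return {
        "missing": sorted(master - received),        # expected, never received
        "unexpected": sorted(received - master),     # received, not in sample
        "duplicated": sorted(i for i, n in counts.items() if n > 1),
    }


report = check_in(
    master_ids=["A01", "A02", "A03", "A04"],
    received_ids=["A01", "A03", "A03", "B07"],
)
```

Here the sketch would report A02 and A04 as missing, B07 as unexpected and A03 as received more than once.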


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES


1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT OUTPUT-QUEST-COUNT WRITE-RECORD-COUNT

Change in spelling of variable

Not mentioned--possible omission from manual

Not implemented

Note: Current documentation equivocates on when some variables are reset; most are initialized with each control area break.

Though these were implemented as reserved identifiers, the poor documentation of (old) CONCOR meant that their exact values and utility were unknown.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN-, OUT-).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Execution statistics printout. Legible column headings: CONTROL AREA, SEQUENCE NUMBER, INPUTS READ (QUESTIONNAIRES, RECORDS), OUTPUTS BY OUTPUT COMMAND, OUTPUTS BY WRITE COMMAND, NO VALID RECTYPES. The run date and the counts are illegible in the source.]

Messages printed by the system:

DIVISION BY ZERO LINE NUMBER 118 (printed six times)

RUN ABORTED DUE TO DIVISION BY ZERO ERRORS

Program text superimposed on the printout (source lines 111-125, only partly legible; the remaining lines are LIST statements and comments):

115 LET R001 R003 = 0

118 LET R003 = R002 / R001

122 END TEST OF OPERANDS AND MSG

124 REGISTERS RESET TO ZERO

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason--ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors and user specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.
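The behaviour shown on this page (a message naming the offending source line for each division by zero, followed by an abort once the errors accumulate) can be sketched in a few lines of Python. This is only an illustration of the diagnostic pattern, not of how the generated EDITOR program is written: the `run_edit` function, the register dictionary and the limit of six errors are assumptions, the last one taken from the number of messages that appear on the printout.

```python
def run_edit(statements, registers, n_records, max_errors=6):
    """Execute the statements once per record, trapping division by zero.

    statements: list of (line number, target register, function of registers).
    Returns (messages, aborted).
    """
    messages = []
    errors = 0
    for _ in range(n_records):
        for line_no, target, compute in statements:
            try:
                registers[target] = compute(registers)
            except ZeroDivisionError:
                errors += 1
                messages.append(f"DIVISION BY ZERO LINE NUMBER {line_no}")
                if errors >= max_errors:
                    messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                    return messages, True
    return messages, False


# Two statements modelled loosely on source lines 115 and 118 of the test program.
statements = [
    (115, "R001", lambda r: 0),
    (118, "R003", lambda r: r["R002"] // r["R001"]),
]
messages, aborted = run_edit(statements, {"R001": 0, "R002": 8, "R003": 0}, n_records=10)
```

Carrying the source line number alongside each statement is what allows the message to point back at the user's code rather than at the generated program.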

[Second execution statistics printout (array coordinate errors). The heading and the individual error lines are largely illegible in the source; each error line appears to report a questionnaire number, a coordinate, a value and a line number, and the run ends with an abort message.]

Program text superimposed on the printout (source lines 175-187, partly legible; lines 182-185 are LIST statements that display the values of N002, R003, TA1 and TA2):

175 UNTIL R001 > 10 VARY R001 FROM 1 BY 1

177 DO

178 UNTIL R002 > 10 VARY R002 FROM 1 BY 1

180 ALLOCATE N002 = TA1 (R001 R002)

181 UPDATE TA2 (R001 R002) R003

186 END-DO

187 END-DO

Comments: In this example R001, a 2 dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 X 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.
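The array coordinate error described on this page arises when a 10 by 10 pair of loop counters is used to update a 5 by 5 array. The Python sketch below reproduces that situation with a bounds-checked update; the class name, the wording of the message and the use of line number 181 are assumptions for illustration, as the actual message text is not legible on the printout.

```python
class BoundedArray:
    """Two-dimensional array with 1-based coordinates and checked updates."""

    def __init__(self, rows, cols, fill=0):
        self.rows = rows
        self.cols = cols
        self.cells = [[fill] * cols for _ in range(rows)]

    def update(self, row, col, value, line_no):
        """Store the value, or return a diagnostic if the element does not exist."""
        if not (1 <= row <= self.rows and 1 <= col <= self.cols):
            return f"ARRAY COORDINATE ERROR ({row}, {col}) LINE NUMBER {line_no}"
        self.cells[row - 1][col - 1] = value
        return None


ta2 = BoundedArray(5, 5)
errors = []
for r001 in range(1, 11):          # counters run over a 10 x 10 range
    for r002 in range(1, 11):
        message = ta2.update(r001, r002, r001 * r002, line_no=181)
        if message:
            errors.append(message)
```

Of the 100 attempted updates only the 25 that fall inside the 5 by 5 array succeed; the first failure occurs at row 1, column 6.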

[Third execution statistics printout (user-specified termination). Legible column headings: CONTROL AREA, SEQUENCE NUMBER, INPUTS READ (QUESTIONNAIRES, RECORDS), OUTPUTS BY OUTPUT COMMAND, OUTPUTS BY WRITE COMMAND. The counts are illegible in the source.]

Messages printed by the system:

JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215

EDITOR -- NORMAL END OF JOB

Program text superimposed on the printout (source lines 214-218, partly legible; lines 217-218 hold a further IF IN-RECORD-COUNT test followed by a LIST statement whose text is illegible):

214 IF IN-RECORD-COUNT > 500

215 THEN STOP-RUN END-THEN

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING(DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION A blank command statement line was found in the user Dictionary Division command statement file

ACTION TAKEN Parsing continued with the next command statement line in the Dictionary Division command file

USER RESPONSE Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION A character not in the CONCOR legal character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()) was found as the first character in the string

ACTION TAKEN Parsing began again with the next string

USER RESPONSE Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION A string starting with a character not valid in the CONCOR legal character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached

ACTION TAKEN Parsing began again with the next user command statement line in the Dictionary Division command file

USER RESPONSE Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z 0-9 - + the comma character () the delimiter character () the keyword separator character (=) the command terminator character () and the literal string delimiter ())

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION The comment indicator character () was found embedded in a string


APPENDIX E

CONCOR WORKSHOP January 16 1980

BASIC COBOL VERSION 2

December 1979 Release

Participant Evaluation of Package

1. What is your impression of this version of CONCOR in terms of its usefulness and reliability for editing housing and population census data in less developed countries?

2. If you are familiar with previous versions of CONCOR, how does this current version compare to them?

3. How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data with which you are familiar?

4. How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software?

5. How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys?

6. If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?

7. In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8. Other comments

APPENDIX F

NEW/OLD COMMAND COMPARISONS

OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA


NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR

END-DIVISION signifies the termination of the CONCOR division organization and must always be present even though division identifiers have not been implemented

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

ALLOCATE (ALL) ALLOCATE (ALLOC) Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

ASSERT (AST) ASSERT (ASRT) The ASSERT command is the basic consistency editing command of CONCOR. The command now provides NOERROR and MESSAGE options. Additionally, the DUMPR and DUMPO keywords give the user the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

(see RECODE) DRECODE (DRCD) Provides a method for recoding data items that have a starting value of zero and continue in ascending sequence. Replaces the previous XRECODE command.

EXIT Provides the means to terminate execution (escape) of command statements within the UNTIL command.

END END-IF END-THEN END-ELSE A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

ERROR not implemented No longer supported in the language.

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

FILTER TYPE ( ) WITH not implemented No longer are option statements supported in language This command once specified essentially a GOTO depending on construction

(see XRECODE) GRECODE (GRCD) This new command provides a method for the group recoding of input values

IF IF The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. As in pseudocode, a CONCOR programmer must use the structured IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements accepted by the 1978 version.

LET LET Virtually unchanged

LIST New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire as well as specific individual identifiers

OUTPUT New command which provides the means by which records will be written to the output-file

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

RANGE (RNG) RANGE (RNG) As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through the DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

RECODE (REC) name changed Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

STOP-keyword STOP-keyword Unchanged.

UNTIL DO END-DO One of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

UPDATE (UPD) UPDATE (UPD) Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

WRITE (WRT) WRITE (WRT) The WRITE language statement while once utilized as the primary output command is now used to create an auxiliary or derivative file Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION (See WRITE-FILE of DICTIONARY-DIVISION)

XRECODE (XREC) name changed Old command which has become the GRECODE command in the December 1979 version


DECEMBER 1978 VERSION 1

DECEMBER 1979 VERSION 2

Not implemented REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listings from this division.

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST


Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics, such as volume specification and retention periods, if applicable.

(2) Addition of optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.


(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18 digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.


(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation, or provide a means by which the user may specify a title to appear in the listings.
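
The automatic range check proposed in item (13) can be pictured with a short sketch. It is written in Python rather than in the CONCOR language, and the `FieldSpec` class, the field names and the limits are hypothetical illustrations, not part of CONCOR or its dictionary format. It only assumes that each user-identifier carries a declared low and high value that is checked before the editing logic sees the record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical dictionary entry: a user-identifier with a declared valid range."""
    name: str
    low: int
    high: int


def range_check(record, specs):
    """Return (name, value) pairs that are missing or outside the declared range."""
    failures = []
    for spec in specs:
        value = record.get(spec.name)
        if value is None or not (spec.low <= value <= spec.high):
            failures.append((spec.name, value))
    return failures


# Illustrative declarations only; real ranges would come from the dictionary.
SPECS = [FieldSpec("AGE", 0, 99), FieldSpec("SEX", 1, 2)]

clean = range_check({"AGE": 34, "SEX": 2}, SPECS)
flagged = range_check({"AGE": 120, "SEX": 2}, SPECS)
print(clean)
print(flagged)
```

Because the limits live with the declaration, the same entries could in principle be read by tabulation software, which is the secondary benefit the item mentions.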

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features, such as division by zero and out-of-bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is the question of what action is to be taken in the calculation of tolerance, in terms of this new record's inclusion in the universe of observations and the number of changes.


(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
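
The lookup table approach suggested in item (4) can be sketched as follows. This is an illustration in Python, not the COBOL that GENED would generate, and the code pairs are invented. It assumes, as the command comparison in Appendix F describes for DRECODE, that the input codes start at zero and ascend without gaps, so the input value can serve directly as an index.

```python
def recode_linear(value, pairs):
    """Compare the value against each (old, new) pair in turn."""
    for old, new in pairs:
        if value == old:
            return new
    return None


def build_lookup(pairs):
    """Build a direct-index table; assumes the old codes are exactly 0..n-1."""
    table = [None] * len(pairs)
    for old, new in pairs:
        table[old] = new
    return table


def recode_lookup(value, table):
    """Recode with a single indexed access instead of a chain of comparisons."""
    if 0 <= value < len(table):
        return table[value]
    return None


# Invented recode pairs for illustration only.
PAIRS = [(0, 9), (1, 1), (2, 1), (3, 2)]
TABLE = build_lookup(PAIRS)

for v in range(-1, 6):
    print(v, recode_linear(v, PAIRS), recode_lookup(v, TABLE))
```

Both functions return the same result for every input; the table version does a fixed amount of work per record, where the comparison chain grows with the number of codes.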

1.1.3 Report Division

(1) Provide a means by which the user may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.


1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean to imply that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR Users Guide.


2. Survey or Other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of the sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has not been finished nor tested.
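
Two of the survey-oriented items above, the check-in procedure of item (2) and the weighting scheme of item (5), reduce to simple operations that can be sketched in Python. The questionnaire identifiers, the field names `PERSONS` and `WEIGHT`, and the weight values below are all hypothetical; the report does not specify how either feature would be expressed in CONCOR.

```python
def check_in(expected_ids, received_ids):
    """Compare a master list of sample cases with the cases actually received.

    Returns (missing, unexpected) as sorted lists of questionnaire identifiers.
    """
    expected, received = set(expected_ids), set(received_ids)
    return sorted(expected - received), sorted(received - expected)


def weighted_total(records, value_key, weight_key="WEIGHT"):
    """Sum a data item over the sample, scaling each record by its sampling weight."""
    return sum(r[value_key] * r[weight_key] for r in records)


# Hypothetical master list and received cases.
missing, unexpected = check_in(["Q001", "Q002", "Q003"], ["Q001", "Q003", "Q009"])
print(missing, unexpected)

# Hypothetical sample records with sampling weights.
sample = [{"PERSONS": 4, "WEIGHT": 10.0}, {"PERSONS": 2, "WEIGHT": 25.0}]
estimate = weighted_total(sample, "PERSONS")
print(estimate)
```

The weighted total also shows why item (3), floating point calculation, matters for survey work: sampling weights are rarely whole numbers.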


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES


OLD CONCOR (VERSION 1) | NEW CONCOR (VERSION 2)

1 EOF-FLAG | EOF-FLAG
2 ERROR-IN-QUESTIONNAIRE | ERRORS-IN-QUESTIONNAIRE
3 CURRENT-POINTER-VALUE | (none listed)
4 TYPE-COUNT ( ) | TYPE-COUNT ( )
5 RECORD-COUNT | IN-RECORD-COUNT
6 QUESTIONNAIRE-COUNT | IN-QUESTIONNAIRE-COUNT
7 RECORDS-IN-STORE | RECORDS-IN-STORE
8 INVALID-RECORD-TYPE-COUNT | INVALID-RECORD-TYPE-COUNT
9 INCOMPLETE-FLAG | INCOMPLETE-FLAG
10 CONTINUATION-FLAG | CONTINUATION-FLAG
11 INVALID-RECORD-FLAG | (none listed)
(none listed) | OUT-OF-RANGE
(none listed) | NOT-NUMBER
(none listed) | BLANK
(none listed) | OUT-RECORD-COUNT, OUTPUT-QUEST-COUNT, WRITE-RECORD-COUNT (new internal counters)

Comments attached to individual entries in the original table:

Change in spelling of variable.

Not mentioned--possible omission from manual.

Not implemented.

Note: Current documentation equivocates about when some variables are reset--most are initialized with each control area break.

Though implemented as reserved identifiers, due to poor documentation of (old) CONCOR their exact values and utility were unknown.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN-, OUT-).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[CONCOR-EDITOR Execution Statistics page. The run log shows repeated "DIVISION BY ZERO, LINE NUMBER 118" messages followed by "RUN ABORTED DUE TO DIVISION BY ZERO ERRORS". The superimposed test program (source lines 111-125) is only partly legible in this copy: it sets R001 and R003 to zero, and at line 118 a LET statement assigns to R003 the result of an operation on R002 and R001. The operator is illegible; the diagnostics indicate a division.]

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason--ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.

[CONCOR-EDITOR Execution Statistics page. The printed run log is illegible in this copy. The superimposed test program (source lines 173-187) contains two nested UNTIL ... VARY ... DO loops, on R001 and R002, each varied from 1 by 1 with a limit of 10. Inside the loops are an ALLOCATE from TA1 (R001 R002), an UPDATE of TA2 (R001 R002) and several LIST statements, closed by two END-DO statements.]

Comments: In this example R001, a 2-dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 x 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.

[CONCOR-EDITOR Execution Statistics page. The run log reads "JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215" followed by "EDITOR -- NORMAL END OF JOB". The superimposed program text (source lines 214-218) is only partly legible: an IF on IN-RECORD-COUNT with THEN STOP-RUN END-THEN, followed by a second IF on IN-RECORD-COUNT that issues a LIST message.]

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.
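
The behavior this appendix tests, trapping a division by zero, reporting the source line, and ending the run once an error limit is reached, can be sketched in Python. This is not how EDITOR is implemented; the class name, the choice of zero as the substituted result, and the error limit of 2 are assumptions made only for the illustration.

```python
class RunAborted(Exception):
    """Raised when the number of trapped errors reaches the configured limit."""


class DiagnosticGuard:
    """Hypothetical protection layer: trap the error, log the line, count it."""

    def __init__(self, max_errors):
        self.max_errors = max_errors
        self.errors = 0
        self.messages = []

    def divide(self, dividend, divisor, line):
        """Integer division that reports a diagnostic instead of failing silently."""
        if divisor == 0:
            self.errors += 1
            self.messages.append(f"DIVISION BY ZERO, LINE NUMBER {line}")
            if self.errors >= self.max_errors:
                self.messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                raise RunAborted(line)
            return 0  # assumption: continue with a zero result until the limit
        return dividend // divisor


guard = DiagnosticGuard(max_errors=2)  # the limit here is illustrative
aborted = False
try:
    for _ in range(5):
        guard.divide(7, 0, line=118)
except RunAborted:
    aborted = True

for message in guard.messages:
    print(message)
```

The point of the design is the one the comments above make: every abnormal end leaves a message naming the source line, so the user is never left with a silent ABEND.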

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character () and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character () and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character () and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.


3. How would you compare the utility of this version of CONCOR for less developed countries with other systems for editing data with which you are familiar?

4. How well do you feel the documentation (Reference Manual and Diagnostic Messages Guide) supports the software?


5. How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys?

6. If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?


7. In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8. Other comments

APPENDIX F

NEWOLD COMMAND COMPARISONS


OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA


NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR.

END-DIVISION signifies the termination of the CONCOR division organization and must always be present even though division identifiers have not been implemented.

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

VERSION 1 (DECEMBER 1978) | VERSION 2 (DECEMBER 1979) | COMMENTS

ALLOCATE (ALL) | ALLOCATE (ALLOC) | Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

ASSERT (AST) | ASSERT (ASRT) | The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and MESSAGE option. Additionally, the DUMPR and DUMPO keywords give the user the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS, END-PASS, FAIL, END-FAIL constructions for the purpose of maintaining structured logic.

(see RECODE) | DRECODE (DRCD) | Provides a method for recoding data items that have a starting value of zero and continue in ascending sequence. Replaces the previous RECODE command.

(none) | EXIT | Provides the means to terminate execution (escape) of command statements within the UNTIL command.

END | END-IF, END-THEN, END-ELSE | A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

ERROR | not implemented | No longer supported in the language.

FILTER TYPE ( ) WITH | not implemented | Option statements are no longer supported in the language. This command once specified what was essentially a GOTO, depending on construction.

(see XRECODE) | GRECODE (GRCD) | This new command provides a method for the group recoding of input values.

IF | IF | The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. Similar to pseudocode, a CONCOR programmer must utilize the structured IF, THEN, END-THEN, ELSE, END-ELSE construction in place of the IF, THEN, END, ELSE, END statements acceptable to the 1978 version.

LET | LET | Virtually unchanged.

(none) | LIST | New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire, as well as specific individual identifiers.

(none) | OUTPUT | New command which provides the means by which records will be written to the output file.

RANGE (RNG) | RANGE (RNG) | As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through the DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS, END-PASS, FAIL, END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

RECODE (REC) | name changed | Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

STOP-keyword | STOP-keyword | Unchanged.

(none) | UNTIL DO END-DO | One of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user-identifier (counter), which is incremented with each successive execution of the command statements.

UPDATE (UPD) | UPDATE (UPD) | Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

WRITE (WRT) | WRITE (WRT) | The WRITE language statement, while once utilized as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of DICTIONARY-DIVISION.)

XRECODE (XREC) | name changed | Old command which has become the GRECODE command in the December 1979 version.
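
The group recoding that the table attributes to GRECODE can be illustrated with a short Python sketch. The report does not show GRECODE's syntax or its exact rules, so the boundary list, the half-open interval convention and the function name below are assumptions chosen only to show the idea of mapping ranges of input values onto group codes.

```python
from bisect import bisect_right

# Hypothetical lower bounds of age groups 1, 2 and 3; values below 5 fall in group 0.
AGE_BOUNDS = [5, 15, 65]


def group_recode(value, bounds):
    """Return the index of the group whose interval contains the value.

    Group k covers bounds[k-1] <= value < bounds[k], with open-ended first
    and last groups. This convention is an assumption of the sketch.
    """
    return bisect_right(bounds, value)


for age in (4, 5, 14, 15, 64, 65):
    print(age, group_recode(age, AGE_BOUNDS))
```

Where DRECODE maps individual codes one for one, a group recode collapses many input values into one output code, which is why the two are kept as separate commands.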


DECEMBER 1978 VERSION 1

Not implemented

DECEMBER 1979 VERSION 2

REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
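
The presorting requirement noted above can be demonstrated with a small Python sketch. It uses the standard library's `itertools.groupby`, which, like a control-break report, only groups records that are adjacent in the file. The area codes and records are invented, and the sketch does not reproduce CONCOR's own report logic.

```python
from itertools import groupby


def area_breaks(records):
    """Return (area, record_count) for each run of contiguous records of one area."""
    return [(area, len(list(group)))
            for area, group in groupby(records, key=lambda r: r[0])]


# Hypothetical (area code, record number) pairs; area A01 is split in the input.
records = [("A01", 1), ("A01", 2), ("A02", 3), ("A01", 4)]

unsorted_report = area_breaks(records)
sorted_report = area_breaks(sorted(records))
print(unsorted_report)
print(sorted_report)
```

With unsorted input the same area is reported twice, each time with partial totals; sorting on the control field beforehand gives one break per area.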

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST


Appendix G

Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods if applicable

(2) Addition of an optional input andor output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values

(3) An output record format area (Output Record Section) could be added This would allow the creation of edited output files in a different format and length than the file read into CONCOR This implementation would require other enhanceients to provide a simple means of moving values from th-iT i tato tile output file Unique naming of user-identifiers would also need to be addressed Another possiblity is the declaration of both the input and output location in the same command when specifying the contents of tile data record to be edited

(4) Addition of user-identifier naires to fields used in the controlling of summary statistics and unique questionnaire determination This would require modification to the system level data dictionary

46 G-2

(5) Increasing the number of area and questionnaire control fields

This implementation would require modification lo the

dictionary as tell as to the error records passed to the Report

Division Another problem would be the formatting of the extra

data values in the report headings

(6) Permit the mixing of difFerent data item types in the CONTROL

commands This would allow the greatest flexibilty in choosing

control fields In order to implement this enhancement the

dictionary and the generated COBOL program (EDITOR) would

require modification

(7) Give the user access to the values in the control fields during

the editing process Either the user would be prohibited from

changing these values (as is currently done with CONCOR

internal identifiers) to prevent the creation of new questionnaires or the summary information gathered on a

questionnaire basis would become meaningless Another

ramification of this enhancement is how to handle fields defined as alphanumeric as CONCOR internally works in binary

(8) Allow input data records with a similar format to be defined by

the same DEFINE-RECORD command statement This could be

accomplished by permitting multiple values in the RECORD-TYPE

clause of the DEFINE-RECORD command A possible problem here

is in the determination of which value to be move to the output

record when the record is to be written out using the OUTPUT

command 0

(9) Implement a COMMION-DATA concept to handle items command to all

record types This could be costly in terms of core storage

and additional conversion time if a storage area is allocated

for each identifier for each record type Only permitting one

common area for all records would require validation of the

data to ensure the values were exactly the same on each record

This wJould require the CONCOR program to perform many more

internal checks when reading the data file into the store area

(10) Permit the selective inclusion of data items found in the

DEFINE-RECORD command statement into the universe count used

for the determination of tolerance levels when using the COUNT-IIPUTES command statement

(11) Increase the number and types of data format referencing now

permitted External numeric data (type N) could be expanded to

handle 18 digit numbers Also signed numeric and packed

decimal data format could be added The signed numeric format

would allow both leading blanks and negative data values This

format is essential ihen dealing with data files generated by

FORTRAN programs

0

+

G-3 47

(12) Increase the number of comparison strings permitted for alphanumeric data items This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire The inclusion of such values are now done by many users in the form of comment statements This enhancement would also facilitate the use of the dictionary by tabulation software

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters The user would still be required to code two commas to receive the system default value for any parameter

(15) Provide a repetition factor in the initialization of arrays

(16) Print the data dictionary name in the titles of the dictionary documentation or provide a means by which the user may specify a title to appear in the listings

112 Execution Division

(M) Implement the Run Control Section This would allow the user to increase execution speed of the EDITOR program This would be accomplished by eliminating some of the system protection features such as division by zero and out of bounds checking In this secion the user could also specify new maximum values for the EDITOR error limits

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered The only problem to resolve is how to treat this id-ntifier outside of the Filter Routines

(3) Implement a CREATE commnand This command would allow the user to create a record that Was not found on the input file This implementation would require special care as it gives the user the ability to change the valme of a CONCOR internal identifier (TYPE-CGUNT) and modify the contents of the store Another consequence of this command is what action is to be taken in the calculation of tolerance in terms of this new records inclusion into the universe of observations and number of changes

iK

48 G-4

(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implement a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command lies in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
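The lookup table approach suggested in item (4) can be illustrated with a minimal Python sketch. It is not CONCOR-generated code: the table contents and the function names `recode_linear` and `recode_lookup` are hypothetical, and the sketch only assumes, as the DRECODE description states, that source codes start at zero and ascend.

```python
# Hypothetical recode table: source codes 0..5 map to these target codes.
RECODE_TABLE = [0, 1, 1, 2, 2, 3]

# The same mapping expressed as (source, target) pairs for a linear search.
PAIRS = list(enumerate(RECODE_TABLE))


def recode_linear(value, pairs):
    """Scan the pairs one by one until the source code matches."""
    for source, target in pairs:
        if source == value:
            return target
    raise ValueError(f"no recode defined for {value}")


def recode_lookup(value, table):
    """Use the source code directly as an index into the table."""
    if not 0 <= value < len(table):
        raise ValueError(f"no recode defined for {value}")
    return table[value]
```

Both functions return the same result; the lookup version does a single index operation regardless of table size, which is the saving in execution time the item refers to.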

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports begins. This extra procedure is required because of program size considerations.

1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on IBM 360/370 series computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification of the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.

2. Survey or Other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of the sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating-point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has been neither finished nor tested.
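As an illustration of the weighting scheme proposed in item (5), the following Python sketch computes a weighted total and a weighted mean from sample records. The record layout, the field names `persons` and `weight`, and the sample values are hypothetical; they stand in for whatever expansion factors a sampling frame would supply.

```python
def weighted_total(records, field, weight_field="weight"):
    """Sum of field values, each multiplied by the record's sampling weight."""
    return sum(rec[field] * rec[weight_field] for rec in records)


def weighted_mean(records, field, weight_field="weight"):
    """Weighted total divided by the sum of the weights."""
    total_weight = sum(rec[weight_field] for rec in records)
    if total_weight == 0:
        raise ValueError("sum of weights is zero")
    return weighted_total(records, field, weight_field) / total_weight


# Hypothetical sample: two households with different expansion factors.
SAMPLE = [
    {"persons": 4, "weight": 10.0},
    {"persons": 2, "weight": 25.0},
]
```

With these sample records the estimated population total is 4 x 10 + 2 x 25 = 90 persons, whereas the unweighted total of 6 would describe only the sample itself.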

APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES

OLD CONCOR                        NEW CONCOR

 1 EOF-FLAG                       EOF-FLAG
 2 ERROR-IN-QUESTIONNAIRE         ERRORS-IN-QUESTIONNAIRE
 3 CURRENT-POINTER-VALUE
 4 TYPE-COUNT ( )                 TYPE-COUNT ( )
 5 RECORD-COUNT                   IN-RECORD-COUNT
 6 QUESTIONNAIRE-COUNT            IN-QUESTIONNAIRE-COUNT
 7 RECORDS-IN-STORE               RECORDS-IN-STORE
 8 INVALID-RECORD-TYPE-COUNT      INVALID-RECORD-TYPE-COUNT
 9 INCOMPLETE-FLAG                INCOMPLETE-FLAG
10 CONTINUATION-FLAG              CONTINUATION-FLAG
11 INVALID-RECORD-FLAG
                                  OUT-OF-RANGE
                                  NOT-NUMBER
                                  BLANK
                                  (new internal counters)
                                  OUT-RECORD-COUNT
                                  OUTPUT-QUEST-COUNT
                                  WRITE-RECORD-COUNT

COMMENTS

Change in spelling of variable.

Not mentioned; possible omission from manual.

Not implemented.

Note: Current documentation equivocates on when some variables are reset; most are initialized with each control area break.

Though implemented as reserved identifiers, due to poor documentation of (old) CONCOR their exact values and utility were unknown.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN, OUT).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Execution Statistics printout. Headings: CONTROL AREA, SEQUENCE NUMBER, INPUTS READ (QUESTIONNAIRES, RECORDS), OUTPUTS BY OUTPUT COMMAND, OUTPUTS BY WRITE COMMAND. The page shows the message "DIVISION BY ZERO, LINE NUMBER 118" repeated several times, followed by "RUN ABORTED DUE TO DIVISION BY ZERO ERRORS".]

Program text superimposed on the printout (legible lines only):

    115 LET R001 R003 = 0
    116 LIST 'R001 R003 SET TO ZEROS' R001 R003
    118 LET R003 = R002 / R001
    119 LIST 'R003 = R002 / R001 =' R003
    125 LET R001 R002 R003 = 0

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason (ABEND) without generating any messages. The most common types of cessation are division by zero, array coordinate errors, and user-specified terminations. A program was written to deliberately test CONCOR's ability to provide diagnostics. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.
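The diagnostic behaviour described in the comments (a message naming the offending source line for each division by zero, then an abort once errors accumulate) can be sketched in Python as follows. This is an illustration under stated assumptions, not the EDITOR's actual logic: the class names, the `max_errors` limit, the zero result substituted after an error, and the use of integer division are all hypothetical choices.

```python
class RunAborted(Exception):
    """Raised when the division-by-zero error limit is reached."""


class GuardedArithmetic:
    def __init__(self, max_errors=5):
        self.max_errors = max_errors  # hypothetical error limit
        self.messages = []            # diagnostics collected for the report

    def divide(self, numerator, denominator, line_number):
        """Divide, logging the source line instead of crashing on zero."""
        if denominator == 0:
            self.messages.append(
                f"DIVISION BY ZERO, LINE NUMBER {line_number}"
            )
            if len(self.messages) >= self.max_errors:
                raise RunAborted("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
            return 0  # assumed placeholder result so processing can continue
        return numerator // denominator
```

A caller would pass the source line number of each statement, for example `guard.divide(r002, r001, 118)`, so that every message points back to the user's program rather than to the generated code.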

[Execution Statistics printout for the array reference test. The headings match the preceding page; the individual error messages are not recoverable.]

Program text superimposed on the printout:

    175 UNTIL R001 > 10 VARY R001 FROM 1 BY 1
    177 DO
    178 UNTIL R002 > 10 VARY R002 FROM 1 BY 1
    180 ALLOCATE N002 = TA1 (R001 R002)
    181 UPDATE TA2 (R001 R002) R003
    182 LIST 'N002 = R003' N002 R003
    183 LIST 'VALUES OF TA1 = R003' TA1 (R001 R002)
    184 LIST
    185 LIST 'TA2 =' TA2 (R001 R002)
    186 END-DO
    187 END-DO

Comments: In this example R001, a 2-dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 x 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.
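The array coordinate error in this example can be reproduced with a small Python sketch. The `Array2D` class, its 1-based indexing, and the message wording are assumptions made for illustration; only the 10 x 10 and 5 x 5 dimensions and the line number 181 come from the example above.

```python
class BoundsError(Exception):
    """Raised when an array reference falls outside the declared size."""


class Array2D:
    def __init__(self, rows, cols, fill=0):
        self.rows, self.cols = rows, cols
        self.cells = [[fill] * cols for _ in range(rows)]

    def _check(self, row, col, line_number):
        # Indices are 1-based here, matching the VARY ... FROM 1 loops.
        if not (1 <= row <= self.rows and 1 <= col <= self.cols):
            raise BoundsError(
                f"array reference ({row}, {col}) out of bounds "
                f"at line {line_number}"
            )

    def get(self, row, col, line_number):
        self._check(row, col, line_number)
        return self.cells[row - 1][col - 1]

    def update(self, row, col, value, line_number):
        self._check(row, col, line_number)
        self.cells[row - 1][col - 1] = value


def run_example():
    """Copy a 10 x 10 array into a 5 x 5 one; return the first error text."""
    ta1, ta2 = Array2D(10, 10, fill=7), Array2D(5, 5)
    try:
        for r1 in range(1, 11):
            for r2 in range(1, 11):
                ta2.update(r1, r2, ta1.get(r1, r2, 180), 181)
    except BoundsError as exc:
        return str(exc)
    return None
```

The first failing reference is row 1, column 6, since the inner loop exceeds the five columns of the smaller array before the outer loop advances.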

[Execution Statistics printout. Legible messages: "JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215" and "EDITOR -- NORMAL END OF JOB".]

Program text superimposed on the printout (legible lines only):

    214 IF IN-RECORD-COUNT > 500
    215 THEN STOP-RUN END-THEN

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator (.) in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR (.) IN STRING

EXPLANATION: The comment indicator character (.) was found embedded in a string.
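The checks behind messages DD-001, DD-002, and DD-004 can be sketched in Python as a single line scanner. Several details are assumptions: the comment indicator is taken to be a period, strings are taken to be separated by blanks, and the legal set below contains only the characters that survive legibly in the guide (the delimiter, terminator, and literal string delimiter characters are left out because they cannot be read). The function name `check_line` is hypothetical.

```python
import string

COMMENT_INDICATOR = "."  # assumption: a period marks a comment line

# Only the legible members of the legal character set are included here.
LEGAL_FIRST = set(string.ascii_uppercase + string.digits + "-+,=")


def check_line(line):
    """Return the first diagnostic code that applies to a line, or None."""
    if not line.strip():
        return "DD-001"            # blank command statement line
    if line[0] == COMMENT_INDICATOR:
        return None                # whole line is a comment
    for token in line.split():
        if token[0] not in LEGAL_FIRST:
            return "DD-002"        # illegal first character in string
        if COMMENT_INDICATOR in token:
            return "DD-004"        # comment indicator embedded in string
    return None
```

For instance, `check_line("AREA.CONTROL")` reports DD-004, while the same text with a hyphen passes.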


5. How well do you think this version of CONCOR would serve for editing other types of statistical censuses or surveys?

6. If your organization does statistical data processing by computer, would you recommend to your superiors that this software be used to edit the data? Do you feel assistance would be required in the installation of this software on your organization's computer and/or in training users on the package's applicability to their work?

7. In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8. Other comments.

APPENDIX F

NEW/OLD COMMAND COMPARISONS

OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-ID and record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA


NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

IDENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR.

END-DIVISION signifies the termination of the CONCOR division organization and must always be present, even though division identifiers have not been implemented.

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations, and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

Each entry gives the Version 1 (December 1978) command, the Version 2 (December 1979) command, and comments.

Version 1: ALLOCATE (ALL)
Version 2: ALLOCATE (ALLOC)
Comments: Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

Version 1: ASSERT (AST)
Version 2: ASSERT (ASRT)
Comments: The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and a MESSAGE option. Additionally, by utilizing the DUMPR and DUMPO keywords, it provides the user with the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS and FAIL END-FAIL constructions for the purpose of maintaining structured logic.

Version 1: (see RECODE)
Version 2: DRECODE (DRCD)
Comments: Provides a method for recoding data items whose values start at zero and continue in ascending sequence. Replaces the previous RECODE command.

Version 1: (none)
Version 2: EXIT
Comments: Provides the means to terminate execution (ESCAPE) of command statements within the UNTIL command.

Version 1: END
Version 2: END-IF, END-THEN, END-ELSE
Comments: A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

Version 1: ERROR
Version 2: not implemented
Comments: No longer supported in the language.

Version 1: FILTER TYPE ( ) WITH
Version 2: not implemented
Comments: Option statements are no longer supported in the language. This command once specified essentially a GOTO, depending on construction.

Version 1: (see XRECODE)
Version 2: GRECODE (GRCD)
Comments: This new command provides a method for the group recoding of input values.

Version 1: IF
Version 2: IF
Comments: The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. As in pseudocode, a CONCOR programmer must utilize the structured IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version.

Version 1: LET
Version 2: LET
Comments: Virtually unchanged.

Version 1: (none)
Version 2: LIST
Comments: New command which allows for the generation of specific messages, as well as specific individual identifiers, to be displayed in the report of edit statistics by questionnaire.

Version 1: (none)
Version 2: OUTPUT
Comments: New command which provides the means by which records will be written to the output file.

Version 1: RANGE (RNG)
Version 2: RANGE (RNG)
Comments: A basic editing command of CONCOR. The December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through the DUMPR and DUMPO options within the command. Another new option of this command is a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

Version 1: RECODE (REC)
Version 2: name changed
Comments: Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

Version 1: STOP keyword
Version 2: STOP keyword
Comments: Unchanged.

Version 1: (none)
Version 2: UNTIL DO END-DO
Comments: DO-END-DO is one of the most significant and powerful enhancements to the CONCOR language. This command provides a looping capability similar to that of most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

Version 1: UPDATE (UPD)
Version 2: UPDATE (UPD)
Comments: Virtually unchanged. It is the basic mechanism for maintaining arrays involved in imputation processes.

Version 1: WRITE (WRT)
Version 2: WRITE (WRT)
Comments: The WRITE language statement, while once utilized as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of DICTIONARY-DIVISION.)

Version 1: XRECODE (XREC)
Version 2: name changed
Comments: Old command which has become the GRECODE command in the December 1979 version.
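The way an assignment command and an array-maintenance command can work together for imputation, as described for ALLOCATE and UPDATE, resembles the general hot-deck technique. The Python sketch below illustrates that technique only; it is not CONCOR syntax or semantics, and the function name, field names, valid-value set, and starting deck values are all hypothetical.

```python
def hot_deck_impute(records, field, group_field, valid, initial):
    """Replace invalid values with the last valid value seen in the same group.

    The deck plays the role of an imputation array: it is refreshed from
    each good record and read from when a bad value is encountered.
    Returns the number of imputed values.
    """
    deck = dict(initial)  # starting values per group (hypothetical)
    imputed = 0
    for rec in records:
        group = rec[group_field]
        if rec[field] in valid:
            deck[group] = rec[field]   # keep the array current (cf. UPDATE)
        else:
            rec[field] = deck[group]   # assign from the array (cf. ALLOCATE)
            imputed += 1
    return imputed


# Hypothetical person records: marital status codes 1-4 are valid.
PERSONS = [
    {"sex": 1, "marital": 2},
    {"sex": 1, "marital": 9},
    {"sex": 2, "marital": 0},
    {"sex": 2, "marital": 3},
    {"sex": 2, "marital": None},
]
```

Calling `hot_deck_impute(PERSONS, "marital", "sex", {1, 2, 3, 4}, {1: 1, 2: 1})` changes three values; the count returned corresponds to the kind of figure an imputation statistics report would tally.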

DECEMBER 1978 VERSION 1

Not implemented

DECEMBER 1979 VERSION 2

REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
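The presorting requirement can be illustrated with a short Python sketch. It assumes a reporting pass that starts a new area whenever the control field changes between consecutive records; the `area` field name and the sample records are hypothetical, and `itertools.groupby` is used because it groups only contiguous items in the same way.

```python
from itertools import groupby


def area_breaks(records, key):
    """Return (area, record count) for each run of contiguous records."""
    return [(area, len(list(group))) for area, group in groupby(records, key=key)]


def get_area(rec):
    return rec["area"]


# Hypothetical input in which area 01 appears in two separate runs.
UNSORTED = [{"area": "01"}, {"area": "02"}, {"area": "01"}]
PRESORTED = sorted(UNSORTED, key=get_area)
```

On the unsorted input the pass reports area 01 twice, each time with one record; after sorting on the control field it reports a single area 01 with two records.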

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST

Future Versions of CONCOR: Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics, such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.

(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Permitting only one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18-digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.

(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma characters as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation, or provide a means by which the user may specify a title to appear in the listings.
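The signed numeric format mentioned in item (11), with leading blanks and an optional sign as written by FORTRAN integer output, could be read along the following lines. This Python sketch is illustrative only; the function name `parse_signed_field` and its error handling are assumptions, not part of any CONCOR specification.

```python
def parse_signed_field(text):
    """Convert a fixed-width field such as '   -42' to an integer.

    Leading and trailing blanks are ignored, and a single leading
    '+' or '-' is accepted. Anything else raises ValueError.
    """
    stripped = text.strip()
    if not stripped:
        raise ValueError("blank field")
    sign = 1
    if stripped[0] in "+-":
        if stripped[0] == "-":
            sign = -1
        stripped = stripped[1:]
    # Require plain ASCII digits so that embedded signs or blanks are rejected.
    if not (stripped.isascii() and stripped.isdigit()):
        raise ValueError(f"not a signed numeric field: {text!r}")
    return sign * int(stripped)
```

A field such as `"   -42"` yields -42, whereas `"1-2"` or an all-blank field is rejected so that the editing program can flag it rather than silently treat it as zero.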

1.1.2 Execution Division

(M) Implement the Run Control Section This would allow the user to increase execution speed of the EDITOR program This would be accomplished by eliminating some of the system protection features such as division by zero and out of bounds checking In this secion the user could also specify new maximum values for the EDITOR error limits

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered The only problem to resolve is how to treat this id-ntifier outside of the Filter Routines

(3) Implement a CREATE commnand This command would allow the user to create a record that Was not found on the input file This implementation would require special care as it gives the user the ability to change the valme of a CONCOR internal identifier (TYPE-CGUNT) and modify the contents of the store Another consequence of this command is what action is to be taken in the calculation of tolerance in terms of this new records inclusion into the universe of observations and number of changes

iK

48 G-4

(4) The code generated in the EDITOR program for the DRECODE

command should be modified to utilized a lookup table approach

This would decrease the execution time of the command but woul C

require additional communication between the GENED and GENDD

programs

(5) Permit the user to continue the message text field over

successive lines of code

(6) Implementation of a SET command command that would change

CONCORs default occurrence pointer for the duration of the SET

command This command would be ever helpful as the validation

of the existence of a record would only have to be done once

when the SET command was encounterd The major drawback to

this command is in the users ability to remember the action

taken during the last execution of the command when the

Execution Division command statements may cover many pages of

code

(7) Implement a method of specifying different data formats for

fields on the WRITE command statement This would give greater

flexibility to the user but would require major revision to the

routines now used to process the WRITE command

(8) Provide the user with two sets of internal identifiers The

first set currently exists in the system and is valid on the

basis of the current control area (if specified) being

executed The second set would be accumulated on a run total

basis and would require the creation of new CONCOR internal

ident ifiers

(9) Implement automatic array declarations for capturing allocation

frequency distributions This could be done by the system for

discrete values (especially if point number 13 above is

implemented) but a mechanism for handling continuous values

would need to be designed nother program would then be

required to format the information gathered into a readable

form

113 Report Division

(I) Provide a means by which the user may specify their own headings or titles -for the reports

(2) Give the user the ability to override page ejection as a means

of saving paper

(3) Provide a neans by which the statistics gathered may be

presented in other aqgregations other than the area-break and

total levels now pro ided This would require another

procedure to aggregate the statistics and pass them onto the

Report Division before the printing of the reports began Thi

extra procedure is required because of program size

cons i derat ions

G-5

49

114 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360370 series type computers

(2) Modification of the GENSRC program i utilize a lookup table instead of the linear search technioa2 now used

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system

(4) Hlodification to the system data dictionary to better arrange and make available information needed most frequently Also to examine the possibility of making the section dealing with the message text and report generation a separate file Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth

(5) Make the parsing and syntacial routines interactive so as to increase the product ity of the CONCOR user This does not mean to imply that the editing process itself should be made interactive but rather only the development of the user code should be put online

12 Documentation

(1)The development of a self-teaching CONCOR manual

(2) The development of a CONCOR Users Guide

G-6

0

50

2 Survey or other Census Processing

21 Software

(1) Make optional the specification of the record type and questionnaire identification information as some files are not

hierarchical in nature

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected This new input file would contain the master list of those oberservations selected to be

examined

(3) Allow floating point calculations

(4) Provide a mechanism for referencing repetitive sections of questionnaire This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented Note that by using this approach two versions of the software would be required One version would be as CONCOR is now implemented and the second would accept flat data files where the subscript indicated a repetitive section on the survey document

(5) Add a weighting scheme to allow meaningful totals and summary

statistics based upon the sampling frame used to gather the data being edited

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms

(7) Provide a means of manual correction of erroneous data items This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifer name and

questionnaire identification CELADE has a COBOL version of

this program but it has not been finished nor tested

42

APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES

0 APPENO H

CONCOR SYSTEM INTERNAL VARIABLES

1 EOF-FLAG

2 ERROR-TN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT OUTPUT-QUEST-COUNT WRITE-RECORD-COUNT

Change in spelling of variable

Not mentioned--possible ommission from manual

Not implemented

Note Current documentation equivocates when some variables

are reset--most are initialized with each control area break

Though implemented as reserved identifiers due to poor documentation of (old) CONCOR exact values and utility were unknown

These apparently new counters permit access to valuable totals within control areas

Note These are not cumulative counters Also outputshyquest-count violates naming conversion eg in out

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Execution Statistics page for the division-by-zero test. Report heading: CONCOR-EDITOR EXECUTION STATISTICS, RUN DATE (illegible). Column headings: CONTROL AREA and SEQUENCE NUMBER; INPUTS READ (QUESTIONNAIRES, RECORDS); OUTPUTS BY OUTPUT COMMAND; OUTPUTS BY WRITE COMMAND. The column values are illegible in the source.]

DIVISION BY ZERO    LINE NUMBER 118
DIVISION BY ZERO    LINE NUMBER 118
DIVISION BY ZERO    LINE NUMBER 118
DIVISION BY ZERO    LINE NUMBER 118
DIVISION BY ZERO    LINE NUMBER 118
DIVISION BY ZERO    LINE NUMBER 118
RUN ABORTED DUE TO DIVISION BY ZERO ERRORS

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason, ending abnormally (ABEND) without generating any messages. The most common types of cessation are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.

Program text (as far as it is legible):

  111  LIST R003 = R003
  112  LIST
  114  LIST DIVISION BY ZERO ERROR [illegible] FOLLOWS
  115  LET R001 R003 = 0
  116  LIST R001 R003 SET TO ZEROS R001 R003
  117  LIST [illegible]
  118  LET R003 = R002 / R001
  119  LIST [illegible]
  122  END TEST OF OPERANDS AND MSG
  [line number illegible]  REGISTERS RESET TO ZERO
  125  LET R001 [illegible] R003 = [illegible]
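The behaviour shown on this page, in which each data exception is reported with its source line number and the run is aborted once several have accumulated, can be sketched as follows. This is an illustrative Python sketch, not CONCOR's generated COBOL. The function name `run_with_diagnostics`, the statement representation, and the limit of six errors (inferred only from the six messages visible on the page) are assumptions.

```python
from typing import Callable, Dict, List, Tuple

# A statement is a (source line number, action) pair; the action mutates registers.
Statement = Tuple[int, Callable[[Dict[str, int]], None]]


def run_with_diagnostics(statements: List[Statement],
                         passes: int,
                         max_errors: int = 6) -> List[str]:
    """Execute the statements `passes` times, trapping division by zero.

    Returns the diagnostic messages. The run stops as soon as `max_errors`
    (a hypothetical limit) division-by-zero errors have been reported.
    """
    messages: List[str] = []
    errors = 0
    for _ in range(passes):
        registers = {"R001": 0, "R002": 0, "R003": 0}
        for line_no, action in statements:
            try:
                action(registers)
            except ZeroDivisionError:
                errors += 1
                messages.append(f"DIVISION BY ZERO  LINE NUMBER {line_no}")
                if errors >= max_errors:
                    messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                    return messages
    return messages


def _divide(r: Dict[str, int]) -> None:
    # Mirrors line 118 of the test program: R001 is zero, so this raises.
    r["R003"] = r["R002"] // r["R001"]


program: List[Statement] = [
    (115, lambda r: r.update(R001=0, R003=0)),
    (118, _divide),
]
```

Calling `run_with_diagnostics(program, passes=10)` yields six division-by-zero messages naming line 118, followed by the abort message, which matches the shape of the listing above.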

[Execution Statistics page for the array-reference test. The report heading and the error messages are illegible in the source.]

Comments: In this example a 2-dimensional array with 10 rows and 10 columns (TA1 in the program text) is attempting to update the smaller (5 X 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.

Program text (as far as it is legible):

  175  UNTIL R001 > 10 VARY R001 FROM 1 BY [illegible]
  177  DO
  178  UNTIL R002 [illegible] 10 VARY [illegible] FROM 1 BY 1
  180  ALLOCATE [illegible] = TA1 (R001 R002)
  181  UPDATE TA2 (R001 R002) R003
  182  LIST [illegible]
  183  LIST VALUES OF TA1 = R003 TA1 (R001 R002)
  184  LIST
  185  LIST TA2 = TA2 (R001 R002)
  186  END-DO
  187  END-DO
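The array-coordinate failure described in the comments can be reproduced with a small bounds-checked table. The Python sketch below is only an illustration under stated assumptions: the `Table` class, the `copy_with_diagnostics` function, and the wording of the message are hypothetical, since the actual CONCOR messages are illegible in the source. Only the 10 X 10 and 5 X 5 dimensions come from the comments.

```python
from typing import List


class Table:
    """A 1-based, bounds-checked 2-D array (hypothetical stand-in for a CONCOR array)."""

    def __init__(self, rows: int, cols: int) -> None:
        self.rows = rows
        self.cols = cols
        self.cells = [[0] * cols for _ in range(rows)]

    def in_bounds(self, r: int, c: int) -> bool:
        return 1 <= r <= self.rows and 1 <= c <= self.cols

    def get(self, r: int, c: int) -> int:
        if not self.in_bounds(r, c):
            raise IndexError((r, c))
        return self.cells[r - 1][c - 1]

    def update(self, r: int, c: int, value: int) -> None:
        if not self.in_bounds(r, c):
            raise IndexError((r, c))
        self.cells[r - 1][c - 1] = value


def copy_with_diagnostics(ta1: Table, ta2: Table, line_no: int = 181) -> List[str]:
    """Loop over every element of ta1 and try to store it in ta2.

    Each reference to a nonexistent element of ta2 is reported with the
    source line number instead of stopping the program silently.
    """
    errors: List[str] = []
    for r in range(1, ta1.rows + 1):
        for c in range(1, ta1.cols + 1):
            try:
                ta2.update(r, c, ta1.get(r, c))
            except IndexError:
                errors.append(
                    f"ARRAY REFERENCE OUT OF BOUNDS ({r},{c})  LINE NUMBER {line_no}"
                )
    return errors
```

With a 10 X 10 source and a 5 X 5 target, 75 of the 100 references fall outside the smaller table, the first being element (1,6).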

[Execution Statistics page for the user-termination test. Column headings: CONTROL AREA and SEQUENCE NUMBER; INPUTS READ (QUESTIONNAIRES, RECORDS); OUTPUTS BY OUTPUT COMMAND; OUTPUTS BY WRITE COMMAND. The column values are illegible in the source.]

JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215
EDITOR -- NORMAL END OF JOB

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

Program text (as far as it is legible):

  214  IF IN-RECORD-COUNT > [illegible]
  215  THEN STOP-RUN END-THEN
  216  [illegible]
  217  IF IN-RECORD-COUNT < [illegible]
  218  THEN LIST [illegible]

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001): BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002): ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003): END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()).

ERROR (DD-004): COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.
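The first two diagnostics above amount to a simple per-line check, sketched here in Python. This is a hedged illustration, not the CONCOR parser. The function name `check_line` is hypothetical; the legal set contains only the characters that survive in the guide text (A-Z, 0-9, -, +, comma, =); treating a period in column 1 as the comment indicator is an assumption drawn from the Appendix F remark that a preceding period marks a comment card; and the check looks only at the first non-blank character of the line, which simplifies the guide's per-string rule.

```python
import string
from typing import Optional

# Characters named explicitly in the guide. The delimiter, command terminator,
# and literal string delimiter did not survive in the source and are omitted.
LEGAL_FIRST_CHARACTERS = set(string.ascii_uppercase + string.digits + "-+,=")

# Assumption: a period in column 1 marks a comment line.
ASSUMED_COMMENT_INDICATOR = "."


def check_line(line: str) -> Optional[str]:
    """Return a diagnostic for one command statement line, or None if it passes."""
    if not line.strip():
        return "WARNING (DD-001) BLANK COMMAND STATEMENT LINE"
    if line.startswith(ASSUMED_COMMENT_INDICATOR):
        return None  # comment line: nothing to parse
    first = line.lstrip()[0]
    if first not in LEGAL_FIRST_CHARACTERS:
        return "ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING"
    return None
```

A driver would call `check_line` for each line and, as the ACTION TAKEN entries describe, continue with the next line or string after reporting the message rather than stopping.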


7. In what way(s) could improvements or enhancements be made to this version of the CONCOR software and/or its documentation?

8. Other comments:

APPENDIX F

NEW/OLD COMMAND COMPARISONS

OLD CONCOR VERSION 1 (DECEMBER 1978)

DATA-DICTIONARY
  DEFINE INPUT-FILE
  DEFINE ERROR-FILE
  DEFINE COMMON-DATA (contains the questionnaire-ID and record type location information)
  DEFINE RECORD-TYPE=
  DEFINE NEW-DATA
  DEFINE ARRAY-DATA

NEW CONCOR VERSION 2 (DECEMBER 1979)

DICTIONARY-DIVISION
  DICTIONARY-ATTRIBUTES-SECTION
    DICTIONARY-NAME
  FILE-SECTION
    INPUT-FILE
    OUTPUT-FILE
    WRITE-FILE
    ERROR-FILE
  IDENTIFICATION-CONTROL-SECTION
    AREA-CONTROL
    QUESTIONNAIRE-CONTROL
    RECORD-CONTROL
  INPUT-RECORD-SECTION
    DEFINE-RECORD
  WORKING-DATA-SECTION
    NEW-DATA
    ARRAY-DATA
END-DIVISION

COMMENTS

In Version 1, all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin anywhere in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 (DECEMBER 1978)

EDIT-SPECIFICATION
  PROLOG
  FILTER TYPE ( )
  EPILOG

  ALLOCATE (ALL)
  ASSERT (AST)
  END
  ERROR
  FILTER TYPE ( ) WITH
  IF
  LET
  RANGE (RNG)
  RECODE (REC)
  STOP --keyword
  UPDATE (UPD)
  WRITE (WRT)
  XRECODE (XREC)

NEW CONCOR VERSION 2 (DECEMBER 1979)

EXECUTION-DIVISION
  REPORT-CONTROL-SECTION
    COUNT-IMPUTES
    GENERATE-EDIT-STATISTICS
  EDIT-SPECIFICATION-SECTION
    PROLOG-ROUTINE
    FILTER-ROUTINE (name-of-record-type)
    EPILOG-ROUTINE

    ALLOCATE (ALLOC)
    ASSERT (ASRT)
    DRECODE (DRCD)
    END-DIVISION
    EXIT
    GRECODE (GRCD)
    IF
    LET
    LIST
    OUTPUT
    RANGE (RNG)
    STOP
    UNTIL
    UPDATE (UPD)
    WRITE (WRT)
END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR.

END-DIVISION signifies the termination of the CONCOR division organization and must always be present, even though division identifiers have not been implemented.

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations, and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

Version 1 (December 1978): ALLOCATE (ALL)
Version 2 (December 1979): ALLOCATE (ALLOC)
Comments: Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

Version 1 (December 1978): ASSERT (AST)
Version 2 (December 1979): ASSERT (ASRT)
Comments: The ASSERT command is the basic consistency editing command of CONCOR. The command now provides for a NOERROR and a MESSAGE option. Additionally, the DUMPR and DUMPO keywords give the user the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions, for the purpose of maintaining structured logic.

Version 1 (December 1978): (see RECODE)
Version 2 (December 1979): DRECODE (DRCD)
Comments: Provides a method for recoding data items whose values start at zero and continue in ascending sequence. Replaces the previous RECODE command.

Version 1 (December 1978): none
Version 2 (December 1979): EXIT
Comments: Provides the means to terminate execution of (escape from) the command statements within the UNTIL command.

Version 1 (December 1978): END
Version 2 (December 1979): END-IF, END-THEN, END-ELSE
Comments: A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

Version 1 (December 1978): ERROR
Version 2 (December 1979): not implemented
Comments: No longer supported in the language.

Version 1 (December 1978): FILTER TYPE ( ) WITH
Version 2 (December 1979): not implemented
Comments: Option statements are no longer supported in the language. This command once specified what was essentially a GOTO, depending on its construction.

Version 1 (December 1978): (see XRECODE)
Version 2 (December 1979): GRECODE (GRCD)
Comments: This new command provides a method for the group recoding of input values.

Version 1 (December 1978): IF
Version 2 (December 1979): IF
Comments: The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. As in pseudocode, a CONCOR programmer must use the structured IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version.

Version 1 (December 1978): LET
Version 2 (December 1979): LET
Comments: Virtually unchanged.

Version 1 (December 1978): none
Version 2 (December 1979): LIST
Comments: New command which allows specific messages, as well as specific individual identifiers, to be displayed in the report of edit statistics by questionnaire.

Version 1 (December 1978): none
Version 2 (December 1979): OUTPUT
Comments: New command which provides the means by which records are written to the output file.

Version 1 (December 1978): RANGE (RNG)
Version 2 (December 1979): RANGE (RNG)
Comments: A basic editing command of CONCOR. The December 1979 version permits the listing of the out-of-range values, as well as of the entire record or questionnaire, through the DUMPR and DUMPO options within the command. Another new option of this command is the PASS END-PASS FAIL END-FAIL construction, which permits the specification of logical command paths depending upon the outcome of the range test.

Version 1 (December 1978): RECODE (REC)
Version 2 (December 1979): name changed
Comments: Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

Version 1 (December 1978): STOP --keyword
Version 2 (December 1979): STOP --keyword
Comments: Unchanged.

Version 1 (December 1978): none
Version 2 (December 1979): UNTIL DO END-DO
Comments: DO END-DO is one of the most significant and powerful enhancements to the CONCOR language. This command provides a looping capability similar to that of most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

Version 1 (December 1978): UPDATE (UPD)
Version 2 (December 1979): UPDATE (UPD)
Comments: Virtually unchanged. It is the basic mechanism for maintaining arrays involved in imputation processes.

Version 1 (December 1978): WRITE (WRT)
Version 2 (December 1979): WRITE (WRT)
Comments: The WRITE language statement, while once utilized as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of the DICTIONARY-DIVISION.)

Version 1 (December 1978): XRECODE (XREC)
Version 2 (December 1979): name changed
Comments: Old command which has become the GRECODE command in the December 1979 version.
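The comments on ALLOCATE and UPDATE describe the two commands as jointly providing the imputation mechanism, with UPDATE maintaining the arrays involved. One common arrangement of that kind is a hot deck, sketched below in Python. This is an assumption about how such a pair is typically used, not a transcription of CONCOR syntax or of its generated code; the function name `hot_deck_impute`, the record fields `sex` and `age`, and the initial deck values are all hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[int]]


def hot_deck_impute(records: List[Record], deck: Dict[int, int]) -> List[bool]:
    """Impute missing ages in place from a deck indexed by sex.

    Returns one flag per record saying whether its age was imputed.
    The deck holds the most recent valid age seen for each sex code.
    """
    imputed: List[bool] = []
    for rec in records:
        key = rec["sex"]
        if rec["age"] is None:
            # ALLOCATE-like step: assign the stored donor value.
            rec["age"] = deck[key]
            imputed.append(True)
        else:
            # UPDATE-like step: remember this valid value for later records.
            deck[key] = rec["age"]
            imputed.append(False)
    return imputed
```

Because the deck is refreshed by every valid record, a missing value is filled from the nearest preceding respondent with the same characteristics; the starting deck values only matter until the first valid record of each group is seen.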

OLD CONCOR VERSION 1 (DECEMBER 1978)

Not implemented

NEW CONCOR VERSION 2 (DECEMBER 1979)

REPORT-DIVISION
  DISPLAY-CONTROL-SECTION
    DISPLAY-EDIT-STATISTICS
  TOLERANCE-CONTROL-SECTION
    ERROR-RATE-CHECK
    REJECT-FILE
END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
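The presorting requirement follows from processing by control-area break: a new area is assumed to start whenever the key changes, so records of one area that are not adjacent are counted as separate areas. The short Python sketch below illustrates the point with `itertools.groupby`, which behaves the same way. The helper names and the `area` field are hypothetical.

```python
from itertools import groupby
from typing import Callable, Dict, Hashable, List

Record = Dict[str, int]


def area_breaks(records: List[Record], key: Callable[[Record], Hashable]) -> List[Hashable]:
    """Return the area keys in the order in which control-area breaks occur."""
    return [k for k, _group in groupby(records, key=key)]


def is_contiguous(records: List[Record], key: Callable[[Record], Hashable]) -> bool:
    """True if every area appears as one unbroken run of records."""
    breaks = area_breaks(records, key)
    return len(breaks) == len(set(breaks))
```

For an unsorted file such as areas 1, 2, 1 the breaks are `[1, 2, 1]`, so area 1 would be reported twice; after sorting on the area key the breaks are `[1, 2]` and each area is reported once.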

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST

Future Versions of CONCOR: Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics, such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system-level data dictionary.

(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers), to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Permitting only one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18-digit numbers. Signed numeric and packed decimal data formats could also be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.

(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma characters as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation, or provide a means by which the user may specify a title to appear in the listings.
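Item (13) above describes an automatic range check driven by valid values declared in the dictionary. A minimal Python sketch of that idea follows. It is an illustration only: the `VALID_RANGES` table stands in for ranges that would be declared with each user-identifier, and the identifier names, limits, and function name are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical declared ranges, one (low, high) pair per user-identifier.
VALID_RANGES: Dict[str, Tuple[int, int]] = {
    "AGE": (0, 98),
    "SEX": (1, 2),
}


def auto_range_check(record: Dict[str, Optional[int]],
                     valid_ranges: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return the identifiers whose values are missing or outside their declared range."""
    failures: List[str] = []
    for name, (low, high) in valid_ranges.items():
        value = record.get(name)
        if value is None or not (low <= value <= high):
            failures.append(name)
    return failures
```

Running the check before any user-written edit logic sees the record means the ranges live in one place, the dictionary, which is also what would make them available to tabulation software.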

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features, such as division-by-zero and out-of-bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is the question of what action is to be taken in the calculation of tolerance with respect to the new record's inclusion in the universe of observations and in the number of changes.

(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command lies in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run-total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
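Item (8) above distinguishes counters that are valid only for the current control area from counters accumulated over the whole run. The difference can be sketched in Python as follows. This is an illustrative sketch rather than CONCOR's implementation; the function name and the idea of returning both sets together are assumptions, and the local name `in_record_count` merely echoes the internal identifier IN-RECORD-COUNT.

```python
from typing import Iterable, List, Optional, Tuple


def count_by_area(area_keys: Iterable[str]) -> Tuple[List[Tuple[str, int]], int]:
    """Count records per control area and over the whole run.

    The per-area counter is reset at every control-area break (the existing
    behaviour described in the text); the run total is never reset (the
    proposed second set of identifiers).
    """
    per_area: List[Tuple[str, int]] = []
    run_total = 0
    current: Optional[str] = None
    in_record_count = 0
    for area in area_keys:
        if current is not None and area != current:
            # Control-area break: report and reset the area-level counter.
            per_area.append((current, in_record_count))
            in_record_count = 0
        current = area
        in_record_count += 1
        run_total += 1
    if current is not None:
        per_area.append((current, in_record_count))
    return per_area, run_total
```

With keys `A, A, B, B, B` the area-level counts are 2 and 3 while the run total is 5; a user test such as "stop after 500 records" means something different depending on which of the two counters it reads.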

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.

1.1.4 System Level

(1) Generation of assembler language (ALC) code for the EDITOR program, to be used on IBM 360/370 series computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification of the system data dictionary to better arrange and make available the information needed most frequently. Also, examination of the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not imply that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.
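Two of the proposals in this appendix, the lookup table for DRECODE in the Execution Division list and the replacement of the linear search in GENSRC, rest on the same idea: replace a scan through a table with direct access. The Python sketch below contrasts the two under stated assumptions. The function names and the sample recode values are hypothetical, and nothing here reflects how the generated COBOL is actually organized.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def linear_search(pairs: Sequence[Tuple[str, int]], key: str) -> Optional[int]:
    """Scan the table entry by entry; cost grows with the table length."""
    for k, v in pairs:
        if k == key:
            return v
    return None


def build_lookup(pairs: Sequence[Tuple[str, int]]) -> Dict[str, int]:
    """Build a hash table once so that each later lookup is a single probe."""
    return dict(pairs)


def direct_recode(new_values: List[int], code: int) -> Optional[int]:
    """Recode by position: usable when input codes start at zero and ascend,
    which is the case the text describes for DRECODE."""
    if 0 <= code < len(new_values):
        return new_values[code]
    return None
```

For example, with `new_values = [9, 1, 1, 2]` the input code 3 is recoded to 2 by a single index operation, and an out-of-range code yields `None` rather than an unchecked array reference.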

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.

2. Survey or Other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of the sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.


Page 38: THE CONCOR (CONSISTENCY AND CORRECTION) ITS …

APPENDIX F

NEWOLD COMMAND COMPARISONS

0 0

OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

    DEFINE INPUT-FILE
    DEFINE ERROR-FILE
    DEFINE COMMON-DATA
        (contains the questionnaire-ID and record type location information)
    DEFINE RECORD-TYPE=
    DEFINE NEW-DATA
    DEFINE ARRAY-DATA

NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

    DICTIONARY-ATTRIBUTES-SECTION
        DICTIONARY-NAME
    FILE-SECTION
        INPUT-FILE, OUTPUT-FILE, WRITE-FILE, ERROR-FILE
    IDENTIFICATION-CONTROL-SECTION
        AREA-CONTROL
        QUESTIONNAIRE-CONTROL
        RECORD-CONTROL
    INPUT-RECORD-SECTION
        DEFINE-RECORD
    WORKING-DATA-SECTION
        NEW-DATA, ARRAY-DATA

END-DIVISION

COMMENTS

In Version 1, all command keyword labels had to begin in column 1 of the coding sheet. The position notation of Version 2 permits the command to begin in columns 1-72.

The period preceding a statement signifies that it is but a comment card. Accordingly, it should be noted that while the END-DIVISION command must be present, the division name identifier DICTIONARY-DIVISION has not been implemented.

Note: Some parameters for each command have been altered even where general correspondence exists.

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

    PROLOG
    FILTER TYPE ( )
    EPILOG

    ALLOCATE (ALL), ASSERT (AST)
    END, ERROR
    FILTER TYPE ( ) WITH
    IF, LET
    RANGE (RNG), RECODE (REC), STOP --keyword
    UPDATE (UPD), WRITE (WRT), XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

    REPORT-CONTROL-SECTION
        COUNT-IMPUTES, GENERATE-EDIT-STATISTICS
    EDIT-SPECIFICATION-SECTION
        PROLOG-ROUTINE
        FILTER-ROUTINE (name-of-record-type)
        EPILOG-ROUTINE

        ALLOCATE (ALLOC), ASSERT (ASRT), DRECODE (DRCD), END-DIVISION
        EXIT
        GRECODE (GRCD), IF, LET, LIST, OUTPUT, RANGE (RNG)
        STOP, UNTIL, UPDATE (UPD), WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR.

END-DIVISION signifies the termination of the CONCOR division organization and must always be present, even though division identifiers have not been implemented.

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

Version 1 (December 1978): ALLOCATE (ALL)
Version 2 (December 1979): ALLOCATE (ALLOC)
Comments: Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

Version 1 (December 1978): ASSERT (AST)
Version 2 (December 1979): ASSERT (ASRT)
Comments: The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and MESSAGE option. Additionally, the DUMPR and DUMPO keywords give the user the option of listing the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

Version 1 (December 1978): (see RECODE)
Version 2 (December 1979): DRECODE (DRCD)
Comments: Provides a method for recoding data items that have a starting value of zero and continue in ascending sequence. Replaces the previous XRECODE command.

Version 1 (December 1978): (none)
Version 2 (December 1979): EXIT
Comments: Provides the means to terminate execution (escape) of command statements within the UNTIL command.

Version 1 (December 1978): END
Version 2 (December 1979): END-IF END-THEN END-ELSE
Comments: A solitary END command is no longer permitted in conditional construction. Note the similarity to COBOL pseudocode.

Version 1 (December 1978): ERROR
Version 2 (December 1979): not implemented
Comments: No longer supported in the language.

Version 1 (December 1978): FILTER TYPE ( ) WITH
Version 2 (December 1979): not implemented
Comments: Option statements are no longer supported in the language. This command once specified essentially a GOTO, depending on construction.

Version 1 (December 1978): (see XRECODE)
Version 2 (December 1979): GRECODE (GRCD)
Comments: This new command provides a method for the group recoding of input values.

Version 1 (December 1978): IF
Version 2 (December 1979): IF
Comments: The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. Similar to pseudocode, a CONCOR programmer must utilize the structured IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version.

Version 1 (December 1978): LET
Version 2 (December 1979): LET
Comments: Virtually unchanged.

Version 1 (December 1978): (none)
Version 2 (December 1979): LIST
Comments: New command which allows for the generation of specific messages, as well as specific individual identifiers, to be displayed in the report of edit statistics by questionnaire.

Version 1 (December 1978): (none)
Version 2 (December 1979): OUTPUT
Comments: New command which provides the means by which records will be written to the output-file.

Version 1 (December 1978): RANGE (RNG)
Version 2 (December 1979): RANGE (RNG)
Comments: As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through the DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

Version 1 (December 1978): RECODE (REC)
Version 2 (December 1979): name changed
Comments: Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

Version 1 (December 1978): STOP keyword
Version 2 (December 1979): STOP keyword
Comments: Unchanged.

Version 1 (December 1978): (none)
Version 2 (December 1979): UNTIL DO END-DO
Comments: DO END-DO is one of the most significant and powerful enhancements to the CONCOR language. This command provides a looping capability similar to that of most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

Version 1 (December 1978): UPDATE (UPD)
Version 2 (December 1979): UPDATE (UPD)
Comments: Virtually unchanged. It is the basic mechanism for maintaining arrays involved in imputation processes.

Version 1 (December 1978): WRITE (WRT)
Version 2 (December 1979): WRITE (WRT)
Comments: The WRITE language statement, while once utilized as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of DICTIONARY-DIVISION.)

Version 1 (December 1978): XRECODE (XREC)
Version 2 (December 1979): name changed
Comments: Old command which has become the GRECODE command in the December 1979 version.
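The ALLOCATE and UPDATE entries describe the imputation mechanism only in outline. One common way such a pair is used in census editing is hot-deck imputation, in which an array keeps the most recent valid value for each class of record and an invalid value is replaced from that array. The Python sketch below illustrates the idea under that assumption; it is not CONCOR code, and the function name, field names, validity test, and starting value are all hypothetical.

```python
# Hypothetical hot-deck sketch; an illustration only, not CONCOR syntax.
def impute_hot_deck(records, key_fields, target, is_valid, initial):
    """Fill invalid `target` values from the last valid donor with the same key."""
    deck = {}      # plays the role of the imputation array
    imputed = 0    # number of allocations made, as an edit statistic might record
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        if is_valid(rec[target]):
            deck[key] = rec[target]               # "UPDATE": refresh the array
        else:
            rec[target] = deck.get(key, initial)  # "ALLOCATE": assign stored value
            imputed += 1
    return imputed


people = [
    {"sex": 1, "age": 34},
    {"sex": 2, "age": 29},
    {"sex": 1, "age": None},   # missing value
    {"sex": 2, "age": 999},    # out-of-range value
]
count = impute_hot_deck(
    people, ["sex"], "age",
    is_valid=lambda v: v is not None and 0 <= v <= 98,
    initial=25,
)
print(count, [p["age"] for p in people])
```

Because the array is refreshed from every valid record, the donor for an invalid value is always the nearest preceding record in the same class, which is why file order matters for this technique.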

DECEMBER 1978 VERSION 1

Not implemented

DECEMBER 1979 VERSION 2

REPORT-DIVISION

    DISPLAY-CONTROL-SECTION
        DISPLAY-EDIT-STATISTICS
    TOLERANCE-CONTROL-SECTION
        ERROR-RATE-CHECK, REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
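The presorting requirement follows from control-break processing: a summary is closed whenever the area code changes, so records for one area that are not contiguous produce more than one summary. The Python sketch below shows the effect with a generic control-break loop; it is not CONCOR's report logic, and the function and field names are hypothetical.

```python
from itertools import groupby


def area_totals(records, area_key):
    """Return one (area, record_count) summary per contiguous run of area codes."""
    return [(area, sum(1 for _ in group))
            for area, group in groupby(records, key=area_key)]


def get_area(record):
    return record["area"]


unsorted_file = [{"area": "01"}, {"area": "02"}, {"area": "01"}]
presorted_file = sorted(unsorted_file, key=get_area)

split_report = area_totals(unsorted_file, get_area)    # area 01 reported twice
clean_report = area_totals(presorted_file, get_area)   # one summary per area
print(split_report)
print(clean_report)
```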

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST

Future Versions of CONCOR: Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.

(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18-digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.

(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation, or provide a means by which the user may specify a title to appear in the listings.
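Item (13) above can be pictured with a short Python sketch: if each declared item carried its valid range, a generic check could run before any user edit logic sees the record. This is an illustration of the proposal only, not existing CONCOR behavior; the item names, ranges, and function name are hypothetical.

```python
# Hypothetical dictionary entries: item name -> (lowest valid, highest valid).
DECLARED_RANGES = {"SEX": (1, 2), "AGE": (0, 98)}


def out_of_range_items(record, declared):
    """Return the names of items that are missing or outside their declared range."""
    failures = []
    for name, (low, high) in declared.items():
        value = record.get(name)
        if value is None or not (low <= value <= high):
            failures.append(name)
    return failures


bad = out_of_range_items({"SEX": 3, "AGE": 40}, DECLARED_RANGES)
good = out_of_range_items({"SEX": 1, "AGE": 12}, DECLARED_RANGES)
print(bad, good)
```

Keeping the ranges in the dictionary rather than in comment statements would also let other software, such as a tabulation program, read the same declarations.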

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features, such as division by zero and out-of-bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is the question of what action is to be taken in the calculation of tolerance in terms of this new record's inclusion in the universe of observations and number of changes.

(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
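The lookup table approach proposed in item (4) suits DRECODE because, as described in Appendix F, its input values start at zero and ascend, so the input value itself can serve as the table index. The Python sketch below contrasts that with a chain of comparisons. It illustrates the general technique only; the recode values and function names are hypothetical and do not represent the code the EDITOR program generates.

```python
# Hypothetical recode: the new code for input values 0, 1, 2, 3, 4, 5.
NEW_CODES = [9, 1, 1, 2, 2, 3]


def recode_by_search(value):
    """Compare the value with each possible input in turn."""
    for candidate, new_code in enumerate(NEW_CODES):
        if value == candidate:
            return new_code
    raise ValueError(f"no recode defined for {value}")


def recode_by_lookup(value):
    """Use the value directly as an index into the table."""
    if not 0 <= value < len(NEW_CODES):
        raise ValueError(f"no recode defined for {value}")
    return NEW_CODES[value]


same = all(recode_by_search(v) == recode_by_lookup(v) for v in range(6))
print(same, recode_by_lookup(3))
```

The search version does work proportional to the number of input values, while the lookup version does a single bounds test and one indexed fetch.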

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.

1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available the information needed most frequently. Also, to examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean to imply that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.

2. Survey or Other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has not been finished nor tested.
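The weighting scheme suggested in item (5) amounts to multiplying each sampled observation by the number of population units it represents before totals are formed. The Python sketch below shows that arithmetic; the field names, weights, and function name are hypothetical, and the sketch says nothing about how such weights would be declared in CONCOR.

```python
def weighted_total(records, item, weight_field="weight"):
    """Sum an item over sample records, scaling each by its sampling weight."""
    return sum(rec[item] * rec[weight_field] for rec in records)


sample = [
    {"persons": 4, "weight": 10.0},   # this household stands for 10 households
    {"persons": 2, "weight": 25.0},   # this household stands for 25 households
]
raw = sum(rec["persons"] for rec in sample)
estimate = weighted_total(sample, "persons")
print(raw, estimate)
```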


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES

1 EOF-FLAG
2 ERROR-IN-QUESTIONNAIRE
3 CURRENT-POINTER-VALUE
4 TYPE-COUNT ( )
5 RECORD-COUNT
6 QUESTIONNAIRE-COUNT
7 RECORDS-IN-STORE
8 INVALID-RECORD-TYPE-COUNT
9 INCOMPLETE-FLAG
10 CONTINUATION-FLAG
11 INVALID-RECORD-FLAG

EOF-FLAG
ERRORS-IN-QUESTIONNAIRE
TYPE-COUNT ( )
IN-RECORD-COUNT
IN-QUESTIONNAIRE-COUNT
RECORDS-IN-STORE
INVALID-RECORD-TYPE-COUNT
INCOMPLETE-FLAG
CONTINUATION-FLAG
OUT-OF-RANGE
NOT-NUMBER
BLANK

(new internal counters)
OUT-RECORD-COUNT
OUTPUT-QUEST-COUNT
WRITE-RECORD-COUNT

Change in spelling of variable.

Not mentioned; possible omission from manual.

Not implemented.

Note: Current documentation equivocates on when some variables are reset; most are initialized with each control area break.

Though these were implemented as reserved identifiers, their exact values and utility were unknown owing to the poor documentation of (old) CONCOR.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN-, OUT-).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Execution statistics listing with the test program text superimposed. Legible run-time messages: DIVISION BY ZERO, LINE NUMBER 118 (repeated), followed by RUN ABORTED DUE TO DIVISION BY ZERO ERRORS. The superimposed program text covers source lines 111-125 and consists of LIST and LET statements; line 115 sets R001 and R003 to zero, and line 118 is the LET statement that computes R003 from R002 and R001.]

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason, that is, ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.
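The behavior reported for the division-by-zero test, a message naming the source line for each failure and an orderly abort once errors accumulate, can be sketched in Python as follows. This is a generic illustration, not the EDITOR program's logic; the error limit, the register names as dictionary keys, and the function names are hypothetical, and the message wording simply echoes the listing.

```python
def run_edit(statements, questionnaires, max_errors):
    """Run each (line_number, action) pair on every questionnaire, logging failures."""
    log = []
    errors = 0
    for registers in questionnaires:
        for line_number, action in statements:
            try:
                action(registers)
            except ZeroDivisionError:
                errors += 1
                log.append(f"DIVISION BY ZERO LINE NUMBER {line_number}")
                if errors >= max_errors:
                    log.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                    return log
    return log


def set_zero(reg):
    reg["R001"] = 0


def divide(reg):
    reg["R003"] = reg["R002"] // reg["R001"]


program = [(115, set_zero), (118, divide)]
data = [{"R001": 1, "R002": 8, "R003": 0} for _ in range(3)]
log = run_edit(program, data, max_errors=2)
print("\n".join(log))
```

Carrying the line number alongside each statement is what allows the message to point back to the user's source rather than to the generated program.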

CONCOR-EDITOR EXECUTION STATISTICS

[Execution statistics listing with the test program text superimposed. The run-time error messages on this page are not legible. The superimposed program text covers source lines 175-187: nested UNTIL ... VARY ... DO loops in which R001 and R002 are each varied from 1 by 1 until greater than 10, containing an ALLOCATE from TA1 (R001 R002) at line 180, an UPDATE of TA2 (R001 R002) at line 181, and LIST statements at lines 182-185, closed by END-DO at lines 186 and 187.]

Comments: In this example R001, a 2-dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 x 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.
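The array coordinate test can be mimicked in Python: two nested loops run over a 10 x 10 source while updating a 5 x 5 target, and a checked update records each nonexistent element reference instead of failing silently. This is a sketch of the general technique, not CONCOR's implementation; the message wording, the function name, and the table contents are hypothetical, and line number 181 is borrowed from the UPDATE statement in the listing.

```python
def checked_update(table, row, col, value, line_number, log):
    """Store value at 1-based (row, col); log a message when the element does not exist."""
    if 1 <= row <= len(table) and 1 <= col <= len(table[0]):
        table[row - 1][col - 1] = value
        return True
    log.append(f"ARRAY COORDINATE ERROR LINE NUMBER {line_number}: ({row}, {col})")
    return False


ta1 = [[r * 10 + c for c in range(1, 11)] for r in range(1, 11)]  # 10 x 10 source
ta2 = [[0] * 5 for _ in range(5)]                                 # 5 x 5 target
log = []
for r in range(1, 11):
    for c in range(1, 11):
        checked_update(ta2, r, c, ta1[r - 1][c - 1], 181, log)
print(len(log), log[0], ta2[4][4])
```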

CONCOR-EDITOR EXECUTION STATISTICS

[Execution statistics listing with the test program text superimposed. Legible run-time messages: JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215, followed by EDITOR -- NORMAL END OF JOB. The superimposed program text covers source lines 214-218: an IF test on IN-RECORD-COUNT at line 214, THEN STOP-RUN END-THEN at line 215, and a second IF test on IN-RECORD-COUNT followed by THEN LIST at lines 217-218.]

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001): BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002): ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003): END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()).

ERROR (DD-004): COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.
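The character-set checks behind messages DD-002 and DD-003 can be sketched in Python. Several of the special characters are not legible in this copy of the guide, so the sketch takes them as a parameter rather than guessing them; only A-Z, 0-9, -, +, the comma, and = are taken from the text. The function name and the parameter are hypothetical, and this is an illustration rather than the CONCOR parser.

```python
import string

# Characters named legibly in the message guide.
KNOWN_LEGAL = string.ascii_uppercase + string.digits + "-+,="


def first_illegal_position(text, other_legal=""):
    """Return the index of the first character outside the legal set, or None."""
    legal = set(KNOWN_LEGAL + other_legal)
    for position, char in enumerate(text):
        if char not in legal:
            return position
    return None


ok = first_illegal_position("AGE=5")
bad_first = first_illegal_position("#AGE")    # position 0: the DD-002 situation
bad_inside = first_illegal_position("AG?E")
print(ok, bad_first, bad_inside)
```

A result of 0 corresponds to an illegal first character; any other index marks an illegal character inside the string.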

Page 39: THE CONCOR (CONSISTENCY AND CORRECTION) ITS …

0 0

OLD CONCOR VERSION 1 DECEMBER 1978

DATA-DICTIONARY

DEFINE INPUT-FILE

DEFINE ERROR-FILE

DEFINE COMMON-DATA

(contains the questionnaire-IDand record type location information)

DEFINE RECORD-TYPE=

DEFINE NEW-DATA

DEFINE ARRAY-DATA

APPENDIX F

NEW CONCOR VERSION 2 DECEMBER 1979

DICTIONARY-DIVISION

DICTIONARY-ATTRIBUTES-SECTION

DICTIONARY-NAME

FILE-SECTION

INPUT-FILE OUTPUT-FILE WRITE-FILE ERROR-FILE

DOENTIFICATION-CONTROL-SECTION

AREA-CONTROL

QUESTIONNAIRE-CONTROL

RECORD-CONTROL

INPUT-RECORD-SECTION

DEFINE-RECORD

WORKING-DATA-SECTION

NEW-DATA ARRAY-DATA

END-DiVISION

COMMENTS

In Version 1 all command keyword labels had to begin in column 1 ofthe coding sheet The position notation of Version 2 permits thecommand to begin in columns 1-72

The period preceding a statement

signifies that it is but a commentcard Accordingly it should be noted that while the END-DIVISION command must be present the division name identifier DICTIONARY-DIVISION

has not been implemented

Note Some parameters for each commandhave been altered even where general correspondence exists

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR

END-DIVISION signifies the termination of the CONCOR division organization and must always be present even though division identifiers have not been implemented

This figure additionally permits a comparison of the EDIT-SPECI-FICATION commands of both CONCOR language versions Note the changes in abbreviations and that some keyword options (not shown) have been altered even where general corresponshydence exists Individual commands are discussed on the following pages

CONCOR LANGUAGE COMMAND STATEMENTS

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

ALLOCATE (ALL) ALLOCATE (ALLOC) Allows for the assignment of values and with the UPDATE command providesthe imputation mechanism Major difference is this statement now providesfor a receiving-identifier list and statistics are generated as coded inthe GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION

ASSERT fAST) ASSERT (ASRT) The ASSERT command is the basic consistency editor command of CONCOR Command now provides for a NOERROR and MESSAGE option Additionallyby utilizinq the DUMPR DUMPO keywords provides the user with the option of using all the current value of all user-identifiers defined for the record type being processed Other changes to this command include the pass end-pass fail end-fail constructions for the P purpose of maintaining structured logic

(see RECODE) DRECODE (DRCD) Provides a method for recoding data items having a starting value of zero and continue in ascending sequence Replaces t previous XRECODE command

EXIT Provides the means to terminate EXECUTION (ESCAPE) of command statements within the UNTIL command

END END-IF END-THEN

A solitary END command is no lonqer permitted in conditional construction Note the similarity to COBOL pseudocode

END-ELSE

ERROR not implemented No longer supported in language

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 DECEMBER 1979 VERSION 1 VERSION 2 COMMENTS

Version 1: FILTER TYPE ( ) WITH
Version 2: not implemented
Comments: Option statements are no longer supported in the language. This command once specified essentially a GOTO, depending on construction.

Version 1: (see XRECODE)
Version 2: GRECODE (GRCD)
Comments: This new command provides a method for the group recoding of input values.

Version 1: IF
Version 2: IF
Comments: The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. Similar to pseudocode, a CONCOR programmer must utilize the structured IF ... THEN ... END-THEN ELSE ... END-ELSE construction in place of the IF ... THEN ... END ELSE ... END statements acceptable to the 1978 version.

Version 1: LET
Version 2: LET
Comments: Virtually unchanged.

Version 1: (none)
Version 2: LIST
Comments: New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire, as well as specific individual identifiers.

Version 1: (none)
Version 2: OUTPUT
Comments: New command which provides the means by which records will be written to the output-file.

Version 1: RANGE (RNG)
Version 2: RANGE (RNG)
Comments: As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through the DUMPR and DUMPO options within the command. Another new option of this command is a PASS ... END-PASS / FAIL ... END-FAIL construction, which permits the specification of logical command paths depending upon the outcome of the range test.

Version 1: RECODE (REC)
Version 2: name changed
Comments: Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

Version 1: STOP-keyword
Version 2: STOP-keyword
Comments: Unchanged.

Version 1: (none)
Version 2: UNTIL ... DO ... END-DO
Comments: DO-END-DO: one of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

Version 1: UPDATE (UPD)
Version 2: UPDATE (UPD)
Comments: Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

Version 1: WRITE (WRT)
Version 2: WRITE (WRT)
Comments: The WRITE language statement, while once utilized as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of DICTIONARY-DIVISION.)

Version 1: XRECODE (XREC)
Version 2: name changed
Comments: Old command which has become the GRECODE command in the December 1979 version.
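The PASS ... END-PASS / FAIL ... END-FAIL option of the RANGE command can be pictured with a short sketch. It is written in Python rather than CONCOR and is only an illustration under assumed names: the function range_check, the limits 0 to 99, and the sample values are hypothetical and are not taken from the CONCOR manual.

```python
from typing import Callable, List


def range_check(value: int, low: int, high: int,
                on_pass: Callable[[int], None],
                on_fail: Callable[[int], None]) -> bool:
    """Run one of two command paths depending on the outcome of the range test."""
    ok = low <= value <= high
    if ok:
        on_pass(value)   # plays the role of the PASS ... END-PASS path
    else:
        on_fail(value)   # plays the role of the FAIL ... END-FAIL path
    return ok


accepted: List[int] = []       # values that passed the test
out_of_range: List[int] = []   # values that would be listed in the edit report

for age in (34, 130, 0, -1):   # hypothetical input values
    range_check(age, 0, 99, accepted.append, out_of_range.append)
```

In this sketch the two callbacks stand in for the blocks of command statements that a CONCOR programmer would place in each path.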

DECEMBER 1978 VERSION 1

Not implemented

DECEMBER 1979 VERSION 2

REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
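The presorting requirement can be illustrated with a small sketch. It is written in Python, not CONCOR, and assumes a hypothetical record layout with an "area" field standing in for the AREA-CONTROL data field. Reports organized by area break treat each run of contiguous records as one area, so unsorted input splits an area into several breaks.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input records; "area" stands in for the AREA-CONTROL field.
records = [
    {"area": "02", "id": 1},
    {"area": "01", "id": 2},
    {"area": "02", "id": 3},
]


def area_breaks(recs):
    """Return (area, record count) for each run of contiguous records."""
    return [(area, len(list(run)))
            for area, run in groupby(recs, key=itemgetter("area"))]


# Unsorted input: area "02" is reported twice because its records are not contiguous.
unsorted_breaks = area_breaks(records)

# Presorted input: each area is reported exactly once.
sorted_breaks = area_breaks(sorted(records, key=itemgetter("area")))
```

The sort step here plays the part of the preprocessing that must be done external to CONCOR.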

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST

Future Versions of CONCOR: Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics, such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.


(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18-digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.


(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation, or provide a means by which the user may specify a title to appear in the listings.
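The automatic range check proposed in item (13) might work along the following lines. This is a sketch in Python, not CONCOR syntax, and every name in it is an assumption made for illustration: FieldDef, the AGE and SEX fields, and their limits are hypothetical and do not describe the actual DEFINE-RECORD command.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FieldDef:
    """A dictionary entry carrying a declared range of valid values."""
    name: str
    low: int
    high: int


# Hypothetical dictionary: valid ranges declared alongside each identifier.
DICTIONARY = [FieldDef("AGE", 0, 99), FieldDef("SEX", 1, 2)]


def auto_range_check(record: Dict[str, int]) -> List[str]:
    """Return the names of fields whose values fall outside the declared range."""
    return [f.name for f in DICTIONARY
            if not (f.low <= record[f.name] <= f.high)]


# The check runs before any user-written edit logic sees the record.
failures = auto_range_check({"AGE": 130, "SEX": 2})
```

Because the valid ranges live in the dictionary rather than in comment statements, other software such as a tabulation package could read the same declarations.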

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features, such as division by zero and out-of-bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is the question of what action is to be taken in the calculation of tolerance in terms of this new record's inclusion into the universe of observations and number of changes.


(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
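The lookup table approach suggested in item (4) can be contrasted with a sequential comparison in a brief sketch. It is written in Python and does not reproduce the COBOL that EDITOR actually generates; the recode pairs and both function names are hypothetical. The sketch relies on the property that DRECODE input codes start at zero and ascend, so the old code can serve directly as a table index.

```python
from typing import List, Optional, Tuple

# Hypothetical (old code, new code) pairs; old codes start at zero and ascend.
RECODE_PAIRS: List[Tuple[int, int]] = [(0, 9), (1, 1), (2, 1), (3, 2), (4, 2)]


def recode_sequential(value: int) -> Optional[int]:
    """Compare against each pair in turn (cost grows with the number of codes)."""
    for old, new in RECODE_PAIRS:
        if value == old:
            return new
    return None


# The table is built once; the old code is the index into it.
RECODE_TABLE: List[int] = [new for _, new in RECODE_PAIRS]


def recode_lookup(value: int) -> Optional[int]:
    """Fetch the new code by direct indexing (constant cost per record)."""
    if 0 <= value < len(RECODE_TABLE):
        return RECODE_TABLE[value]
    return None
```

Both functions return the same result for every input; only the amount of work per record differs.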

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.


1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean to imply that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.


2. Survey or Other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has been neither finished nor tested.
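The check-in procedure of item (2) amounts to comparing the questionnaire identifications received against a master list of sample cases. A minimal sketch follows, written in Python with hypothetical identifiers; the function name check_in and the sample values are assumptions for illustration, not part of any CONCOR design.

```python
from typing import Iterable, List, Tuple


def check_in(master_ids: Iterable[str],
             received_ids: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (cases expected but not received, cases received but not expected)."""
    master, received = set(master_ids), set(received_ids)
    return sorted(master - received), sorted(received - master)


# Hypothetical master list of selected cases and the cases found on the data file.
missing, unexpected = check_in(["A01", "A02", "A03"], ["A03", "A01", "B07"])
```

Here missing holds the sample cases still outstanding, and unexpected holds the cases on the file that were never selected.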


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES

DECEMBER 1978 VERSION 1

1 EOF-FLAG
2 ERROR-IN-QUESTIONNAIRE
3 CURRENT-POINTER-VALUE
4 TYPE-COUNT ( )
5 RECORD-COUNT
6 QUESTIONNAIRE-COUNT
7 RECORDS-IN-STORE
8 INVALID-RECORD-TYPE-COUNT
9 INCOMPLETE-FLAG
10 CONTINUATION-FLAG
11 INVALID-RECORD-FLAG

DECEMBER 1979 VERSION 2

EOF-FLAG
ERRORS-IN-QUESTIONNAIRE
TYPE-COUNT ( )
IN-RECORD-COUNT
IN-QUESTIONNAIRE-COUNT
RECORDS-IN-STORE
INVALID-RECORD-TYPE-COUNT
INCOMPLETE-FLAG
CONTINUATION-FLAG
OUT-OF-RANGE
NOT-NUMBER
BLANK
(new internal counters) OUT-RECORD-COUNT, OUTPUT-QUEST-COUNT, WRITE-RECORD-COUNT

COMMENTS

Change in spelling of variable.

Not mentioned--possible omission from manual.

Not implemented.

Note: Current documentation equivocates on when some variables are reset--most are initialized with each control area break.

Though implemented as reserved identifiers, exact values and utility were unknown due to poor documentation of (old) CONCOR.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN, OUT).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Printout: CONCOR-EDITOR EXECUTION STATISTICS page. Legible column headings: CONTROL AREA, SEQUENCE NUMBER, INPUTS READ (QUESTIONNAIRES, RECORDS), OUTPUTS BY OUTPUT COMMAND, OUTPUTS BY WRITE COMMAND. The message DIVISION BY ZERO LINE NUMBER 118 is printed repeatedly, followed by RUN ABORTED DUE TO DIVISION BY ZERO ERRORS. The superimposed program text covers source lines 112 to 125 and includes line 118: LET R003 = R002 / R001.]

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason--ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.
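The behavior seen in the division by zero test (a diagnostic naming the source line for each exception, then an orderly abort) can be sketched as follows. The sketch is in Python and is not the EDITOR implementation; the error limit of six, the register values, and the function names are assumptions chosen only to mirror the printout.

```python
from typing import Callable, Dict, List, Tuple

Registers = Dict[str, int]


def line_118(regs: Registers) -> None:
    """Stand-in for source line 118: LET R003 = R002 / R001."""
    regs["R003"] = regs["R002"] // regs["R001"]


# The "program" is a list of (source line number, statement) pairs.
PROGRAM: List[Tuple[int, Callable[[Registers], None]]] = [(118, line_118)]


def edit_run(n_questionnaires: int, error_limit: int) -> List[str]:
    """Execute the program once per questionnaire, logging each exception by line."""
    messages: List[str] = []
    errors = 0
    for _ in range(n_questionnaires):
        regs = {"R001": 0, "R002": 7, "R003": 0}   # R001 is zero on purpose
        for line_no, statement in PROGRAM:
            try:
                statement(regs)
            except ZeroDivisionError:
                errors += 1
                messages.append(f"DIVISION BY ZERO LINE NUMBER {line_no}")
        if errors >= error_limit:                  # assumed error limit
            messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
            break
    return messages


log = edit_run(n_questionnaires=10, error_limit=6)
```

The point of the design is that the exception is trapped, attributed to a line, and counted, rather than ending the job without a message.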

[Printout: CONCOR-EDITOR EXECUTION STATISTICS page for the array coordinate error test. The column headings and diagnostic messages are illegible in this copy. Legible portions of the superimposed program text, from source lines 173 to 187:

175 UNTIL R001 > 10 VARY R001 FROM 1 BY 1
177 DO
178 UNTIL R002 > 10 VARY R002 FROM 1 BY 1
181 UPDATE TA2 (R001 R002) R003
183 LIST VALUES OF TA1 = R003 TA1 (R001 R002)
186 END-DO
187 END-DO]

Comments: In this example R001, a 2-dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 x 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.
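The array coordinate test can be imitated with a short sketch in which a 10 x 10 table is copied, element by element, into a 5 x 5 table. It is written in Python, and the names make_table and checked_update, as well as the form of the diagnostic entry, are hypothetical; the sketch only shows the idea of checking subscripts and recording the source line instead of failing silently.

```python
from typing import List, Tuple


def make_table(rows: int, cols: int, fill: int = 0) -> List[List[int]]:
    """Build a rows x cols table initialized to fill."""
    return [[fill] * cols for _ in range(rows)]


def checked_update(table: List[List[int]], row: int, col: int, value: int,
                   line_no: int, log: List[Tuple[int, int, int]]) -> bool:
    """Store value at 1-based (row, col); record a diagnostic if no such element exists."""
    if 1 <= row <= len(table) and 1 <= col <= len(table[0]):
        table[row - 1][col - 1] = value
        return True
    log.append((line_no, row, col))   # source line and offending subscripts
    return False


ta1 = make_table(10, 10, fill=1)   # the larger source table
ta2 = make_table(5, 5)             # the smaller table being updated
diagnostics: List[Tuple[int, int, int]] = []

for r in range(1, 11):             # both counters run from 1 through 10
    for c in range(1, 11):
        checked_update(ta2, r, c, ta1[r - 1][c - 1], 181, diagnostics)
```

Of the 100 attempted updates, only the 25 that fall inside the 5 x 5 table succeed; the remaining 75 are recorded against line 181.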

[Printout: CONCOR-EDITOR EXECUTION STATISTICS page. Legible column headings: CONTROL AREA, SEQUENCE NUMBER, INPUTS READ (QUESTIONNAIRES, RECORDS), OUTPUTS BY OUTPUT COMMAND. Legible messages: JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215, followed by EDITOR -- NORMAL END OF JOB. Legible portions of the superimposed program text:

214 IF IN-RECORD-COUNT > 500
215 THEN STOP-RUN END-THEN]

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.

Page 40: THE CONCOR (CONSISTENCY AND CORRECTION) ITS …

OLD CONCOR VERSION 1 DECEMBER 1978

EDIT-SPECIFICATION

PROLOG

FILTER TYPE ( )

EPILOG

ALLOCATE (ALL) ASSERT (AST)

END ERROR

FILTER TYPE ( ) WITH

IF LET

RANGE (RNG) RECODE (REC) STOP --keyword

UPDATE (UPD) WRITE (WRT) XRECODE (XREC)

NEW CONCOR VERSION 2 DECEMBER 1979

EXECUTION-DIVISION

REPORT-CONTROL-SECTION

COUNT-IMPUTES GENERATE-EDIT-STATISTICS

EDIT-SPECIFICATION-SECTION

PROLOG-ROUTINE

FILTER-ROUTINE (name-of-record-type)

EPILOG-ROUTINE

ALLOCATE (ALLOC) ASSERT (ASRT) DRECODE (DRCD) END-DIVISION

EXIT

GRECODE (GRCD) IF LET LIST OUTPUT RANGE (RNG)

STOP UNTIL UPDATE (UPD) WRITE (WRT)

END-DIVISION

COMMENTS

Periods preceding division and section names signify that they are treated as comments by CONCOR.

END-DIVISION signifies the termination of the CONCOR division organization and must always be present, even though division identifiers have not been implemented.

This figure additionally permits a comparison of the EDIT-SPECIFICATION commands of both CONCOR language versions. Note the changes in abbreviations and that some keyword options (not shown) have been altered even where general correspondence exists. Individual commands are discussed on the following pages.

CONCOR LANGUAGE COMMAND STATEMENTS

Version 1: December 1978. Version 2: December 1979.

Version 1: ALLOCATE (ALL)
Version 2: ALLOCATE (ALLOC)
Comments: Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

Version 1: ASSERT (AST)
Version 2: ASSERT (ASRT)
Comments: The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and MESSAGE option. Additionally, the DUMPR and DUMPO keywords give the user the option of using the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS ... END-PASS and FAIL ... END-FAIL constructions for the purpose of maintaining structured logic.

Version 1: (see RECODE)
Version 2: DRECODE (DRCD)
Comments: Provides a method for recoding data items that have a starting value of zero and continue in ascending sequence. Replaces the previous XRECODE command.

Version 1: (none)
Version 2: EXIT
Comments: Provides the means to terminate execution (escape) of command statements within the UNTIL command.

Version 1: END
Version 2: END-IF, END-THEN, END-ELSE
Comments: A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

Version 1: ERROR
Version 2: not implemented
Comments: No longer supported in the language.

Version 1: FILTER TYPE ( ) WITH
Version 2: not implemented
Comments: Option statements are no longer supported in the language. This command once specified essentially a GOTO, depending on construction.

Version 1: (see XRECODE)
Version 2: GRECODE (GRCD)
Comments: This new command provides a method for the group recoding of input values.

Version 1: IF
Version 2: IF
Comments: The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. Similar to pseudocode, a CONCOR programmer must utilize the structured IF ... THEN ... END-THEN ELSE ... END-ELSE construction in place of the IF ... THEN ... END ELSE ... END statements acceptable to the 1978 version.

Version 1: LET
Version 2: LET
Comments: Virtually unchanged.

Version 1: (none)
Version 2: LIST
Comments: New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire, as well as specific individual identifiers.

Version 1: (none)
Version 2: OUTPUT
Comments: New command which provides the means by which records will be written to the output-file.

Version 1: RANGE (RNG)
Version 2: RANGE (RNG)
Comments: As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through the DUMPR and DUMPO options within the command. Another new option of this command is a PASS ... END-PASS / FAIL ... END-FAIL construction, which permits the specification of logical command paths depending upon the outcome of the range test.

Version 1: RECODE (REC)
Version 2: name changed
Comments: Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

Version 1: STOP-keyword
Version 2: STOP-keyword
Comments: Unchanged.

Version 1: (none)
Version 2: UNTIL ... DO ... END-DO
Comments: DO-END-DO: one of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

Version 1: UPDATE (UPD)
Version 2: UPDATE (UPD)
Comments: Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

Version 1: WRITE (WRT)
Version 2: WRITE (WRT)
Comments: The WRITE language statement, while once utilized as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of DICTIONARY-DIVISION.)

Version 1: XRECODE (XREC)
Version 2: name changed
Comments: Old command which has become the GRECODE command in the December 1979 version.

DECEMBER 1978 VERSION 1

Not implemented

DECEMBER 1979 VERSION 2

REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

COMMENTS

This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST

Future Versions of CONCOR: Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics, such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.


(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18-digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.


(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma characters as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation, or provide a means by which the user may specify a title to appear in the listings.
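
The automatic range check proposed in item (13) can be illustrated with a short Python sketch. It is not CONCOR syntax and not the ISPC design; the `ItemDeclaration` class, the `range_check` function, and the AGE and SEX ranges are hypothetical names and values chosen only to show a dictionary that carries valid ranges and a check that runs before the user's edit logic sees the record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemDeclaration:
    """Hypothetical dictionary entry: a user-identifier with its valid range."""
    name: str
    low: int
    high: int


def range_check(record, dictionary):
    """Return the names of items that are missing or outside their declared range."""
    failures = []
    for decl in dictionary:
        value = record.get(decl.name)
        if value is None or not (decl.low <= value <= decl.high):
            failures.append(decl.name)
    return failures


# Illustrative declarations only; the ranges are assumptions.
DICTIONARY = [
    ItemDeclaration("AGE", 0, 98),
    ItemDeclaration("SEX", 1, 2),
]

if __name__ == "__main__":
    print(range_check({"AGE": 34, "SEX": 2}, DICTIONARY))
    print(range_check({"AGE": 130, "SEX": 2}, DICTIONARY))
```

Because the ranges live in the dictionary rather than in comment statements, the same declarations could in principle be read by tabulation software, which is the second benefit the item mentions.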

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features, such as division by zero and out-of-bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is the question of what action is to be taken in the calculation of tolerance in terms of this new record's inclusion in the universe of observations and number of changes.


(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implement a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command lies in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
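
The lookup table approach suggested in item (4) for DRECODE can be contrasted with a linear search in a brief Python sketch. This is an illustration only, not the code GENED generates; the function names and the sample recode pairs are hypothetical. It relies on the property stated elsewhere in this report that DRECODE input values start at zero and continue in ascending sequence, which is what makes direct indexing possible.

```python
def recode_linear(value, pairs):
    """Linear search: compare the value against each (old, new) pair in turn."""
    for old, new in pairs:
        if value == old:
            return new
    return None


def build_lookup(pairs, size):
    """Build a table indexed directly by the input value (0 .. size - 1)."""
    table = [None] * size
    for old, new in pairs:
        table[old] = new
    return table


def recode_lookup(value, table):
    """Direct indexing: one bounds test and one fetch, whatever the table size."""
    if 0 <= value < len(table):
        return table[value]
    return None


# Hypothetical recode: input codes 0..3 collapse to output codes 9, 1, 1, 2.
PAIRS = [(0, 9), (1, 1), (2, 1), (3, 2)]
TABLE = build_lookup(PAIRS, 4)

if __name__ == "__main__":
    for code in range(5):
        print(code, recode_linear(code, PAIRS), recode_lookup(code, TABLE))
```

The table must be built once before execution, which is a plausible reading of why the item expects additional communication between the generator programs.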

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.


1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program, to be used on IBM 360/370 series computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification of the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.


2. Survey or Other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has been neither finished nor tested.
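
The weighting scheme in item (5), which in turn depends on the floating point support in item (3), could look roughly like the Python sketch below. It is a generic illustration of weighted totals and means, not a proposed CONCOR feature design; the field names HHSIZE and WEIGHT and the sample values are hypothetical.

```python
def weighted_total(records, item, weight="WEIGHT"):
    """Sum of item values, each multiplied by its record's sampling weight."""
    return sum(rec[item] * rec[weight] for rec in records)


def weighted_mean(records, item, weight="WEIGHT"):
    """Weighted total divided by the sum of the weights."""
    weight_sum = sum(rec[weight] for rec in records)
    if weight_sum == 0:
        raise ValueError("sum of weights is zero")
    return weighted_total(records, item, weight) / weight_sum


# Two hypothetical sample households with their sampling weights.
SAMPLE = [
    {"HHSIZE": 4, "WEIGHT": 10.0},
    {"HHSIZE": 2, "WEIGHT": 30.0},
]

if __name__ == "__main__":
    print(weighted_total(SAMPLE, "HHSIZE"))
    print(weighted_mean(SAMPLE, "HHSIZE"))
```

An unweighted count of these two records would give equal influence to each household, whereas the weighted figures reflect how many population units each sampled case represents.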


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES


1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT, OUTPUT-QUEST-COUNT, WRITE-RECORD-COUNT

Change in spelling of variable.

Not mentioned; possible omission from manual.

Not implemented.

Note: Current documentation equivocates on when some variables are reset; most are initialized with each control area break.

Though implemented as reserved identifiers, their exact values and utility were unknown because of the poor documentation of (old) CONCOR.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN-, OUT-).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Execution Statistics printout; run date illegible. Legible column headings: CONTROL AREA SEQUENCE NUMBER, INPUTS READ, OUTPUTS BY OUTPUT COMMAND, OUTPUTS BY WRITE COMMAND. Remaining headings and values illegible.]

DIVISION BY ZERO  LINE NUMBER 118
DIVISION BY ZERO  LINE NUMBER 118
DIVISION BY ZERO  LINE NUMBER 118
DIVISION BY ZERO  LINE NUMBER 118
DIVISION BY ZERO  LINE NUMBER 118
DIVISION BY ZERO  LINE NUMBER 118
RUN ABORTED DUE TO DIVISION BY ZERO ERRORS

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason, i.e., ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.

Program text (partly illegible):

    111 LIST [illegible]
    112 LIST
    114 LIST DIVISION BY ZERO ERROR [illegible] FOLLOWS
    115 LET R001 R003 = 0
    116 LIST R001 R003 SET TO ZEROS [illegible]
    117 LIST [illegible]
    118 LET R003 = R002 / R001
    119 LIST [illegible]
    122 END TEST OF OPERANDS AND MSG
    [line number illegible] REGISTERS RESET TO ZERO
    125 LET R001 R002 R003 = [illegible]
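
The behavior shown on this printout, a message naming the source line for each division by zero followed by an abort, can be mimicked in a few lines of Python. This is a sketch of the general technique, not the code CONCOR generates. The class and method names are hypothetical, the limit of six errors is an assumption read off the six messages on the printout, and returning zero after a trapped error is likewise an assumption.

```python
class RunAborted(Exception):
    """Raised when the division-by-zero error limit is reached."""


class GuardedEditor:
    def __init__(self, max_zero_divides=6):
        # Assumed limit: the printout shows six messages before the abort.
        self.max_zero_divides = max_zero_divides
        self.zero_divides = 0
        self.messages = []

    def divide(self, numerator, denominator, line_number):
        """Integer division that reports the source line instead of abending."""
        if denominator == 0:
            self.zero_divides += 1
            self.messages.append(f"DIVISION BY ZERO  LINE NUMBER {line_number}")
            if self.zero_divides >= self.max_zero_divides:
                self.messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                raise RunAborted(f"error limit reached at line {line_number}")
            return 0  # assumed result after a trapped error
        return numerator // denominator


def run_demo():
    """Repeat the failing statement at line 118 until the run aborts."""
    editor = GuardedEditor()
    try:
        while True:
            editor.divide(5, 0, line_number=118)
    except RunAborted:
        pass
    return editor.messages


if __name__ == "__main__":
    print("\n".join(run_demo()))
```

Checks of this kind cost execution time on every division, which is the trade-off behind the Run Control Section proposal in Appendix G to let users switch such protection off.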

[Execution Statistics printout; run date and error messages illegible.]

Comments: In this example TA1, a 2-dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 x 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.

Program text (partly illegible):

    175 UNTIL R001 > 10 VARY R001 FROM 1 BY 1
    177 DO
    178 UNTIL R002 > 10 VARY R002 FROM 1 BY 1
    180 ALLOCATE [illegible] = TA1 (R001 R002)
    181 UPDATE TA2 (R001 R002) R003
    182 LIST [illegible] R003
    183 LIST VALUES OF TA1 = R003 TA1 (R001 R002)
    184 LIST
    185 LIST TA2 = TA2 (R001 R002)
    186 END-DO
    187 END-DO
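
The array coordinate error in this example can be reproduced with a small Python sketch of subscript checking. It is illustrative only: the function name and the message wording are hypothetical, since the actual CONCOR message text is illegible on the printout. The loop bounds and array sizes follow the comments above (subscripts 1 through 10 applied to a 5 x 5 array).

```python
def checked_update(table, row, col, value, line_number, messages):
    """Store value at 1-based (row, col); log a message if the element does not exist."""
    rows, cols = len(table), len(table[0])
    if not (1 <= row <= rows and 1 <= col <= cols):
        # Hypothetical wording; the real message text is not recoverable.
        messages.append(
            f"SUBSCRIPT OUT OF BOUNDS ({row}, {col})  LINE NUMBER {line_number}"
        )
        return False
    table[row - 1][col - 1] = value
    return True


def run_demo():
    """Drive a 5 x 5 array with loop counters that run from 1 to 10."""
    ta2 = [[0] * 5 for _ in range(5)]
    messages = []
    stored = 0
    for r001 in range(1, 11):
        for r002 in range(1, 11):
            if checked_update(ta2, r001, r002, r001 * r002, 181, messages):
                stored += 1
    return stored, messages


if __name__ == "__main__":
    stored, messages = run_demo()
    print(stored, len(messages))
    print(messages[0])
```

Of the 100 subscript pairs generated by the two loops, only the 25 that fall inside the 5 x 5 array refer to existing elements; the rest are reported rather than silently corrupting storage.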

[Execution Statistics printout. Legible column headings: CONTROL AREA SEQUENCE NUMBER, INPUTS READ, OUTPUTS BY OUTPUT COMMAND, OUTPUTS BY WRITE COMMAND. Remaining headings and values illegible.]

JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215

EDITOR -- NORMAL END OF JOB

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

Program text (partly illegible):

    214 IF IN-RECORD-COUNT > 500
    215 THEN STOP-RUN END-THEN
    216
    217 IF IN-RECORD-COUNT < [illegible]
    218 THEN LIST [illegible]

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.


CONCOR LANGUAGE COMMAND STATEMENTS

DECEMBER 1978 (VERSION 1) | DECEMBER 1979 (VERSION 2) | COMMENTS

ALLOCATE (ALL) | ALLOCATE (ALLOC) | Allows for the assignment of values and, with the UPDATE command, provides the imputation mechanism. The major difference is that this statement now provides for a receiving-identifier list, and statistics are generated as coded in the GENERATE-EDIT-STATISTICS command option of the EXECUTION-DIVISION.

ASSERT (AST) | ASSERT (ASRT) | The ASSERT command is the basic consistency editor command of CONCOR. The command now provides for a NOERROR and MESSAGE option. Additionally, by utilizing the DUMPR and DUMPO keywords, it provides the user with the option of using the current values of all user-identifiers defined for the record type being processed. Other changes to this command include the PASS END-PASS FAIL END-FAIL constructions for the purpose of maintaining structured logic.

(see RECODE) | DRECODE (DRCD) | Provides a method for recoding data items having a starting value of zero and continuing in ascending sequence. Replaces the previous RECODE command.

(none) | EXIT | Provides the means to terminate execution (escape) of command statements within the UNTIL command.

END | END-IF, END-THEN, END-ELSE | A solitary END command is no longer permitted in conditional constructions. Note the similarity to COBOL pseudocode.

ERROR | not implemented | No longer supported in the language.

FILTER TYPE ( ) WITH | not implemented | Option statements are no longer supported in the language. This command once specified essentially a GOTO, depending on construction.

(see XRECODE) | GRECODE (GRCD) | This new command provides a method for the group recoding of input values.

IF | IF | The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. Similar to pseudocode, a CONCOR programmer must utilize the structured IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version.

LET | LET | Virtually unchanged.

(none) | LIST | New command which allows for the generation of specific messages, as well as specific individual identifiers, to be displayed in the report of edit statistics by questionnaire.

(none) | OUTPUT | New command which provides the means by which records will be written to the output file.

RANGE (RNG) | RANGE (RNG) | As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values as well as the entire record or questionnaire through the DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

RECODE (REC) | name changed | Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

STOP- keyword | STOP- keyword | Unchanged.

(none) | UNTIL DO END-DO | DO-END-DO: one of the most significant and powerful enhancements to the CONCOR language, this command provides a looping capability similar to that of most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user-identifier (counter), which is incremented with each successive execution of the command statements.

UPDATE (UPD) | UPDATE (UPD) | Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

WRITE (WRT) | WRITE (WRT) | The WRITE language statement, while once utilized as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of DICTIONARY-DIVISION.)

XRECODE (XREC) | name changed | Old command which has become the GRECODE command in the December 1979 version.

Not implemented | REPORT-DIVISION, DISPLAY-CONTROL-SECTION, DISPLAY-EDIT-STATISTICS, TOLERANCE-CONTROL-SECTION, ERROR-RATE-CHECK, REJECT-FILE, END-DIVISION | This new division (note: division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records. There is currently no command to enable users to specify unique report headings on the listing from this division.
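
The table describes ALLOCATE and UPDATE only briefly as the imputation mechanism and the means of maintaining the arrays behind it. The Python sketch below shows one common way such a pair can work together, usually called hot-deck imputation; that label, the `HotDeck` class, the field names SEX, REL, and AGE, the valid range, and the initial value are all assumptions for illustration and are not taken from the CONCOR manuals.

```python
class HotDeck:
    """A 2-dimensional array of donor values, indexed by two record attributes."""

    def __init__(self, rows, cols, initial):
        self.cells = [[initial] * cols for _ in range(rows)]

    def update(self, row, col, value):
        """Analogue of UPDATE: remember the latest valid value for this cell."""
        self.cells[row][col] = value

    def allocate(self, row, col):
        """Analogue of ALLOCATE: hand out the stored value for this cell."""
        return self.cells[row][col]


def impute_ages(records, deck, low=0, high=98):
    """Refresh the deck from valid ages and impute invalid ones; return the count imputed."""
    imputed = 0
    for rec in records:
        row, col = rec["SEX"] - 1, rec["REL"] - 1
        if low <= rec["AGE"] <= high:
            deck.update(row, col, rec["AGE"])
        else:
            rec["AGE"] = deck.allocate(row, col)
            imputed += 1
    return imputed


if __name__ == "__main__":
    people = [
        {"SEX": 1, "REL": 1, "AGE": 40},
        {"SEX": 1, "REL": 1, "AGE": 999},  # invalid: takes the donor value 40
        {"SEX": 2, "REL": 2, "AGE": -1},   # invalid: no donor yet, takes the initial value
    ]
    deck = HotDeck(rows=2, cols=3, initial=30)
    print(impute_ages(people, deck), people)
```

Because the deck is refreshed as records are read, the order of the input file affects which donor value is imputed, which fits the report's remark that input must be presorted on the area control field.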

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST

Future Versions of CONCOR: Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics, such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.


(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMION-DATA concept to handle items command to all

record types This could be costly in terms of core storage

and additional conversion time if a storage area is allocated

for each identifier for each record type Only permitting one

common area for all records would require validation of the

data to ensure the values were exactly the same on each record

This wJould require the CONCOR program to perform many more

internal checks when reading the data file into the store area

(10) Permit the selective inclusion of data items found in the

DEFINE-RECORD command statement into the universe count used

for the determination of tolerance levels when using the COUNT-IIPUTES command statement

(11) Increase the number and types of data format referencing now

permitted External numeric data (type N) could be expanded to

handle 18 digit numbers Also signed numeric and packed

decimal data format could be added The signed numeric format

would allow both leading blanks and negative data values This

format is essential ihen dealing with data files generated by

FORTRAN programs

0

+

G-3 47

(12) Increase the number of comparison strings permitted for alphanumeric data items This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire The inclusion of such values are now done by many users in the form of comment statements This enhancement would also facilitate the use of the dictionary by tabulation software

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters The user would still be required to code two commas to receive the system default value for any parameter

(15) Provide a repetition factor in the initialization of arrays

(16) Print the data dictionary name in the titles of the dictionary documentation or provide a means by which the user may specify a title to appear in the listings

112 Execution Division

(M) Implement the Run Control Section This would allow the user to increase execution speed of the EDITOR program This would be accomplished by eliminating some of the system protection features such as division by zero and out of bounds checking In this secion the user could also specify new maximum values for the EDITOR error limits

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered The only problem to resolve is how to treat this id-ntifier outside of the Filter Routines

(3) Implement a CREATE commnand This command would allow the user to create a record that Was not found on the input file This implementation would require special care as it gives the user the ability to change the valme of a CONCOR internal identifier (TYPE-CGUNT) and modify the contents of the store Another consequence of this command is what action is to be taken in the calculation of tolerance in terms of this new records inclusion into the universe of observations and number of changes

iK

48 G-4

(4) The code generated in the EDITOR program for the DRECODE

command should be modified to utilized a lookup table approach

This would decrease the execution time of the command but woul C

require additional communication between the GENED and GENDD

programs

(5) Permit the user to continue the message text field over

successive lines of code

(6) Implementation of a SET command command that would change

CONCORs default occurrence pointer for the duration of the SET

command This command would be ever helpful as the validation

of the existence of a record would only have to be done once

when the SET command was encounterd The major drawback to

this command is in the users ability to remember the action

taken during the last execution of the command when the

Execution Division command statements may cover many pages of

code

(7) Implement a method of specifying different data formats for

fields on the WRITE command statement This would give greater

flexibility to the user but would require major revision to the

routines now used to process the WRITE command

(8) Provide the user with two sets of internal identifiers The

first set currently exists in the system and is valid on the

basis of the current control area (if specified) being

executed The second set would be accumulated on a run total

basis and would require the creation of new CONCOR internal

ident ifiers

(9) Implement automatic array declarations for capturing allocation

frequency distributions This could be done by the system for

discrete values (especially if point number 13 above is

implemented) but a mechanism for handling continuous values

would need to be designed nother program would then be

required to format the information gathered into a readable

form

113 Report Division

(I) Provide a means by which the user may specify their own headings or titles -for the reports

(2) Give the user the ability to override page ejection as a means

of saving paper

(3) Provide a neans by which the statistics gathered may be

presented in other aqgregations other than the area-break and

total levels now pro ided This would require another

procedure to aggregate the statistics and pass them onto the

Report Division before the printing of the reports began Thi

extra procedure is required because of program size

cons i derat ions

G-5

49

114 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360370 series type computers

(2) Modification of the GENSRC program i utilize a lookup table instead of the linear search technioa2 now used

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system

(4) Hlodification to the system data dictionary to better arrange and make available information needed most frequently Also to examine the possibility of making the section dealing with the message text and report generation a separate file Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth

(5) Make the parsing and syntacial routines interactive so as to increase the product ity of the CONCOR user This does not mean to imply that the editing process itself should be made interactive but rather only the development of the user code should be put online

12 Documentation

(1)The development of a self-teaching CONCOR manual

(2) The development of a CONCOR Users Guide

G-6

0

50

2 Survey or other Census Processing

21 Software

(1) Make optional the specification of the record type and questionnaire identification information as some files are not

hierarchical in nature

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected This new input file would contain the master list of those oberservations selected to be

examined

(3) Allow floating point calculations

(4) Provide a mechanism for referencing repetitive sections of questionnaire This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented Note that by using this approach two versions of the software would be required One version would be as CONCOR is now implemented and the second would accept flat data files where the subscript indicated a repetitive section on the survey document

(5) Add a weighting scheme to allow meaningful totals and summary

statistics based upon the sampling frame used to gather the data being edited

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms

(7) Provide a means of manual correction of erroneous data items This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifer name and

questionnaire identification CELADE has a COBOL version of

this program but it has not been finished nor tested

42

APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES

0 APPENO H

CONCOR SYSTEM INTERNAL VARIABLES

1 EOF-FLAG

2 ERROR-TN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT OUTPUT-QUEST-COUNT WRITE-RECORD-COUNT

Change in spelling of variable

Not mentioned--possible ommission from manual

Not implemented

Note Current documentation equivocates when some variables

are reset--most are initialized with each control area break

Though implemented as reserved identifiers due to poor documentation of (old) CONCOR exact values and utility were unknown

These apparently new counters permit access to valuable totals within control areas

Note These are not cumulative counters Also outputshyquest-count violates naming conversion eg in out

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

CONCOR-EDITOR EXECUTION STATISTICS (printout; run date illegible in this copy)

Legible column headings: CONTROL AREA; INPUTS READ; OUTPUTS BY OUTPUT COMMAND; OUTPUTS BY WRITE COMMAND; SEQUENCE NUMBER; QUESTIONNAIRES; RECORDS; QUEST KEYWORD; RECORDS

Messages printed:

DIVISION BY ZERO    LINE NUMBER 118    (printed six times)

RUN ABORTED DUE TO DIVISION BY ZERO ERRORS

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason (ABEND) without generating any messages. The most common types of cessations are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.

Program text superimposed on the page (portions lost in reproduction are marked [illegible]):

111 LIST [illegible] R003

112 LIST

114 LIST [message text: DIVISION BY ZERO ERROR ... FOLLOWS]

115 LET R001 R003 = 0

116 LIST [message text: R001 R003 SET TO ZEROS] R001 R003

117 LIST [illegible]

118 LET R003 = R002 / R001

119 LIST [illegible] R003

122 END TEST OF OPERANDS AND MSG

123

124

[line number illegible] REGISTERS RESET TO ZERO

125 LET R001 R002 R003 = 0
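The diagnostic behavior shown in the printout above (each division by zero reported with its source line number, followed by an abort) can be sketched as follows. This is an illustrative Python model, not the generated EDITOR program. The limit of six errors is an assumption taken from the six messages on the printout, and the function names and message spacing are hypothetical.

```python
class RunAborted(Exception):
    """Raised when the (assumed) error limit is reached."""


def run_with_diagnostics(statements, data_rows, max_errors=6):
    """Apply numbered statements to each row, logging division by zero.

    statements: list of (line_number, function) pairs; each function
    receives a dict of registers and may modify it.
    Returns the list of diagnostic messages produced.
    """
    messages = []
    errors = 0
    try:
        for row in data_rows:
            registers = dict(row)          # work on a copy of the row
            for line_number, statement in statements:
                try:
                    statement(registers)
                except ZeroDivisionError:
                    errors += 1
                    messages.append(
                        f"DIVISION BY ZERO  LINE NUMBER {line_number}"
                    )
                    if errors >= max_errors:
                        raise RunAborted
    except RunAborted:
        messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
    return messages


def divide(reg):
    # Models source line 118: LET R003 = R002 / R001
    reg["R003"] = reg["R002"] // reg["R001"]


program = [(118, divide)]
rows = [{"R001": 0, "R002": 5, "R003": 0}] * 7
log = run_with_diagnostics(program, rows)
for line in log:
    print(line)
```

The point of the design is that the failing statement carries its source line number with it, so the message can name the line instead of the run ending silently.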

CONCOR-EDITOR EXECUTION STATISTICS (second printout; the heading values and the error messages are largely illegible in this copy. The legible fragments appear to give QUESTIONNAIRE, COORDINATE, VALUE, and LINE NUMBER for each error.)

Comments: In this example, TA1, a two-dimensional array with 10 rows and 10 columns, is used in an attempt to update the smaller (5 x 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.

Program text superimposed on the page (portions lost in reproduction are marked [illegible]):

173

175 UNTIL R001 > 10 VARY R001 FROM 1 BY 1

177 DO

178 UNTIL R002 [operator illegible] 10 VARY R002 FROM 1 BY 1

180 ALLOCATE N002 = TA1 (R001 R002)

181 UPDATE TA2 (R001 R002) R003

182 LIST [message text: N002 = R003] N002 R003

183 LIST [message text: VALUES OF TA1 = R003] TA1 (R001 R002)

184 LIST

185 LIST [message text: TA2 =] TA2 (R001 R002)

186 END-DO

187 END-DO
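The array coordinate error in the example above arises when subscripts valid for a 10 x 10 array are applied to a 5 x 5 array. A minimal Python sketch of checked array access follows. The class, the message wording, and the line numbers passed in are hypothetical, chosen only to mirror the example.

```python
class BoundedArray:
    """A fixed-size, 1-based two-dimensional array with checked access."""

    def __init__(self, rows, cols, fill=0):
        self.rows = rows
        self.cols = cols
        self._cells = [[fill] * cols for _ in range(rows)]

    def _check(self, r, c, line_number):
        # Reject any subscript outside the declared dimensions.
        if not (1 <= r <= self.rows and 1 <= c <= self.cols):
            raise IndexError(
                f"ARRAY COORDINATE ERROR ({r}, {c}) AT LINE NUMBER {line_number}"
            )

    def get(self, r, c, line_number):
        self._check(r, c, line_number)
        return self._cells[r - 1][c - 1]

    def update(self, r, c, value, line_number):
        self._check(r, c, line_number)
        self._cells[r - 1][c - 1] = value


ta1 = BoundedArray(10, 10, fill=1)   # stands in for TA1
ta2 = BoundedArray(5, 5)             # stands in for the smaller TA2
errors = []
for r in range(1, 11):
    for c in range(1, 11):
        try:
            ta2.update(r, c, ta1.get(r, c, 180), 181)
        except IndexError as exc:
            errors.append(str(exc))
print(len(errors), errors[0])
```

Only the 25 subscript pairs that exist in the smaller array succeed; every other reference is reported rather than silently corrupting storage.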

CONCOR-EDITOR EXECUTION STATISTICS (third printout)

Legible column headings: CONTROL AREA; INPUTS READ; OUTPUTS BY OUTPUT COMMAND; OUTPUTS BY WRITE COMMAND; SEQUENCE NUMBER; QUESTIONNAIRES; RECORDS; QUEST KEYWORD; RECORDS

Messages printed:

JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215

EDITOR -- NORMAL END OF JOB

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

Program text superimposed on the page (portions lost in reproduction are marked [illegible]):

214 IF IN-RECORD-COUNT > 500

215 THEN STOP-RUN END-THEN

216

217 IF IN-RECORD-COUNT < 20

218 THEN LIST [message text: ... THAN 20 RECS ...]

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001): BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002): ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003): END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()).

ERROR (DD-004): COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.


CONCOR LANGUAGE COMMAND STATEMENTS (continued)

Version 1 (December 1978): FILTER TYPE ( ) WITH
Version 2 (December 1979): not implemented
Comments: Option statements are no longer supported in the language. This command once specified essentially a GOTO, depending on construction.

Version 1 (December 1978): (see XRECODE)
Version 2 (December 1979): GRECODE (GRCD)
Comments: This new command provides a method for the group recoding of input values.

Version 1 (December 1978): IF
Version 2 (December 1979): IF
Comments: The IF statement has properties found in virtually all programming languages, i.e., it provides a means of evaluating a condition and controlling the execution of alternative logical statements. Similar to pseudocode, a CONCOR programmer must utilize the structured IF THEN END-THEN ELSE END-ELSE construction in place of the IF THEN END ELSE END statements acceptable to the 1978 version.

Version 1 (December 1978): LET
Version 2 (December 1979): LET
Comments: Virtually unchanged.

Version 1 (December 1978): (none)
Version 2 (December 1979): LIST
Comments: New command which allows for the generation of specific messages to be displayed in the report of edit statistics by questionnaire, as well as specific individual identifiers.

Version 1 (December 1978): (none)
Version 2 (December 1979): OUTPUT
Comments: New command which provides the means by which records will be written to the output file.

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

Version 1 (December 1978): RANGE (RNG)
Version 2 (December 1979): RANGE (RNG)
Comments: As a basic editing command of CONCOR, the December 1979 version permits the listing of the out-of-range values, as well as the entire record or questionnaire, through DUMPR and DUMPO options within the command. Another new option of this command involves the utilization of a PASS END-PASS FAIL END-FAIL construction, permitting the specification of logical command paths depending upon the outcome of the range test.

Version 1 (December 1978): RECODE (REC)
Version 2 (December 1979): name changed
Comments: Name changed to DRECODE. Virtually identical with the DRC command in the COCENTS system.

Version 1 (December 1978): STOP- keyword
Version 2 (December 1979): STOP- keyword
Comments: Unchanged.

Version 1 (December 1978): (none)
Version 2 (December 1979): UNTIL DO END-DO
Comments: DO-END-DO is one of the most significant and powerful enhancements to the CONCOR language. This command provides a looping capability similar to most other high-level programming languages. The number of iterations is under user control, as is the value initially assigned to the user identifier (counter), which is incremented with each successive execution of the command statements.

Version 1 (December 1978): UPDATE (UPD)
Version 2 (December 1979): UPDATE (UPD)
Comments: Virtually unchanged; it is the basic mechanism for maintaining arrays involved in imputation processes.

(continued)

CONCOR LANGUAGE COMMAND STATEMENTS (continued)

Version 1 (December 1978): WRITE (WRT)
Version 2 (December 1979): WRITE (WRT)
Comments: The WRITE language statement, while once utilized as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of DICTIONARY-DIVISION.)

Version 1 (December 1978): XRECODE (XREC)
Version 2 (December 1979): name changed
Comments: Old command which has become the GRECODE command in the December 1979 version.
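The PASS and FAIL paths described for the RANGE command in the table above amount to a range test with an optional branch for each outcome. The following Python sketch models that idea only. It is not CONCOR syntax, and the function name, parameters, and sample limits are hypothetical.

```python
def range_check(value, low, high, on_pass=None, on_fail=None):
    """Test low <= value <= high and run the matching branch, if any.

    on_pass and on_fail are optional callables standing in for the
    command paths taken after a successful or failed range test.
    Returns True when the value is in range.
    """
    passed = low <= value <= high
    branch = on_pass if passed else on_fail
    if branch is not None:
        branch(value)
    return passed


accepted = []
out_of_range = []
for age in [34, 130, 0, -2]:
    range_check(age, 0, 99,
                on_pass=accepted.append,
                on_fail=out_of_range.append)
print(accepted, out_of_range)
```

Here the failing branch simply collects the offending values, which is the role the report attributes to listing out-of-range values.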

Version 1 (December 1978): Not implemented
Version 2 (December 1979):

REPORT-DIVISION

DISPLAY-CONTROL-SECTION

DISPLAY-EDIT-STATISTICS

TOLERANCE-CONTROL-SECTION

ERROR-RATE-CHECK REJECT-FILE

END-DIVISION

Comments: This new division (note: division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
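The presorting requirement noted above follows from how control-break processing works: a new report group starts every time the control field changes, so records for one area that are not contiguous produce several fragments. A minimal Python sketch, using the standard itertools.groupby (which likewise groups only adjacent equal keys), shows the effect. The record layout and function name are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def area_reports(records):
    """Yield (area, record_count) for each run of contiguous equal areas."""
    for area, group in groupby(records, key=itemgetter(0)):
        yield area, sum(1 for _ in group)


# Records are (area_control_value, payload) pairs; area "02" is split.
unsorted_input = [("02", "x"), ("01", "y"), ("02", "z")]

fragmented = list(area_reports(unsorted_input))
merged = list(area_reports(sorted(unsorted_input, key=itemgetter(0))))
print(fragmented)
print(merged)
```

Without the external sort, area "02" is reported twice with one record each; after sorting, it is reported once with both records.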

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST

Future Versions of CONCOR: Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics, such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.

(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18-digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.

(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation, or provide a means by which the user may specify a title to appear in the listings.
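Item (13) in the list above proposes storing a valid range with each field definition so that a range check can run automatically before user edits see the data. A minimal Python sketch of that idea follows. The FieldDef structure, the field names, the column positions, and the limits are all hypothetical and do not reflect the actual CONCOR dictionary format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    """One dictionary entry: field position plus its valid range."""
    name: str
    start: int    # 1-based starting column in the record
    length: int
    low: int
    high: int


def auto_range_check(record, fields):
    """Return names of fields whose value is non-numeric or out of range."""
    failures = []
    for f in fields:
        text = record[f.start - 1 : f.start - 1 + f.length]
        is_number = text.isascii() and text.isdigit()
        if not is_number or not (f.low <= int(text) <= f.high):
            failures.append(f.name)
    return failures


# Hypothetical dictionary: SEX in column 1, AGE in columns 2-3.
DICTIONARY = [FieldDef("SEX", 1, 1, 1, 2), FieldDef("AGE", 2, 2, 0, 99)]

print(auto_range_check("134", DICTIONARY))
print(auto_range_check("9 7", DICTIONARY))
```

Because the limits live in the dictionary rather than in comment statements, the same definitions could also be read by other software, which is the benefit the item mentions for tabulation.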

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features, such as division by zero and out-of-bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is the question of what action is to be taken in the calculation of tolerance, in terms of this new record's inclusion into the universe of observations and number of changes.

(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
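Item (4) in the list above contrasts two ways of evaluating a recode: testing the value against each rule in turn, or looking the answer up in a precomputed table. The following Python sketch illustrates the trade-off in general terms. It does not reproduce the code that CONCOR generates for DRECODE, and the rule format, function names, and age groups are hypothetical.

```python
def recode_linear(value, rules):
    """Scan (low, high, new_code) rules in order; the first match wins."""
    for low, high, new_code in rules:
        if low <= value <= high:
            return new_code
    return None   # no rule applies


def build_lookup(rules, min_value, max_value):
    """Precompute the recode of every value in [min_value, max_value]."""
    return {v: recode_linear(v, rules)
            for v in range(min_value, max_value + 1)}


# Hypothetical recode of single years of age into three groups.
AGE_GROUPS = [(0, 14, 1), (15, 64, 2), (65, 99, 3)]
TABLE = build_lookup(AGE_GROUPS, 0, 99)

print(TABLE[40], TABLE.get(120))
```

The table is built once, so each record costs a single lookup instead of a scan through the rules. The price is the storage for the table and the need to know the value range in advance, which is consistent with the item's remark that two generator programs would need to exchange more information.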

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.

1.1.4 System Level

(1) Generation of assembler language (ALC) code for the EDITOR program, to be used on IBM 360/370 series computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification of the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.

2. Survey or Other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach, two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has been neither finished nor tested.
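Item (5) in the list above asks for a weighting scheme so that totals computed from sample data reflect the population. The arithmetic can be sketched in a few lines of Python. The function names and the sample values are hypothetical, and each weight is assumed to be the number of population units that a sampled case represents.

```python
def weighted_total(cases):
    """Sum value * weight over (value, weight) pairs."""
    return sum(value * weight for value, weight in cases)


def weighted_mean(cases):
    """Weighted total divided by the sum of the weights."""
    total_weight = sum(weight for _value, weight in cases)
    if total_weight == 0:
        raise ValueError("total weight is zero")
    return weighted_total(cases) / total_weight


# Two sampled households: (persons in household, sampling weight).
sample = [(4, 10.0), (2, 30.0)]

print(weighted_total(sample))   # estimated persons in the population
print(weighted_mean(sample))    # estimated mean household size
```

An unweighted mean of the two cases would be 3.0, while the weighted mean is 2.5, which shows why unweighted summary statistics can mislead when cases are sampled at different rates.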



APPENDIX G

ISPC FUTURE ENHANCEMENT LIST

G-1

Appendix G

Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census da a using automatic correction techniquns As a result CONCOR should not be construed to be the ultimate software package for generalized data editing The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process A second group of changes or enhancements also exist that could facilitate the editing of survey or other types of census data It should be noted that some of the modifications to meet these two objectives at times are mutually exclusive Some of the possible modifications and considerations are outlined below

1 Housing and Population Census Processing

1 1 Software

1111 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.

(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is in the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Only permitting one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18-digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.

(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. Many users now record such values in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma characters as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation, or provide a means by which the user may specify a title to appear in the listings.
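The automatic range check proposed in item (13) can be illustrated with a minimal sketch. It is written in Python rather than in the CONCOR language, and the FieldSpec structure, the field names AGE and SEX, and their ranges are hypothetical stand-ins for dictionary entries; nothing here reflects how ISPC would actually implement the feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical dictionary entry: a user-identifier and its valid range."""
    name: str
    low: int
    high: int


def range_check(record, specs):
    """Return (name, value) pairs for items missing or outside their declared range."""
    failures = []
    for spec in specs:
        value = record.get(spec.name)
        if value is None or not (spec.low <= value <= spec.high):
            failures.append((spec.name, value))
    return failures


# Assumed example ranges, declared once alongside the record definition.
SPECS = [FieldSpec("AGE", 0, 99), FieldSpec("SEX", 1, 2)]

if __name__ == "__main__":
    print(range_check({"AGE": 120, "SEX": 2}, SPECS))
```

Because the ranges live with the field declarations instead of in comment statements, the same table could in principle be read by tabulation software, which is the secondary benefit item (13) mentions.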

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features such as division by zero and out of bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is the question of what action is to be taken in the calculation of tolerance in terms of this new record's inclusion into the universe of observations and number of changes.

(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if item 13 of the Dictionary Division list above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
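The lookup table approach suggested in item (4) can be sketched as follows. This is a Python illustration only; the report does not describe the code that the EDITOR program currently generates for DRECODE, so the sequential comparison chain used here as the baseline is an assumption, as are the function names and the sample code pairs.

```python
def recode_sequential(value, pairs, default):
    """Assumed baseline: test each (old, new) pair in turn until one matches."""
    for old, new in pairs:
        if value == old:
            return new
    return default


def build_lookup(pairs):
    """Build the table once; the first pair for a given old value wins."""
    table = {}
    for old, new in pairs:
        table.setdefault(old, new)
    return table


def recode_lookup(value, table, default):
    """Single table probe per record instead of a chain of comparisons."""
    return table.get(value, default)


# Hypothetical recode pairs: old code -> new code.
PAIRS = [(1, 10), (2, 20), (3, 30), (2, 99)]
TABLE = build_lookup(PAIRS)

if __name__ == "__main__":
    for v in (1, 2, 3, 4):
        print(v, recode_sequential(v, PAIRS, 0), recode_lookup(v, TABLE, 0))
```

The table is built once per run and consulted once per record, which is where the saving in execution time would come from. The cost noted in item (4) is that the generator must know the recode values at generation time.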

1.1.3 Report Division

(1) Provide a means by which the user may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.

1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on IBM 360/370 series computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification of the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean that the editing process itself should be made interactive; rather, only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.

2. Survey or Other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of the questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has been neither finished nor tested.
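The check-in procedure in item (2) amounts to comparing the questionnaire identifications received against a master list. A minimal Python sketch follows; the function name, the shape of the result, and the sample identifiers are all hypothetical, and CONCOR itself has no such facility at present.

```python
def check_in(master_ids, received_ids):
    """Compare received questionnaire IDs against the master list of expected cases."""
    expected = set(master_ids)
    seen = set()
    unexpected = []   # received but not on the master list
    duplicates = []   # received more than once
    for qid in received_ids:
        if qid in seen:
            duplicates.append(qid)
        elif qid not in expected:
            unexpected.append(qid)
        seen.add(qid)
    missing = sorted(expected - seen)  # on the master list but never received
    return {"missing": missing, "unexpected": unexpected, "duplicates": duplicates}


if __name__ == "__main__":
    print(check_in(["A1", "A2", "A3"], ["A1", "A3", "A3", "B9"]))
```

In a survey setting the three lists answer different questions: missing cases bear on response rates, unexpected cases point to keying or sampling errors, and duplicates point to double entry.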

APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES

 1  EOF-FLAG                     EOF-FLAG
 2  ERROR-IN-QUESTIONNAIRE       ERRORS-IN-QUESTIONNAIRE
 3  CURRENT-POINTER-VALUE        (no entry)
 4  TYPE-COUNT ( )               TYPE-COUNT ( )
 5  RECORD-COUNT                 IN-RECORD-COUNT
 6  QUESTIONNAIRE-COUNT          IN-QUESTIONNAIRE-COUNT
 7  RECORDS-IN-STORE             RECORDS-IN-STORE
 8  INVALID-RECORD-TYPE-COUNT    INVALID-RECORD-TYPE-COUNT
 9  INCOMPLETE-FLAG              INCOMPLETE-FLAG
10  CONTINUATION-FLAG            CONTINUATION-FLAG
11  INVALID-RECORD-FLAG          (no entry)
                                 OUT-OF-RANGE
                                 NOT-NUMBER
                                 BLANK
    (new internal counters)      OUT-RECORD-COUNT
                                 OUTPUT-QUEST-COUNT
                                 WRITE-RECORD-COUNT

Comments, in the order in which they appear on the page:

Change in spelling of variable (ERROR-IN-QUESTIONNAIRE, now ERRORS-IN-QUESTIONNAIRE).

Not mentioned--possible omission from manual.

Not implemented.

Note: Current documentation equivocates on when some variables are reset--most are initialized with each control area break.

Though implemented as reserved identifiers, due to poor documentation of (old) CONCOR, exact values and utility were unknown.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN, OUT).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Facsimile of an execution statistics page. The run date and the column values are not legible in this copy. Legible column headings: CONTROL AREA, INPUTS READ, OUTPUTS BY OUTPUT COMMAND, OUTPUTS BY WRITE COMMAND, NO VALID RECTYPES, SEQUENCE NUMBER, QUESTIONNAIRES, RECORDS.]

Messages printed on the page:

    DIVISION BY ZERO    LINE NUMBER 118
    DIVISION BY ZERO    LINE NUMBER 118
    DIVISION BY ZERO    LINE NUMBER 118
    DIVISION BY ZERO    LINE NUMBER 118
    DIVISION BY ZERO    LINE NUMBER 118
    DIVISION BY ZERO    LINE NUMBER 118
    RUN ABORTED DUE TO DIVISION BY ZERO ERRORS

Program text superimposed on the page (literal string delimiters and some operands are not legible in this copy):

    111 LIST [illegible] R003 = [illegible]
    112 LIST
    114 LIST DIVISION BY ZERO ERROR [illegible] FOLLOWS
    115 LET R001 R003 = 0
    116 LIST R001 R003 SET TO ZEROS [illegible]
    117 LIST [illegible]
    118 LET R003 = R002 / R001
    119 LIST [illegible] R003
    122 END TEST OF OPERANDS AND MSG
        REGISTERS RESET TO ZERO
    125 LET R001 [illegible] R003 = [illegible]

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason--ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.
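The behavior tested in this example, a message carrying the source line number for each division by zero and an abort once too many have occurred, can be sketched in Python. The error limit, the choice to yield zero and continue after a trapped division, and all names below are assumptions made for illustration; the report does not state how the EDITOR program implements its protection features.

```python
class RunAborted(Exception):
    """Raised when the assumed division-by-zero error limit is reached."""


class Diagnostics:
    def __init__(self, error_limit):
        self.error_limit = error_limit  # assumed, user-adjustable maximum
        self.errors = 0
        self.messages = []

    def divide(self, line_number, numerator, denominator):
        """Integer-divide; on a zero denominator log the source line and return 0."""
        if denominator != 0:
            return numerator // denominator
        self.errors += 1
        self.messages.append(f"DIVISION BY ZERO LINE NUMBER {line_number}")
        if self.errors >= self.error_limit:
            self.messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
            raise RunAborted(self.messages[-1])
        return 0  # assumption: the run continues with a zero result


if __name__ == "__main__":
    diag = Diagnostics(error_limit=3)
    try:
        for _ in range(5):
            diag.divide(118, 7, 0)
    except RunAborted:
        pass
    print("\n".join(diag.messages))
```

The point of the design is the one the comments above make: the run still stops, but only after saying where and why.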

CONCOR-EDITOR EXECUTION STATISTICS

APPENDIX I

[Facsimile of a second execution statistics page. The column headings, values, and error messages are not legible in this copy.]

Program text superimposed on the page (some operands are not legible in this copy):

    175 UNTIL R001 > 10 VARY R001 FROM 1 BY 1
    177 DO
    178 UNTIL R002 > 10 VARY R002 FROM 1 BY 1
    180 ALLOCATE [illegible] = TA1 (R001 R002)
    181 UPDATE TA2 (R001 R002) R003
    182 LIST [illegible] R003
    183 LIST VALUES OF TA1 = [illegible] TA1 (R001 R002)
    184 LIST
    185 LIST TA2 = TA2 (R001 R002)
    186 END-DO
    187 END-DO

Comments: In this example R001, a 2-dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 X 5) TA2. Non-existent element references caused the error. The program text has been superimposed on this page for illustration purposes.
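The array coordinate error in this example arises because subscripts running from 1 to 10 are applied to a 5 X 5 table. A minimal Python sketch of the kind of bounds check involved is given below. The Table class, the message wording, and the use of line number 181 (read from the listing above) are illustrative assumptions, not CONCOR's actual implementation or message text.

```python
class BoundsError(Exception):
    """Raised when a subscript falls outside the declared table dimensions."""


class Table:
    def __init__(self, name, rows, cols):
        self.name, self.rows, self.cols = name, rows, cols
        self.cells = [[0] * cols for _ in range(rows)]

    def update(self, row, col, value, line_number):
        """Store value at 1-based (row, col), refusing out-of-range subscripts."""
        if not (1 <= row <= self.rows and 1 <= col <= self.cols):
            raise BoundsError(
                f"{self.name}({row},{col}) out of bounds at line {line_number}"
            )
        self.cells[row - 1][col - 1] = value


def run():
    """Drive a 5 X 5 table with 10 X 10 subscripts, collecting the diagnostics."""
    ta2 = Table("TA2", 5, 5)
    errors = []
    for r in range(1, 11):
        for c in range(1, 11):
            try:
                ta2.update(r, c, r * c, line_number=181)
            except BoundsError as exc:
                errors.append(str(exc))
    return ta2, errors


if __name__ == "__main__":
    table, errs = run()
    print(len(errs), errs[0])
```

Of the 100 subscript pairs generated by the two loops, only the 25 that fall inside the 5 X 5 table are stored; each of the rest produces a diagnostic naming the offending subscripts and the source line.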

APPENDIX I

[Facsimile of a third execution statistics page. Legible column headings: CONTROL AREA, INPUTS READ, OUTPUTS BY OUTPUT COMMAND, OUTPUTS BY WRITE COMMAND, SEQUENCE NUMBER, QUESTIONNAIRES, RECORDS. The values are not legible in this copy.]

Messages printed on the page:

    JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215
    EDITOR -- NORMAL END OF JOB

Program text superimposed on the page (some operands are not legible in this copy):

    214 IF IN-RECORD-COUNT > 500
    215 THEN STOP-RUN END-THEN
    217 IF IN-RECORD-COUNT < [illegible]
    218 THEN LIST [illegible]

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.


CONCOR LANGUAGE COMMAND STATEMENTS (continued)

DECEMBER 1978 VERSION 1: WRITE (WRT)
DECEMBER 1979 VERSION 2: WRITE (WRT)
COMMENTS: The WRITE language statement, while once utilized as the primary output command, is now used to create an auxiliary or derivative file. Statistics may be accumulated with the specification of the GENERATE-EDIT-STATISTICS option of the EXECUTION-DIVISION. (See WRITE-FILE of DICTIONARY-DIVISION.)

DECEMBER 1978 VERSION 1: XRECODE (XREC)
DECEMBER 1979 VERSION 2: name changed
COMMENTS: Old command which has become the GRECODE command in the December 1979 version.

DECEMBER 1978 VERSION 1: Not implemented
DECEMBER 1979 VERSION 2:
    REPORT-DIVISION
    DISPLAY-CONTROL-SECTION
    DISPLAY-EDIT-STATISTICS
    TOLERANCE-CONTROL-SECTION
    ERROR-RATE-CHECK REJECT-FILE
    END-DIVISION
COMMENTS: This new division (note that division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listings from this division.

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST

G-1

Appendix G

Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census da a using automatic correction techniquns As a result CONCOR should not be construed to be the ultimate software package for generalized data editing The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process A second group of changes or enhancements also exist that could facilitate the editing of survey or other types of census data It should be noted that some of the modifications to meet these two objectives at times are mutually exclusive Some of the possible modifications and considerations are outlined below

1 Housing and Population Census Processing

1 1 Software

1111 Dictionary Division

(1) Implementation of the Dictionary Attributes Section This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods if applicable

(2) Addition of an optional input andor output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values

(3) An output record format area (Output Record Section) could be added This would allow the creation of edited output files in a different format and length than the file read into CONCOR This implementation would require other enhanceients to provide a simple means of moving values from th-iT i tato tile output file Unique naming of user-identifiers would also need to be addressed Another possiblity is the declaration of both the input and output location in the same command when specifying the contents of tile data record to be edited

(4) Addition of user-identifier naires to fields used in the controlling of summary statistics and unique questionnaire determination This would require modification to the system level data dictionary

46 G-2

(5) Increasing the number of area and questionnaire control fields

This implementation would require modification lo the

dictionary as tell as to the error records passed to the Report

Division Another problem would be the formatting of the extra

data values in the report headings

(6) Permit the mixing of difFerent data item types in the CONTROL

commands This would allow the greatest flexibilty in choosing

control fields In order to implement this enhancement the

dictionary and the generated COBOL program (EDITOR) would

require modification

(7) Give the user access to the values in the control fields during

the editing process Either the user would be prohibited from

changing these values (as is currently done with CONCOR

internal identifiers) to prevent the creation of new questionnaires or the summary information gathered on a

questionnaire basis would become meaningless Another

ramification of this enhancement is how to handle fields defined as alphanumeric as CONCOR internally works in binary

(8) Allow input data records with a similar format to be defined by

the same DEFINE-RECORD command statement This could be

accomplished by permitting multiple values in the RECORD-TYPE

clause of the DEFINE-RECORD command A possible problem here

is in the determination of which value to be move to the output

record when the record is to be written out using the OUTPUT

command 0

(9) Implement a COMMION-DATA concept to handle items command to all

record types This could be costly in terms of core storage

and additional conversion time if a storage area is allocated

for each identifier for each record type Only permitting one

common area for all records would require validation of the

data to ensure the values were exactly the same on each record

This wJould require the CONCOR program to perform many more

internal checks when reading the data file into the store area

(10) Permit the selective inclusion of data items found in the

DEFINE-RECORD command statement into the universe count used

for the determination of tolerance levels when using the COUNT-IIPUTES command statement

(11) Increase the number and types of data format referencing now

permitted External numeric data (type N) could be expanded to

handle 18 digit numbers Also signed numeric and packed

decimal data format could be added The signed numeric format

would allow both leading blanks and negative data values This

format is essential ihen dealing with data files generated by

FORTRAN programs

0

+

G-3 47

(12) Increase the number of comparison strings permitted for alphanumeric data items This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire The inclusion of such values are now done by many users in the form of comment statements This enhancement would also facilitate the use of the dictionary by tabulation software

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters The user would still be required to code two commas to receive the system default value for any parameter

(15) Provide a repetition factor in the initialization of arrays

(16) Print the data dictionary name in the titles of the dictionary documentation or provide a means by which the user may specify a title to appear in the listings

112 Execution Division

(M) Implement the Run Control Section This would allow the user to increase execution speed of the EDITOR program This would be accomplished by eliminating some of the system protection features such as division by zero and out of bounds checking In this secion the user could also specify new maximum values for the EDITOR error limits

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered The only problem to resolve is how to treat this id-ntifier outside of the Filter Routines

(3) Implement a CREATE commnand This command would allow the user to create a record that Was not found on the input file This implementation would require special care as it gives the user the ability to change the valme of a CONCOR internal identifier (TYPE-CGUNT) and modify the contents of the store Another consequence of this command is what action is to be taken in the calculation of tolerance in terms of this new records inclusion into the universe of observations and number of changes

iK

48 G-4

(4) The code generated in the EDITOR program for the DRECODE

command should be modified to utilized a lookup table approach

This would decrease the execution time of the command but woul C

require additional communication between the GENED and GENDD

programs

(5) Permit the user to continue the message text field over

successive lines of code

(6) Implementation of a SET command command that would change

CONCORs default occurrence pointer for the duration of the SET

command This command would be ever helpful as the validation

of the existence of a record would only have to be done once

when the SET command was encounterd The major drawback to

this command is in the users ability to remember the action

taken during the last execution of the command when the

Execution Division command statements may cover many pages of

code

(7) Implement a method of specifying different data formats for

fields on the WRITE command statement This would give greater

flexibility to the user but would require major revision to the

routines now used to process the WRITE command

(8) Provide the user with two sets of internal identifiers The

first set currently exists in the system and is valid on the

basis of the current control area (if specified) being

executed The second set would be accumulated on a run total

basis and would require the creation of new CONCOR internal

ident ifiers

(9) Implement automatic array declarations for capturing allocation

frequency distributions This could be done by the system for

discrete values (especially if point number 13 above is

implemented) but a mechanism for handling continuous values

would need to be designed nother program would then be

required to format the information gathered into a readable

form

113 Report Division

(I) Provide a means by which the user may specify their own headings or titles -for the reports

(2) Give the user the ability to override page ejection as a means

of saving paper

(3) Provide a neans by which the statistics gathered may be

presented in other aqgregations other than the area-break and

total levels now pro ided This would require another

procedure to aggregate the statistics and pass them onto the

Report Division before the printing of the reports began Thi

extra procedure is required because of program size

cons i derat ions

G-5

49

114 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360370 series type computers

(2) Modification of the GENSRC program i utilize a lookup table instead of the linear search technioa2 now used

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system

(4) Hlodification to the system data dictionary to better arrange and make available information needed most frequently Also to examine the possibility of making the section dealing with the message text and report generation a separate file Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth

(5) Make the parsing and syntacial routines interactive so as to increase the product ity of the CONCOR user This does not mean to imply that the editing process itself should be made interactive but rather only the development of the user code should be put online

12 Documentation

(1)The development of a self-teaching CONCOR manual

(2) The development of a CONCOR Users Guide

G-6

0

50

2 Survey or other Census Processing

21 Software

(1) Make optional the specification of the record type and questionnaire identification information as some files are not

hierarchical in nature

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected This new input file would contain the master list of those oberservations selected to be

examined

(3) Allow floating point calculations

(4) Provide a mechanism for referencing repetitive sections of questionnaire This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented Note that by using this approach two versions of the software would be required One version would be as CONCOR is now implemented and the second would accept flat data files where the subscript indicated a repetitive section on the survey document

(5) Add a weighting scheme to allow meaningful totals and summary

statistics based upon the sampling frame used to gather the data being edited

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms

(7) Provide a means of manual correction of erroneous data items This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifer name and

questionnaire identification CELADE has a COBOL version of

this program but it has not been finished nor tested

42

APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES

0 APPENO H

CONCOR SYSTEM INTERNAL VARIABLES

1 EOF-FLAG

2 ERROR-TN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT OUTPUT-QUEST-COUNT WRITE-RECORD-COUNT

Change in spelling of variable

Not mentioned--possible ommission from manual

Not implemented

Note Current documentation equivocates when some variables

are reset--most are initialized with each control area break

Though implemented as reserved identifiers due to poor documentation of (old) CONCOR exact values and utility were unknown

These apparently new counters permit access to valuable totals within control areas

Note These are not cumulative counters Also outputshyquest-count violates naming conversion eg in out

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Printout: CONCOR-EDITOR EXECUTION STATISTICS page with program text superimposed. The message DIVISION BY ZERO, LINE NUMBER 118 is printed six times, followed by RUN ABORTED DUE TO DIVISION BY ZERO ERRORS. Legible program statements: line 116 lists the message R001 R003 SET TO ZEROS; line 118 is a LET statement that sets R003 from R002 and R001; line 122 reads END TEST OF OPERANDS AND MSG.]

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason, ending abnormally (ABEND) without generating any messages. The most common types of cessation are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.
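The division-by-zero diagnostics described in these comments can be sketched in Python as follows. This is only an illustration of the reported behavior (a message naming the source line for each error, then an abort), not the EDITOR's actual logic. The function run_divisions, the tuple layout of a statement, and the limit of six errors are assumptions; the limit simply mirrors the number of messages visible on the printout.

```python
def run_divisions(statements, registers, max_errors=6):
    """Execute integer divisions, reporting each division by zero.

    statements: list of (line_number, target, dividend, divisor) tuples
    naming registers. max_errors is an assumed limit, not a documented one.
    Returns the list of diagnostic messages produced.
    """
    messages = []
    errors = 0
    for line_number, target, dividend, divisor in statements:
        try:
            registers[target] = registers[dividend] // registers[divisor]
        except ZeroDivisionError:
            errors += 1
            messages.append(f"DIVISION BY ZERO LINE NUMBER {line_number}")
            if errors >= max_errors:
                messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                break
    return messages


registers = {"R001": 0, "R002": 5, "R003": 0}
# In this sketch the same statement is reached six times in a row.
program = [(118, "R003", "R002", "R001")] * 6
for message in run_divisions(program, registers):
    print(message)
```

Trapping the exception at the statement level is what lets the run continue and report the offending line, rather than stopping silently on the first error.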

[Printout: second CONCOR-EDITOR EXECUTION STATISTICS page with program text superimposed; the statistics and error messages on this page are illegible. Legible program statements (lines 175-187): nested DO loops controlled by R001 and R002 (UNTIL R001 > 10, varied from 1 by 1), an ALLOCATE that reads TA1 (R001 R002), an UPDATE of TA2 (R001 R002) with R003, and LIST statements that print the values of TA1 and TA2.]

Comments: In this example TA1, a 2-dimensional array with 10 rows and 10 columns, is used to update the smaller (5 x 5) TA2. References to nonexistent elements caused the error. The program text has been superimposed on this page for illustration purposes.
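The array coordinate error discussed in these comments can be reproduced with a small bounds-checked table in Python. This is a sketch under stated assumptions: the class Table, the function copy_table, and the message wording are hypothetical, and the line numbers 180 and 181 are used only to echo the superimposed program text.

```python
class Table:
    """A 2-dimensional table with 1-based, bounds-checked subscripts."""

    def __init__(self, rows, cols, fill=0):
        self.rows = rows
        self.cols = cols
        self._cells = [[fill] * cols for _ in range(rows)]

    def _locate(self, row, col, line_number):
        # Reject any subscript pair that falls outside the declared size.
        if not (1 <= row <= self.rows and 1 <= col <= self.cols):
            raise IndexError(
                f"line {line_number}: element ({row}, {col}) does not exist "
                f"in a {self.rows} x {self.cols} table"
            )
        return row - 1, col - 1

    def get(self, row, col, line_number):
        r, c = self._locate(row, col, line_number)
        return self._cells[r][c]

    def update(self, row, col, value, line_number):
        r, c = self._locate(row, col, line_number)
        self._cells[r][c] = value


def copy_table(source, target):
    """Copy every source element into target; return the first error text."""
    for row in range(1, source.rows + 1):
        for col in range(1, source.cols + 1):
            try:
                value = source.get(row, col, line_number=180)
                target.update(row, col, value, line_number=181)
            except IndexError as error:
                return str(error)
    return None


print(copy_table(Table(10, 10, fill=7), Table(5, 5)))
```

With a 10 x 10 source and a 5 x 5 target, the first failing reference is row 1, column 6, and the message identifies the statement that made it.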

[Printout: third CONCOR-EDITOR EXECUTION STATISTICS page with program text superimposed. Messages printed: JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215, followed by EDITOR -- NORMAL END OF JOB. Legible program statements: line 214 begins IF IN-RECORD-COUNT >; line 215 reads THEN STOP-RUN END-THEN.]

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed to cease by the source COBOL CONCOR language program. The program text has been superimposed upon this page for illustration purposes.

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string


DECEMBER 1978 VERSION 1            DECEMBER 1979 VERSION 2

Not implemented                    REPORT-DIVISION
                                   DISPLAY-CONTROL-SECTION
                                   DISPLAY-EDIT-STATISTICS
                                   TOLERANCE-CONTROL-SECTION
                                   ERROR-RATE-CHECK REJECT-FILE
                                   END-DIVISION

COMMENTS

This new division (note: division and section names are treated as comments) provides the means to control the type of edit statistics reports to be generated. Generally, all reports produced by the commands of this division are organized according to the AREA-CONTROL command of the DICTIONARY-DIVISION. Note that this means the input data file must be preprocessed (presorted external to CONCOR) on the basis of the AREA-CONTROL data field, as CONCOR is incapable of merging noncontiguous records.

There is currently no command to enable users to specify unique report headings on the listing from this division.
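The presorting requirement mentioned in these comments can be demonstrated with a short Python sketch. It assumes only that reports are produced one per contiguous run of records sharing an area code, which is how a control break behaves; the function records_per_area and the sample data are hypothetical.

```python
from itertools import groupby


def records_per_area(records):
    """Count records in each contiguous run of the same area code.

    records: iterable of (area_code, payload) pairs in file order.
    """
    return [
        (area, sum(1 for _ in run))
        for area, run in groupby(records, key=lambda record: record[0])
    ]


unsorted_file = [("01", "a"), ("02", "b"), ("01", "c")]
print(records_per_area(unsorted_file))          # area 01 is reported twice
print(records_per_area(sorted(unsorted_file)))  # one entry per area
```

Like a control break, groupby only sees adjacent records, so an unsorted file yields two separate entries for the same area instead of one combined total.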

APPENDIX G

ISPC FUTURE ENHANCEMENT LIST

Future Versions of CONCOR: Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics, such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input data to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.

(5) Increasing the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is the determination of which value is to be moved to the output record when the record is to be written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Permitting only one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18-digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.

(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays.

(16) Print the data dictionary name in the titles of the dictionary documentation, or provide a means by which the user may specify a title to appear in the listings.
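The automatic range check proposed in item (13) could work along the following lines. This Python sketch is only one possible reading of the proposal: the dictionary VALID_RANGES, the item names AGE and SEX, their limits, and the function range_check are all hypothetical, and a missing item is treated here as a failure by assumption.

```python
# Hypothetical valid ranges, as they might be declared with each identifier.
VALID_RANGES = {"AGE": (0, 98), "SEX": (1, 2)}


def range_check(record, valid_ranges):
    """Return the names of items that are missing or outside their range.

    record: mapping of item name to integer value.
    valid_ranges: mapping of item name to (low, high), both inclusive.
    """
    failures = []
    for name, (low, high) in valid_ranges.items():
        value = record.get(name)
        if value is None or not (low <= value <= high):
            failures.append(name)
    return failures


print(range_check({"AGE": 130, "SEX": 1}, VALID_RANGES))
print(range_check({"AGE": 40, "SEX": 2}, VALID_RANGES))
```

Keeping the ranges in the same structure as the item declarations is what would let other software, such as a tabulation program, reuse them.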

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features, such as division by zero and out-of-bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is the question of what action is to be taken in the calculation of tolerance with respect to this new record's inclusion in the universe of observations and number of changes.

(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command lies in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
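The lookup table approach suggested in item (4) can be contrasted with a linear search in a short Python sketch. Nothing here reflects the code the EDITOR actually generates; the rule format (low, high, code), the functions recode_linear and build_lookup, and the sample age groups are assumptions, and values are assumed to be non-negative integers.

```python
def recode_linear(value, rules, default):
    """Return the code of the first rule whose range contains value."""
    for low, high, code in rules:
        if low <= value <= high:
            return code
    return default


def build_lookup(rules, default, max_value):
    """Precompute the recoded value for every input from 0 to max_value."""
    table = [default] * (max_value + 1)
    # Apply rules last to first so that earlier rules win on overlap,
    # matching the first-match behavior of recode_linear.
    for low, high, code in reversed(rules):
        for value in range(low, min(high, max_value) + 1):
            table[value] = code
    return table


# Hypothetical recode of single years of age into three groups.
RULES = [(0, 14, 1), (15, 64, 2), (65, 98, 3)]
TABLE = build_lookup(RULES, default=9, max_value=99)

print(recode_linear(70, RULES, default=9), TABLE[70])
print(recode_linear(99, RULES, default=9), TABLE[99])
```

The table is built once, after which each recode is a single subscript operation instead of a scan through the rules, at the cost of storage proportional to the largest value.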

1.1.3 Report Division

(1) Provide a means by which users may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.

1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360/370 series type computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification to the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.

2. Survey or Other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of the sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has not been finished or tested.
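Two of the survey enhancements listed above, the check-in procedure of item (2) and the weighting scheme of item (5), are simple enough to sketch in Python. Both functions are illustrative assumptions rather than proposed CONCOR designs: check_in compares questionnaire identifiers against a master list, and weighted_total applies one weight per observation.

```python
def check_in(master_list, received_ids):
    """Compare received questionnaire identifiers with the master list."""
    expected = set(master_list)
    received = set(received_ids)
    return {
        "missing": sorted(expected - received),     # selected but not received
        "unexpected": sorted(received - expected),  # received but not selected
    }


def weighted_total(observations):
    """Sum value * weight over (value, weight) pairs."""
    return sum(value * weight for value, weight in observations)


print(check_in([101, 102, 103], [101, 103, 999]))
print(weighted_total([(2, 10), (3, 20)]))
```

In practice the weights would come from the sampling frame, so that totals computed from the edited sample estimate totals for the population it represents.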


APPENDIX G

ISPC FUTURE ENHANCEMENT LIST

G-1

Appendix G

Future Versions of CONCOR Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census da a using automatic correction techniquns As a result CONCOR should not be construed to be the ultimate software package for generalized data editing The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process A second group of changes or enhancements also exist that could facilitate the editing of survey or other types of census data It should be noted that some of the modifications to meet these two objectives at times are mutually exclusive Some of the possible modifications and considerations are outlined below

1 Housing and Population Census Processing

1 1 Software

1111 Dictionary Division

(1) Implementation of the Dictionary Attributes Section This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics such as volume specification and retention periods if applicable

(2) Addition of an optional input andor output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values

(3) An output record format area (Output Record Section) could be added This would allow the creation of edited output files in a different format and length than the file read into CONCOR This implementation would require other enhanceients to provide a simple means of moving values from th-iT i tato tile output file Unique naming of user-identifiers would also need to be addressed Another possiblity is the declaration of both the input and output location in the same command when specifying the contents of tile data record to be edited

(4) Addition of user-identifier naires to fields used in the controlling of summary statistics and unique questionnaire determination This would require modification to the system level data dictionary

46 G-2

(5) Increasing the number of area and questionnaire control fields

This implementation would require modification lo the

dictionary as tell as to the error records passed to the Report

Division Another problem would be the formatting of the extra

data values in the report headings

(6) Permit the mixing of difFerent data item types in the CONTROL

commands This would allow the greatest flexibilty in choosing

control fields In order to implement this enhancement the

dictionary and the generated COBOL program (EDITOR) would

require modification

(7) Give the user access to the values in the control fields during

the editing process Either the user would be prohibited from

changing these values (as is currently done with CONCOR

internal identifiers) to prevent the creation of new questionnaires or the summary information gathered on a

questionnaire basis would become meaningless Another

ramification of this enhancement is how to handle fields defined as alphanumeric as CONCOR internally works in binary

(8) Allow input data records with a similar format to be defined by

the same DEFINE-RECORD command statement This could be

accomplished by permitting multiple values in the RECORD-TYPE

clause of the DEFINE-RECORD command A possible problem here

is in the determination of which value to be move to the output

record when the record is to be written out using the OUTPUT

command 0

(9) Implement a COMMION-DATA concept to handle items command to all

record types This could be costly in terms of core storage

and additional conversion time if a storage area is allocated

for each identifier for each record type Only permitting one

common area for all records would require validation of the

data to ensure the values were exactly the same on each record

This wJould require the CONCOR program to perform many more

internal checks when reading the data file into the store area

(10) Permit the selective inclusion of data items found in the

DEFINE-RECORD command statement into the universe count used

for the determination of tolerance levels when using the COUNT-IIPUTES command statement

(11) Increase the number and types of data format referencing now

permitted External numeric data (type N) could be expanded to

handle 18 digit numbers Also signed numeric and packed

decimal data format could be added The signed numeric format

would allow both leading blanks and negative data values This

format is essential ihen dealing with data files generated by

FORTRAN programs

0

+

G-3 47

(12) Increase the number of comparison strings permitted for alphanumeric data items This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire The inclusion of such values are now done by many users in the form of comment statements This enhancement would also facilitate the use of the dictionary by tabulation software

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma character as valid delimiters The user would still be required to code two commas to receive the system default value for any parameter

(15) Provide a repetition factor in the initialization of arrays

(16) Print the data dictionary name in the titles of the dictionary documentation or provide a means by which the user may specify a title to appear in the listings

112 Execution Division

(M) Implement the Run Control Section This would allow the user to increase execution speed of the EDITOR program This would be accomplished by eliminating some of the system protection features such as division by zero and out of bounds checking In this secion the user could also specify new maximum values for the EDITOR error limits

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered The only problem to resolve is how to treat this id-ntifier outside of the Filter Routines

(3) Implement a CREATE commnand This command would allow the user to create a record that Was not found on the input file This implementation would require special care as it gives the user the ability to change the valme of a CONCOR internal identifier (TYPE-CGUNT) and modify the contents of the store Another consequence of this command is what action is to be taken in the calculation of tolerance in terms of this new records inclusion into the universe of observations and number of changes

iK

48 G-4

(4) The code generated in the EDITOR program for the DRECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implement a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command lies in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
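The lookup table approach suggested in item (4) trades a one-time table build for a constant-time recode on every record. The sketch below is in Python, not in generated COBOL, and its rule format, function names, and age-group values are hypothetical illustrations rather than anything taken from the EDITOR program.

```python
def recode_linear(value, rules):
    """Scan (low, high, new_code) rules in order; the first match wins."""
    for low, high, new_code in rules:
        if low <= value <= high:
            return new_code
    return value  # value is left unchanged when no rule applies


def build_lookup(rules, min_value, max_value):
    """Precompute the recoded result for every value in the field's range."""
    return {v: recode_linear(v, rules) for v in range(min_value, max_value + 1)}


# Hypothetical rules: recode single years of age into three broad groups.
RULES = [(0, 14, 1), (15, 64, 2), (65, 99, 3)]
TABLE = build_lookup(RULES, 0, 99)

print(recode_linear(40, RULES), TABLE[40])
```

The table is built once, before any data record is read, which is consistent with the note that the approach would need extra communication between the programs that generate the dictionary and the edit code.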

1.1.3 Report Division

(1) Provide a means by which the user may specify their own headings or titles for the reports.

(2) Give the user the ability to override page ejection as a means of saving paper.

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.


1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on IBM 360/370 series computers.

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used.

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system.

(4) Modification of the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual.

(2) The development of a CONCOR User's Guide.


2. Survey or Other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of the sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations.

(4) Provide a mechanism for referencing repetitive sections of a questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited.

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms.

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has been neither finished nor tested.
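The weighting scheme in item (5) amounts to multiplying each edited observation by its sampling weight before totals or means are formed. The Python sketch below is only an illustration; the field names `persons` and `weight` and the sample values are hypothetical, and the report does not say how weights would be supplied to CONCOR.

```python
def weighted_total(records, field, weight_field="weight"):
    """Sum of field values, each multiplied by its sampling weight."""
    return sum(r[field] * r[weight_field] for r in records)


def weighted_mean(records, field, weight_field="weight"):
    """Weighted total divided by the sum of the weights."""
    total_weight = sum(r[weight_field] for r in records)
    if total_weight == 0:
        raise ValueError("sum of weights is zero")
    return weighted_total(records, field, weight_field) / total_weight


# Hypothetical sample: two households drawn with different sampling rates.
sample = [
    {"persons": 4, "weight": 10.0},
    {"persons": 2, "weight": 30.0},
]
print(weighted_total(sample, "persons"))  # estimated persons in the universe
print(weighted_mean(sample, "persons"))   # estimated persons per household
```

Because the weights are real numbers, a scheme of this kind would also depend on item (3), the floating point calculations.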


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES


1 EOF-FLAG

2 ERROR-IN-QUESTIONNAIRE

3 CURRENT-POINTER-VALUE

4 TYPE-COUNT ( )

5 RECORD-COUNT

6 QUESTIONNAIRE-COUNT

7 RECORDS-IN-STORE

8 INVALID-RECORD-TYPE-COUNT

9 INCOMPLETE-FLAG

10 CONTINUATION-FLAG

11 INVALID-RECORD-FLAG

EOF-FLAG

ERRORS-IN-QUESTIONNAIRE

TYPE-COUNT ( )

IN-RECORD-COUNT

IN-QUESTIONNAIRE-COUNT

RECORDS-IN-STORE

INVALID-RECORD-TYPE-COUNT

INCOMPLETE-FLAG

CONTINUATION-FLAG

OUT-OF-RANGE

NOT-NUMBER

BLANK

(new internal counters) OUT-RECORD-COUNT OUTPUT-QUEST-COUNT WRITE-RECORD-COUNT

Change in spelling of variable

Not mentioned--possible omission from manual

Not implemented

Note: Current documentation equivocates on when some variables are reset; most are initialized with each control area break.

Though implemented as reserved identifiers, due to poor documentation of (old) CONCOR, exact values and utility were unknown.

These apparently new counters permit access to valuable totals within control areas.

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN, OUT).

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Execution statistics printout; most of the page is illegible in this copy. The legible diagnostics read "DIVISION BY ZERO LINE NUMBER 118", repeated several times, followed by "RUN ABORTED DUE TO DIVISION BY ZERO ERRORS". The superimposed program text (source lines 112 through 125) consists of LIST and LET statements on the registers R001, R002, and R003.]

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason--ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors, and user-specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.
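The behavior described in these comments, trapping each division by zero, reporting the offending source line, and aborting once an error limit is reached, can be sketched as follows. This is a Python illustration, not the generated EDITOR code; the class names, the default limit of six, and the choice to return zero after a trapped error are assumptions made for the sketch. Integer division is used because floating point calculations are listed elsewhere in the report as a future enhancement.

```python
class RunAborted(Exception):
    """Raised when the number of trapped errors reaches the limit."""


class CheckedRun:
    """Collects diagnostics instead of stopping without a message."""

    def __init__(self, max_errors=6):
        self.max_errors = max_errors  # hypothetical error limit
        self.errors = 0
        self.messages = []

    def divide(self, numerator, denominator, line_number):
        """Integer division that reports the source line on a zero divisor."""
        if denominator == 0:
            self.errors += 1
            self.messages.append(f"DIVISION BY ZERO LINE NUMBER {line_number}")
            if self.errors >= self.max_errors:
                self.messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                raise RunAborted(self.messages[-1])
            return 0  # assumed placeholder result so the run can continue
        return numerator // denominator


run = CheckedRun(max_errors=2)
print(run.divide(9, 3, 118))
print(run.divide(9, 0, 118))
```

Passing the line number explicitly mirrors what a code generator could do, since it knows the source line of every statement it emits.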

[Execution statistics printout for an array bounds test; the page is illegible in this copy. The superimposed program text (source lines 173 through 187) shows two nested UNTIL ... VARY loops over R001 and R002 containing ALLOCATE, UPDATE, and LIST statements that reference the arrays TA1 and TA2.]

Comments: In this example R001, a 2-dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 X 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.
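The array coordinate error in this example can be reproduced with a bounds-checked table. The sketch below is in Python; the class, the wording of its message, the 1-based subscripts, and the line number 181 passed to it are assumptions for illustration, with only the table names and sizes taken from the comments above.

```python
class BoundsError(Exception):
    """Raised when a subscript refers to a nonexistent element."""


class CheckedTable:
    """A rows x cols table with 1-based, bounds-checked subscripts."""

    def __init__(self, name, rows, cols):
        self.name, self.rows, self.cols = name, rows, cols
        self.cells = [[0] * cols for _ in range(rows)]

    def _check(self, row, col, line_number):
        if not (1 <= row <= self.rows and 1 <= col <= self.cols):
            raise BoundsError(
                f"{self.name}({row},{col}) is outside "
                f"{self.rows}x{self.cols}, line {line_number}"
            )

    def get(self, row, col, line_number):
        self._check(row, col, line_number)
        return self.cells[row - 1][col - 1]

    def update(self, row, col, value, line_number):
        self._check(row, col, line_number)
        self.cells[row - 1][col - 1] = value


def copy_all(source, target, line_number):
    """Walk every element of source and store it in target."""
    for row in range(1, source.rows + 1):
        for col in range(1, source.cols + 1):
            target.update(row, col, source.get(row, col, line_number), line_number)


ta1 = CheckedTable("TA1", 10, 10)
ta2 = CheckedTable("TA2", 5, 5)
try:
    copy_all(ta1, ta2, 181)
except BoundsError as error:
    print(error)
```

The first failing reference is the sixth column of the first row, the first subscript pair that exists in the larger table but not in the smaller one.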

[Execution statistics printout for a user-requested stop; the column headings are largely illegible in this copy. The legible messages read "JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215" and "EDITOR -- NORMAL END OF JOB". The superimposed program text (source lines 214 through 218) shows IF tests on IN-RECORD-COUNT, one of which executes STOP-RUN.]

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z 0-9 - + the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z 0-9 - + the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z 0-9 - + the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()).

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.


Appendix G

Future Versions of CONCOR: Design Considerations

The present version of the CONCOR system (Basic COBOL Version 2) was designed to provide adequate computing power to properly edit housing and population census data using automatic correction techniques. As a result, CONCOR should not be construed to be the ultimate software package for generalized data editing. The developers of this version of CONCOR realize that there is a group of improvements that can be made to the system which would facilitate the census data editing process. A second group of changes or enhancements also exists that could facilitate the editing of survey or other types of census data. It should be noted that some of the modifications to meet these two objectives are at times mutually exclusive. Some of the possible modifications and considerations are outlined below.

1. Housing and Population Census Processing

1.1 Software

1.1.1 Dictionary Division

(1) Implementation of the Dictionary Attributes Section. This would allow the user to specify the dictionary size (number of pages) on disk and control other file characteristics, such as volume specification and retention periods, if applicable.

(2) Addition of an optional input and/or output file(s) which could be used to initialize and save values stored in arrays and used during the allocation process. This feature would require additional syntax analysis and the creation of new commands to perform the movement of the data values.

(3) An output record format area (Output Record Section) could be added. This would allow the creation of edited output files in a different format and length than the file read into CONCOR. This implementation would require other enhancements to provide a simple means of moving values from the input to the output file. Unique naming of user-identifiers would also need to be addressed. Another possibility is the declaration of both the input and output location in the same command when specifying the contents of the data record to be edited.

(4) Addition of user-identifier names to fields used in the controlling of summary statistics and unique questionnaire determination. This would require modification to the system level data dictionary.


(5) Increase the number of area and questionnaire control fields. This implementation would require modification to the dictionary as well as to the error records passed to the Report Division. Another problem would be the formatting of the extra data values in the report headings.

(6) Permit the mixing of different data item types in the CONTROL commands. This would allow the greatest flexibility in choosing control fields. In order to implement this enhancement, the dictionary and the generated COBOL program (EDITOR) would require modification.

(7) Give the user access to the values in the control fields during the editing process. Either the user would be prohibited from changing these values (as is currently done with CONCOR internal identifiers) to prevent the creation of new questionnaires, or the summary information gathered on a questionnaire basis would become meaningless. Another ramification of this enhancement is how to handle fields defined as alphanumeric, as CONCOR internally works in binary.

(8) Allow input data records with a similar format to be defined by the same DEFINE-RECORD command statement. This could be accomplished by permitting multiple values in the RECORD-TYPE clause of the DEFINE-RECORD command. A possible problem here is determining which value is to be moved to the output record when the record is written out using the OUTPUT command.

(9) Implement a COMMON-DATA concept to handle items common to all record types. This could be costly in terms of core storage and additional conversion time if a storage area is allocated for each identifier for each record type. Permitting only one common area for all records would require validation of the data to ensure the values were exactly the same on each record. This would require the CONCOR program to perform many more internal checks when reading the data file into the store area.

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement in the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18-digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.


(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.
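The automatic range check proposed in item (13) can be pictured as a pass over each record that compares every declared item with the limits stored alongside its declaration. The Python sketch below is an illustration only; the dictionary layout, the item names AGE and SEX, and their limits are hypothetical and do not reflect the DEFINE-RECORD syntax.

```python
# Hypothetical dictionary: user-identifier name -> (lowest, highest) valid value.
DICTIONARY = {
    "AGE": (0, 99),
    "SEX": (1, 2),
}


def out_of_range_items(record, dictionary):
    """Return the names of items whose values fall outside their declared range."""
    return [
        name
        for name, (low, high) in dictionary.items()
        if name in record and not (low <= record[name] <= high)
    ]


print(out_of_range_items({"AGE": 120, "SEX": 1}, DICTIONARY))
```

Keeping the limits in the dictionary, rather than in comment statements, is also what would let tabulation software read them.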


Page 48: THE CONCOR (CONSISTENCY AND CORRECTION) ITS …

46 G-2

(5) Increasing the number of area and questionnaire control fields

This implementation would require modification lo the

dictionary as tell as to the error records passed to the Report

Division Another problem would be the formatting of the extra

data values in the report headings

(6) Permit the mixing of difFerent data item types in the CONTROL

commands This would allow the greatest flexibilty in choosing

control fields In order to implement this enhancement the

dictionary and the generated COBOL program (EDITOR) would

require modification

(7) Give the user access to the values in the control fields during

the editing process Either the user would be prohibited from

changing these values (as is currently done with CONCOR

internal identifiers) to prevent the creation of new questionnaires or the summary information gathered on a

questionnaire basis would become meaningless Another

ramification of this enhancement is how to handle fields defined as alphanumeric as CONCOR internally works in binary

(8) Allow input data records with a similar format to be defined by

the same DEFINE-RECORD command statement This could be

accomplished by permitting multiple values in the RECORD-TYPE

clause of the DEFINE-RECORD command A possible problem here

is in the determination of which value to be move to the output

record when the record is to be written out using the OUTPUT

command 0

(9) Implement a COMMION-DATA concept to handle items command to all

record types This could be costly in terms of core storage

and additional conversion time if a storage area is allocated

for each identifier for each record type Only permitting one

common area for all records would require validation of the

data to ensure the values were exactly the same on each record

This wJould require the CONCOR program to perform many more

internal checks when reading the data file into the store area

(10) Permit the selective inclusion of data items found in the DEFINE-RECORD command statement into the universe count used for the determination of tolerance levels when using the COUNT-IMPUTES command statement.

(11) Increase the number and types of data format referencing now permitted. External numeric data (type N) could be expanded to handle 18 digit numbers. Also, signed numeric and packed decimal data formats could be added. The signed numeric format would allow both leading blanks and negative data values. This format is essential when dealing with data files generated by FORTRAN programs.


(12) Increase the number of comparison strings permitted for alphanumeric data items. This would allow for easier processing of alphanumeric strings but would require modifications to the system data dictionary.

(13) If a range of valid values were specified with the declaration of a user-identifier in the DEFINE-RECORD command, an automatic range check could be performed prior to CONCOR giving the user access to the data items on the questionnaire. The inclusion of such values is now done by many users in the form of comment statements. This enhancement would also facilitate the use of the dictionary by tabulation software.

(14) The language format used in the Dictionary Division could be modified to recognize both the space and comma characters as valid delimiters. The user would still be required to code two commas to receive the system default value for any parameter.

(15) Provide a repetition factor in the initialization of arrays

(16) Print the data dictionary name in the titles of the dictionary documentation or provide a means by which the user may specify a title to appear in the listings
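
The automatic range check proposed in item (13) can be pictured with a short sketch. Python is used purely for illustration, since CONCOR itself generates COBOL. The names `ItemDef` and `check_ranges`, and the sample items AGE, SEX, and SERIAL, are hypothetical and are not taken from the CONCOR dictionary format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ItemDef:
    """Hypothetical dictionary entry: an item name plus an optional valid range."""
    name: str
    valid_range: Optional[Tuple[int, int]] = None  # inclusive (low, high)


def check_ranges(record: Dict[str, int], dictionary: List[ItemDef]) -> List[str]:
    """Return the names of items whose values fall outside their declared range."""
    failures = []
    for item in dictionary:
        if item.valid_range is None:
            continue  # no range declared for this item, nothing to check
        low, high = item.valid_range
        value = record.get(item.name)
        if value is None or not (low <= value <= high):
            failures.append(item.name)
    return failures


# Assumed sample dictionary; the ranges are invented for the example.
DICTIONARY = [ItemDef("AGE", (0, 98)), ItemDef("SEX", (1, 2)), ItemDef("SERIAL")]

print(check_ranges({"AGE": 34, "SEX": 1, "SERIAL": 77}, DICTIONARY))   # []
print(check_ranges({"AGE": 130, "SEX": 3, "SERIAL": 78}, DICTIONARY))  # ['AGE', 'SEX']
```

Because the ranges live with the item declarations, the same entries could be read by tabulation software, which is the secondary benefit the item mentions.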

1.1.2 Execution Division

(1) Implement the Run Control Section. This would allow the user to increase the execution speed of the EDITOR program. This would be accomplished by eliminating some of the system protection features, such as division by zero and out of bounds checking. In this section the user could also specify new maximum values for the EDITOR error limits.

(2) Provide an internal identifier (OCCURRENCE-PTR) which would contain the occurrence number of the record currently being filtered. The only problem to resolve is how to treat this identifier outside of the Filter Routines.

(3) Implement a CREATE command. This command would allow the user to create a record that was not found on the input file. This implementation would require special care, as it gives the user the ability to change the value of a CONCOR internal identifier (TYPE-COUNT) and modify the contents of the store. Another consequence of this command is what action is to be taken in the calculation of tolerance in terms of this new record's inclusion into the universe of observations and number of changes.


(4) The code generated in the EDITOR program for the RECODE command should be modified to utilize a lookup table approach. This would decrease the execution time of the command but would require additional communication between the GENED and GENDD programs.

(5) Permit the user to continue the message text field over successive lines of code.

(6) Implementation of a SET command that would change CONCOR's default occurrence pointer for the duration of the SET command. This command would be very helpful, as the validation of the existence of a record would only have to be done once, when the SET command was encountered. The major drawback to this command is in the user's ability to remember the action taken during the last execution of the command when the Execution Division command statements may cover many pages of code.

(7) Implement a method of specifying different data formats for fields on the WRITE command statement. This would give greater flexibility to the user but would require major revision to the routines now used to process the WRITE command.

(8) Provide the user with two sets of internal identifiers. The first set currently exists in the system and is valid on the basis of the current control area (if specified) being executed. The second set would be accumulated on a run total basis and would require the creation of new CONCOR internal identifiers.

(9) Implement automatic array declarations for capturing allocation frequency distributions. This could be done by the system for discrete values (especially if point number 13 above is implemented), but a mechanism for handling continuous values would need to be designed. Another program would then be required to format the information gathered into a readable form.
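
The lookup table approach suggested in item (4) can be illustrated as follows. This is only a sketch in Python, not the COBOL that the generator would emit, and it assumes that the generated code currently tests the recode ranges one after another, which this report does not state explicitly. The age groups and the names `recode_chain` and `recode_lookup` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical recode: age in years -> age-group code.
# Each pair is (lower bound of the interval, output code), sorted by lower bound.
AGE_GROUPS = [(0, 1), (15, 2), (45, 3), (65, 4)]


def recode_chain(age: int) -> int:
    """Chained comparisons, evaluated one after another for every record."""
    if age < 15:
        return 1
    if age < 45:
        return 2
    if age < 65:
        return 3
    return 4


LOWER_BOUNDS = [low for low, _ in AGE_GROUPS]
CODES = [code for _, code in AGE_GROUPS]


def recode_lookup(age: int) -> int:
    """Table lookup: binary search over the sorted lower bounds (assumes age >= 0)."""
    return CODES[bisect_right(LOWER_BOUNDS, age) - 1]


# Both forms agree on every age in the tested range.
assert all(recode_chain(a) == recode_lookup(a) for a in range(0, 120))
print(recode_lookup(14), recode_lookup(15), recode_lookup(70))  # 1 2 4
```

The table has to be built from the value ranges given on the command and made available to the generated program, which is one way to read the remark about additional communication between the two generator programs.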

1.1.3 Report Division

(1) Provide a means by which the user may specify their own headings or titles for the reports

(2) Give the user the ability to override page ejection as a means of saving paper

(3) Provide a means by which the statistics gathered may be presented in aggregations other than the area-break and total levels now provided. This would require another procedure to aggregate the statistics and pass them on to the Report Division before the printing of the reports began. This extra procedure is required because of program size considerations.


1.1.4 System Level

(1) Generation of assembler (ALC) language code for the EDITOR program to be used on the IBM 360/370 series computers

(2) Modification of the GENSRC program to utilize a lookup table instead of the linear search technique now used

(3) Modification of the RDMSGTXT program to look at continuation lines when searching for the text to be supplied by the system

(4) Modification to the system data dictionary to better arrange and make available the information needed most frequently. Also, examine the possibility of making the section dealing with the message text and report generation a separate file. Investigate the industry standards for dictionary format and see how CONCOR meets the requirements set forth.

(5) Make the parsing and syntactical routines interactive so as to increase the productivity of the CONCOR user. This does not mean to imply that the editing process itself should be made interactive, but rather that only the development of the user code should be put online.

1.2 Documentation

(1) The development of a self-teaching CONCOR manual

(2) The development of a CONCOR User's Guide


2. Survey or other Census Processing

2.1 Software

(1) Make optional the specification of the record type and questionnaire identification information, as some files are not hierarchical in nature.

(2) Provide another input file as a means of doing a check-in procedure of sample cases expected. This new input file would contain the master list of those observations selected to be examined.

(3) Allow floating point calculations

(4) Provide a mechanism for referencing repetitive sections of the questionnaire. This referencing (loop letter approach) could be accomplished in the same fashion as interrecord checking is now implemented. Note that by using this approach two versions of the software would be required. One version would be as CONCOR is now implemented, and the second would accept flat data files where the subscript indicated a repetitive section on the survey document.

(5) Add a weighting scheme to allow meaningful totals and summary statistics based upon the sampling frame used to gather the data being edited

(6) Increase the number of user-identifiers allowed in the system because of the more detailed information found on survey forms

(7) Provide a means of manual correction of erroneous data items. This correction scheme would utilize the data dictionary and allow for correction based upon the user-identifier name and questionnaire identification. CELADE has a COBOL version of this program, but it has been neither finished nor tested.
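
The check-in procedure in item (2) amounts to comparing the questionnaire identifications found on the data file with those on the master list. A minimal sketch in Python follows, assuming both files can be reduced to lists of identifiers. The function name `check_in` and the sample identifiers are hypothetical.

```python
from typing import Iterable, Set, Tuple


def check_in(expected: Iterable[str], received: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Compare received questionnaire IDs against the master list.

    Returns (missing, unexpected): IDs on the master list that never arrived,
    and IDs on the data file that were not selected for the sample.
    """
    expected_ids = set(expected)
    received_ids = set(received)
    missing = expected_ids - received_ids
    unexpected = received_ids - expected_ids
    return missing, unexpected


# Invented sample data: "0003" is duplicated and "0009" was never selected.
master_list = ["0001", "0002", "0003", "0004"]
data_file = ["0001", "0003", "0003", "0009"]

missing, unexpected = check_in(master_list, data_file)
print(sorted(missing))     # ['0002', '0004']
print(sorted(unexpected))  # ['0009']
```

Duplicates on the data file are ignored here; a production check-in would probably report them as well.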


APPENDIX H

CONCOR SYSTEM INTERNAL VARIABLES


     Old CONCOR                     COBOL CONCOR

 1   EOF-FLAG                       EOF-FLAG
 2   ERROR-IN-QUESTIONNAIRE         ERRORS-IN-QUESTIONNAIRE
 3   CURRENT-POINTER-VALUE
 4   TYPE-COUNT ( )                 TYPE-COUNT ( )
 5   RECORD-COUNT                   IN-RECORD-COUNT
 6   QUESTIONNAIRE-COUNT            IN-QUESTIONNAIRE-COUNT
 7   RECORDS-IN-STORE               RECORDS-IN-STORE
 8   INVALID-RECORD-TYPE-COUNT      INVALID-RECORD-TYPE-COUNT
 9   INCOMPLETE-FLAG                INCOMPLETE-FLAG
10   CONTINUATION-FLAG              CONTINUATION-FLAG
11   INVALID-RECORD-FLAG
                                    OUT-OF-RANGE
                                    NOT-NUMBER
                                    BLANK
                                    (new internal counters)
                                    OUT-RECORD-COUNT
                                    OUTPUT-QUEST-COUNT
                                    WRITE-RECORD-COUNT

Remarks:

Change in spelling of variable

Not mentioned--possible omission from manual

Not implemented

Note: Current documentation equivocates on when some variables are reset--most are initialized with each control area break

Though implemented as reserved identifiers, due to poor documentation of (old) CONCOR, exact values and utility were unknown

These apparently new counters permit access to valuable totals within control areas

Note: These are not cumulative counters. Also, OUTPUT-QUEST-COUNT violates the naming convention (e.g., IN, OUT)

APPENDIX I

CONCOR-EDITOR EXECUTION STATISTICS

[Execution Statistics printout headed CONCOR-EDITOR EXECUTION STATISTICS, with the test program text superimposed. Legible column headings: CONTROL AREA, SEQUENCE NUMBER, INPUTS READ (QUESTIONNAIRES, RECORDS), OUTPUTS BY OUTPUT COMMAND, OUTPUTS BY WRITE COMMAND.]

Messages printed:

DIVISION BY ZERO, LINE NUMBER 118 (printed six times)

RUN ABORTED DUE TO DIVISION BY ZERO ERRORS

Program text (legible lines only):

114 LIST DIVISION BY ZERO ERROR ... FOLLOWS

115 LET R001 R003 = 0

116 LIST R001 R003 SET TO ZEROS ...

118 LET R003 = R002 / R001

122 END TEST OF OPERANDS AND MSG

Comments: One of the most severe criticisms of the old CONCOR package was that it would cease processing for no apparent reason--ABEND without generating any messages. The most common types of cessations are division by zero, array coordinate errors, and user specified terminations. A program to deliberately test CONCOR's ability to provide diagnostics was written. The system performed exceptionally well and correctly provided the line numbers in the source program which caused the data exceptions. The program text has been superimposed upon the Execution Statistics page for illustration purposes.
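
The behavior shown on this page, where each division by zero is reported with its source line number and the run is aborted once errors accumulate, can be sketched as follows. This is an illustration in Python, not a description of how the EDITOR program is actually coded. The class names and the `max_errors` parameter are hypothetical; the limit of six simply mirrors the number of messages on the printout and is not a documented CONCOR value.

```python
from typing import List, Optional


class RunAborted(Exception):
    """Raised when the number of division-by-zero errors reaches the limit."""


class Diagnostics:
    def __init__(self, max_errors: int) -> None:
        self.max_errors = max_errors  # assumed error limit, chosen for the example
        self.messages: List[str] = []
        self.errors = 0

    def divide(self, numerator: int, denominator: int, line: int) -> Optional[int]:
        """Integer division that reports the source line instead of crashing."""
        if denominator == 0:
            self.errors += 1
            self.messages.append(f"DIVISION BY ZERO - LINE NUMBER {line}")
            if self.errors >= self.max_errors:
                self.messages.append("RUN ABORTED DUE TO DIVISION BY ZERO ERRORS")
                raise RunAborted(self.messages[-1])
            return None  # no result, but processing continues
        return numerator // denominator


diag = Diagnostics(max_errors=6)
try:
    for _ in range(10):
        diag.divide(5, 0, line=118)  # same faulty statement executed repeatedly
except RunAborted:
    pass

print(len(diag.messages))  # 7: six error lines plus the abort message
```

The point of the design is that the user always receives a message naming the offending line, in contrast to the silent terminations criticized in the old package.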

[Second Execution Statistics printout, with the test program text superimposed. The error lines are largely illegible; each appears to report a questionnaire, a coordinate, a value, and a limit.]

Program text (legible lines only):

175 UNTIL R001 > 10 VARY R001 FROM 1 BY 1

177 DO

178 UNTIL R002 > 10 VARY R002 FROM 1 BY 1

180 ALLOCATE N002 = TA1 (R001 R002)

181 UPDATE TA2 (R001 R002) R003

183 LIST VALUES OF TA1 = ... TA1 (R001 R002)

185 LIST TA2 = TA2 (R001 R002)

186 END-DO

187 END-DO

Comments: In this example TA1, a 2 dimensional array with 10 rows and 10 columns, is attempting to update the smaller (5 X 5) TA2. Nonexistent element references caused the error. The program text has been superimposed on this page for illustration purposes.
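
The array coordinate check described in the comments can be sketched in Python as follows. The sketch assumes 1-based subscripts and an error message carrying the coordinate position, its value, and the limit; that message layout is a guess based on the partly legible printout, and `Table2D` and `BoundsError` are hypothetical names. The `update` method here simply stores a value and is a stand-in for whatever the UPDATE command actually does.

```python
from typing import List


class BoundsError(Exception):
    """Carries the offending coordinate position, its value, and the limit."""

    def __init__(self, coordinate: int, value: int, limit: int) -> None:
        super().__init__(f"COORDINATE={coordinate} VALUE={value} LIMIT={limit}")
        self.coordinate = coordinate
        self.value = value
        self.limit = limit


class Table2D:
    """A 1-based two-dimensional table with explicit bounds checking."""

    def __init__(self, rows: int, cols: int) -> None:
        self.limits = (rows, cols)
        self.cells = [[0] * cols for _ in range(rows)]

    def _check(self, row: int, col: int) -> None:
        for position, (value, limit) in enumerate(zip((row, col), self.limits), start=1):
            if not 1 <= value <= limit:
                raise BoundsError(position, value, limit)

    def get(self, row: int, col: int) -> int:
        self._check(row, col)
        return self.cells[row - 1][col - 1]

    def update(self, row: int, col: int, value: int) -> None:
        self._check(row, col)
        self.cells[row - 1][col - 1] = value


ta1 = Table2D(10, 10)  # the larger table
ta2 = Table2D(5, 5)    # the smaller table being updated
errors: List[str] = []
for r in range(1, 11):
    for c in range(1, 11):
        try:
            ta2.update(r, c, ta1.get(r, c))
        except BoundsError as exc:
            errors.append(str(exc))  # report and continue, as the printout suggests

print(len(errors))  # 75 of the 100 references fall outside the 5 x 5 table
print(errors[0])    # COORDINATE=2 VALUE=6 LIMIT=5
```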

[Third Execution Statistics printout, with the test program text superimposed. Legible column headings: CONTROL AREA, SEQUENCE NUMBER, INPUTS READ (QUESTIONNAIRES, RECORDS), OUTPUTS BY OUTPUT COMMAND, OUTPUTS BY WRITE COMMAND.]

Messages printed:

JOB STOPPED DUE TO USER STOP-RUN COMMAND ON LINE 215

EDITOR -- NORMAL END OF JOB

Program text (legible lines only):

214 IF IN-RECORD-COUNT > 500

215 THEN STOP-RUN END-THEN

Comments: In this example, when the internal variable IN-RECORD-COUNT was greater than 500, processing was directed by the source COBOL CONCOR language program to cease. The program text has been superimposed upon this page for illustration purposes.

APPENDIX J

DIAGNOSTIC MESSAGE GUIDE EXAMPLE

WARNING (DD-001) BLANK COMMAND STATEMENT LINE

EXPLANATION: A blank command statement line was found in the user Dictionary Division command statement file.

ACTION TAKEN: Parsing continued with the next command statement line in the Dictionary Division command file.

USER RESPONSE: Put a comment indicator () in column 1 of the command statement line if spacing is desired; if not, remove the command statement line.

ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING

EXPLANATION: A character not in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found as the first character in the string.

ACTION TAKEN: Parsing began again with the next string.

USER RESPONSE: Check for possible keying errors and ensure that the first character is a member of the valid CONCOR character set.

ERROR (DD-003) END OF LINE REACHED WITH NO DELIMITER FOR ILLEGAL STRING

EXPLANATION: A string starting with a character not valid in the CONCOR legal character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()) was found without one of the delimiter characters ( =) when the end of the command line was reached.

ACTION TAKEN: Parsing began again with the next user command statement line in the Dictionary Division command file.

USER RESPONSE: Check for possible keying errors and ensure that each of the characters in the string is a member of the CONCOR valid character set (A-Z, 0-9, -, +, the comma character (,), the delimiter character (), the keyword separator character (=), the command terminator character (), and the literal string delimiter ()).
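
The first two diagnostics in this guide can be mimicked with a small checker. The sketch below is in Python and is not the CONCOR parser. It uses only the legal characters that are legible in this guide (A-Z, 0-9, -, +, the comma, and =) and assumes that strings are separated by blanks; the remaining special characters are not recoverable from this copy and are left out. The function name `diagnose` and the sample command lines are hypothetical.

```python
import string
from typing import List

# Legal first characters named explicitly in the guide. The delimiter,
# command terminator, and literal string delimiter are omitted because
# they cannot be read in the source; that omission is an assumption.
LEGAL_FIRST = set(string.ascii_uppercase + string.digits + "-+,=")


def diagnose(lines: List[str]) -> List[str]:
    """Return DD-001 / DD-002 style diagnostics for a list of command lines."""
    messages = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            messages.append(f"{number}: WARNING (DD-001) BLANK COMMAND STATEMENT LINE")
            continue  # parsing continues with the next command line
        for token in line.split():  # blanks assumed to separate strings
            if token[0] not in LEGAL_FIRST:
                messages.append(f"{number}: ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING")
    return messages


# Invented command lines; the clause names are placeholders.
sample = ["DEFINE-RECORD NAME=PERSON", "", "?AGE START=5"]
for message in diagnose(sample):
    print(message)
# 2: WARNING (DD-001) BLANK COMMAND STATEMENT LINE
# 3: ERROR (DD-002) ILLEGAL FIRST CHARACTER IN STRING
```

As in the guide, a warning lets parsing continue with the next line, while an error is reported for the offending string and parsing resumes with the next string.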

ERROR (DD-004) COMMENT INDICATOR () IN STRING

EXPLANATION: The comment indicator character () was found embedded in a string.
