Data Editing Workshop and Exposition




    Data Editing Workshop and Exposition

    Prepared by the Organizing Committee for the

    Data Editing Workshop and Exposition

    Federal Committee on Statistical Methodology

    Statistical Policy Office, Office of Information and Regulatory Affairs

    Office of Management and Budget

    DECEMBER 1996

  • Members of the Federal Committee on Statistical Methodology

    Maria Elena Gonzalez, Chair Office of Management and Budget

    M. Denice McCormick Myers, Secretary National Agricultural Statistics Service

    Ahmed National Center for Education Statistics

    Yvonne M. Bishop Energy Information Administration

    Cynthia Z. F. Clark Bureau of the Census

    Steven Cohen Agency for Health Care Policy and Research

    Lawrence H. Cox Environmental Protection Agency

    Zahava D. Doering Smithsonian Institution

    Daniel Kasprzyk National Center for Education Statistics

    Nancy Kirkendall Energy Information Administration

    Daniel Melnick National Science Foundation

    (March 1996)

    Robert P. Parker Bureau of Economic Analysis

    Charles P. Pautler, Jr. Bureau of the Census

    David A. Pierce Federal Reserve Board

    Thomas J. Plewes Bureau of Labor Statistics

    Wesley L. Schaible Bureau of Labor Statistics

    Rolf Schmitt Bureau of Transportation Statistics

    Monroe Sirken National Center for Health Statistics

    Alan R. Tupek National Science Foundation

    Denton Vaughan Social Security Administration

    Robert Warren Immigration and Naturalization Service

    G. David Williamson Centers for Disease Control and Prevention

  • Foreword

    This volume, No. 25 in the Federal Committee on Statistical Methodology (FCSM)'s Working Paper series, is the written record of the Data Editing Workshop and Exposition, held on March 22, 1996, at the Bureau of Labor Statistics (BLS) Conference and Training Center. This conference was over a year in planning, by an Organizing Committee that was an outgrowth of the FCSM's Subcommittee on Data Editing. From an initial plan of 10 or 20 papers and computer software demonstrations, and perhaps 100 attendees, the registrations and submissions kept growing until the final program consisted of 44 oral presentations and 19 software demonstrations on data editing, with over 500 conference attendees. This success is probably due to several causes, not the least of which were the many outstanding contributions by the authors whose work appears in this volume. Perhaps the high participation level also suggests that data editing has been an overlooked area in the work of Federal, state, and international statistical agencies, as well as private-sector organizations.

    From the start it was our intention to plan and produce this conference on as close to a zero budget as possible. Our holding this goal seemed to foster an atmosphere of cooperation in which contributions and offers of assistance came forth from numerous sources and at the times they were most needed. From the early publicity provided by several agencies and collaborating organizations to the preparation by IRS of the works published in this volume, donations of time and effort were most generous. The BLS staff was truly outstanding in anticipating and handling the many physical arrangements for the Workshop. And the agencies listed as affiliations of the Organizing Committee members all contributed varying degrees of staff time towards ensuring the success of this conference.

  • Foreword (cont'd)

    A Dedication in Memory of Maria Elena Gonzalez

    We began the planning of the Data Editing Workshop and Exposition under the guidance of the FCSM and its founding chairperson, Maria Elena Gonzalez. After an illness, Maria passed away earlier this year and, while the FCSM continued its sponsorship of the conference and these Proceedings, Maria Gonzalez' departure is a deep personal and professional loss to all of us. Her career as a Federal government statistician spanned a quarter century, during which she made many contributions to improving the quality of Federal (and international) statistics. She did this both directly and as an outstanding leader in bringing forth and leveraging the talents of others for the many valuable statistical projects and conferences that she initiated. The editors would like to dedicate this Proceedings volume to Maria Gonzalez' memory, as was done by the Organizing Committee for the conference itself.

    The next few pages contain the table of contents, followed by the conference contributions them

  • Data Editing Workshop and Exposition: Organizing Committee

    David A. Pierce, Chair Federal Reserve Board

    Mark Pierzchala, Co-Chair National Agricultural Statistics Service

    Yahia Ahmed U. S. Internal Revenue Service

    Frances Chevarley National Center for Health Statistics

    Charles Day National Agricultural Statistics Service

    Rich Esposito U.S. Bureau of Labor Statistics

    Sylvia Kay Fisher U. S. Bureau of Labor Statistics

    Laura Bauer Gillis Federal Reserve Board

    Maria Elena Gonzalez U. S. Office of Management and Budget

    Robert Groves Joint Program in Survey Methodology


    Ken Harris National Center for Health Statistics

    David McDonell National Agricultural Statistics Service


    Renee Miller U.S. Energy Information Administration


    M. Denice McCormick Myers National Agricultural Statistics Service

    Jeff Owings National Center for Education Statistics

    Thomas B. Petska U. S. Internal Revenue Service

    Linda Stinson U. S. Bureau of Labor Statistics

    Paula Weir U.S. Energy Information Administration


    William E. Winkler U. S. Bureau of the Census

  • TABLE OF CONTENTS

    FOREWORD

    1. OVERVIEWS

    A Paradigm Shift for Data Editing, Linda M. Ball
    The New View on Editing, Leopold Granquist
    Data Editing at the National Center for Health Statistics, Kenneth W. Harris

    2. FELLEGI-HOLT SYSTEMS

    DISCRETE: A Fellegi-Holt Edit System for Demographic Data,

    William E. \\


    7. CATI/CAPI TECHNICAL

    Questionnaire Programming Language (QPL) [ABSTRACT ON


    13. CASE STUDIES III

    Time-Series Editing of Quarterly Deposits Data, Anusha Fernando Dharmasena
    Experiences in Re-Engineering the Approach to Editing and Imputing Canadian Imports Data [ABSTRACT ONLY], Clancy Barrett and Francois Laflamme
    Data Editing in an Automated Environment: A Practical Retrospective -- The CPS Experience [ABSTRACT ONLY], Gregory D. Weyland

    15.

    Statistical Analysis of Textual Information [ABSTRACT ON

  • Chapter 1 Overviews

    Chair: Fred Vogel, National Agricultural Statistics Service

    Linda M. Ball

    Leopold Granquist

    Kenneth W. Harris

  • Chapter 1

    A Paradigm Shift for Data Editing

    Linda M. Ball, U.S. Bureau of the Census


    Viewed through the current paradigm, the survey process consists of collecting, editing, and summarizing survey data. We think of survey data as the "stuff" that interviewers collect, the basic units of which are individual questionnaire items. On this view, pieces of data are either erroneous or not erroneous, you can correct erroneous data, and data editing is a manageable process for most surveys.

    The author proposes that we instead view the survey process as engineering and managing socio-economic information systems. In this paradigm, a survey is an expression of a mental model about society. The basic units that make up the mental model are objects or concepts in the real world about which we wish to collect information. Our mental models fail to capture fully the complexity of those objects and concepts, and a questionnaire fails to capture fully the complexity of our mental model. It is no surprise, then, that surveys yield unexpected results, which may or may not be erroneous.

    When an edit detects an "error," it often can't tell whether that error was simply an unexpected result or one of the host of errors in administering the questionnaire and in data processing that occur regularly in the administration of surveys. If we write "brute force" edits that ensure many errors are corrected, we may miss getting feedback on the problems with the mental model underlying the survey. If we take a more hands-off approach, users complain that the data set has errors and is difficult to summarize and analyze. Is it, then, any surprise that we are usually not satisfied with the results we get from edits?
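
    The trade-off described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the field names (`age`, `hours_worked`), the plausibility limits, and both functions are hypothetical. A "brute force" edit overwrites a failing value so that the record passes, while a flagging edit leaves the value alone and records the failure, so that someone can later judge whether it was an error or merely an unexpected result.

```python
from dataclasses import dataclass, field

# Hypothetical plausibility limits; a real survey would define its own.
LIMITS = {"age": (0, 110), "hours_worked": (0, 100)}


@dataclass
class EditResult:
    record: dict
    flags: list = field(default_factory=list)


def brute_force_edit(record):
    """Force every out-of-range value to the nearest limit.

    The record now passes the edit, but the evidence that something
    unexpected was reported is lost.
    """
    cleaned = dict(record)
    for name, (low, high) in LIMITS.items():
        value = cleaned.get(name)
        if value is not None:
            cleaned[name] = min(max(value, low), high)
    return EditResult(cleaned)


def flagging_edit(record):
    """Leave values untouched and report each failed edit for review."""
    result = EditResult(dict(record))
    for name, (low, high) in LIMITS.items():
        value = record.get(name)
        if value is not None and not (low <= value <= high):
            result.flags.append(f"{name}={value} outside [{low}, {high}]")
    return result


response = {"age": 34, "hours_worked": 120}
forced = brute_force_edit(response)
flagged = flagging_edit(response)
print(forced.record, forced.flags)
print(flagged.record, flagged.flags)
```

    In this sketch the forced record is tidy but silent about the reported 120 hours, whereas the flagged record keeps the reported value and says why it failed; neither outcome alone resolves the dissatisfaction the abstract describes.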

  • Abstract (cont'd)

    We get a glimpse of the true complexity of the subject matter of a survey when we study the edits of a survey that has been around for a long time.

    The longer a survey has been around, the more its edits evolve to reflect the complexity of the real world. For the same reason, questionnaires tend to become more complex over time. CATI/CAPI allowed us to climb to a new level of possible questionnaire complexity, and we immediately took advantage of it because we always knew that a paper questionnaire could not be designed to handle the complexity of the subject matter of most surveys.

    One way to add