
Harvard Center for Population and Development Studies


Census Editing and the Art of Motorcycle Maintenance

Michael J. Levin
Center for Population and Development Studies
Harvard University


Appendix A. Censuses where some of these methods were applied

Country               Census years
American Samoa        1974, 1980, 1990, 2000
Ethiopia              2007
Fiji                  1996, 2007
Ghana                 1984, 2000, 2010
Grenada               2001
Guam                  1980, 1990, 2000
Indonesia             1980, 2010
Kenya                 1999
Kiribati              2005
Lesotho               1996, 2006
Malawi                1998, 2008
Maldives              2006
Marshall Islands      1973, 1980, 1988
Micronesia            1973, 1980, 1994, 2000
Northern Marianas     1973, 1980, 1990, 1995, 2000
Palau                 1973, 1980, 1990, 1995, 2000, 2005
Papua New Guinea      1990
Samoa                 2001
Sierra Leone          2004
Solomon Islands       1999
South Africa          2001
Sudan                 2008
Tanzania              2002
Timor Leste           2004
Tonga                 1996, 2006
Uganda                1991, 2002
US Virgin Islands     1980, 1990, 2000
Vanuatu               1989
Zambia                2000

Note: For some censuses, processing occurred during the census; for others, it occurred during preparation or analysis (including own-children estimation).


The Census Process

Data collection
Capture
Editing
Tabulation and Dissemination
Archiving


History of census editing

Early years – manual or nothing
Computers
Within record editing
Between record editing
Hot decking


What is editing

Editing is the systematic inspection of responses for invalid and inconsistent values, and their subsequent manual or automatic correction according to predetermined rules.

The editing team!!
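
The difference between invalid and inconsistent responses in the definition above can be illustrated with a short Python sketch. The field names, code values, and age limits are assumptions chosen for illustration, not rules taken from any particular census.

```python
from typing import Optional

MALE, FEMALE = 1, 2          # assumed sex codes
MAX_AGE = 98                 # assumed highest valid age code
MIN_FERTILE_AGE = 12         # assumed minimum age for fertility questions


def check_person(age: Optional[int], sex: Optional[int],
                 children_ever_born: Optional[int]) -> list[str]:
    """Return edit messages for one person record; None means blank."""
    messages = []
    # Validity checks: is each value inside its allowed range?
    if age is None or not 0 <= age <= MAX_AGE:
        messages.append("invalid age")
    if sex not in (MALE, FEMALE):
        messages.append("invalid sex")
    # Consistency checks: do individually valid values contradict each other?
    if children_ever_born is not None:
        if sex == MALE:
            messages.append("male with fertility reported")
        if age is not None and 0 <= age < MIN_FERTILE_AGE:
            messages.append("fertility reported for child")
    return messages
```

A record with no messages passes the edit; any message would then trigger a manual or automatic correction under the predetermined rules.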


Why edit?

Edited vs unedited data
Always preserve original data
Consider the users!!


Table 1. Sample population by 15-year age group and sex, using unedited and edited data

                         Unedited data                       Edited data
Age group             Total    Male  Female  Not reported    Total    Male  Female
Total                 4,147   2,033   2,091            23    4,147   2,045   2,102
Less than 15 years    1,639     799     825            15    1,743     855     888
15 to 29 years        1,256     612     643             1    1,217     603     614
30 to 44 years          727     356     369             2      695     338     357
45 to 59 years          360     194     166             0      341     182     159
60 to 74 years          116      54      59             3      114      53      61
75 years and over        34      12      22             0       37      14      23
Not reported             15       6       7             2


Initial data sets contain errors

How over-editing is harmful
Timeliness
Finances
Distortion of true values
False sense of security


What we have to look out for

Treatment of unknowns
Spurious changes
Using tolerances
Learning from the editing process
Quality assurance
Costs of editing


Types of Correction

Manual correction
• Names
• Sex
Automatic correction
• Assign an unknown
• Assign a value
• Impute a value


Types of editing

Top Down
• The usual way
• Is simple and straightforward
Multiple-variable editing approach
• Uses more information
• Is likely to be a better guess


Two parts of a national edit

Structure editing
Content editing


Methods of Correction and Imputation

When imputation is not needed – toggling sexes

Static imputation – cold deck technique

Dynamic imputation – hot deck technique
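
Static (cold deck) imputation, as it is generally described, draws replacement values from a fixed table prepared before processing, for example from a previous census or survey. A minimal sketch follows; the table contents, field names, and the choice of rooms by tenure are invented for illustration.

```python
from typing import Optional

# Hypothetical cold deck: typical number of rooms by tenure, fixed before the run.
COLD_DECK_ROOMS = {"owner": 3, "renter": 2, "other": 1}


def impute_rooms_static(tenure: str, rooms: Optional[int]) -> tuple[int, bool]:
    """Return (rooms, was_imputed). The deck is never updated while editing."""
    if rooms is not None and rooms >= 1:
        return rooms, False              # valid response: keep as reported
    return COLD_DECK_ROOMS[tenure], True  # blank or invalid: use the fixed value
```

Because the table never changes, every blank for a given tenure receives the same value. A dynamic (hot deck) approach instead updates its values from valid records as the file is processed.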


Goals of the edit

Imputed household should closely resemble failed edit household

Imputed data should come from a single donor person or house resembling donee

Equally good donors should have equal chances


Figure 1. Sample editing specifications to correct sex variable, in pseudocode

If SEX of the HEAD OF HOUSEHOLD = SEX of the SPOUSE
    If FERTILITY of the HEAD OF HOUSEHOLD is not blank
        If FERTILITY of the SPOUSE is blank
            (if the SEX of the head of household is not already female)
                Make the SEX = female
            endif
            (if the SEX of the spouse is not already male)
                Make the SEX = male
            endif
        else
            Do something else because they have same sex and both have fertility !!!
            [The “something” could be using the sex of the previous head, or alternating the sex of the Head, or using ratios of sexes of all heads for an appropriate response, etc.]
        endif
    Else
        This is the case where the head of household’s fertility is blank
        If FERTILITY of the SPOUSE is not blank
            (if the SEX of the head of household is not already male)
                Make the SEX = male
            endif
            (if the SEX of the spouse is not already female)
                Make the SEX = female
            endif
        else
            Do something else because BOTH have no fertility!!!
            [The “something” could be using the sex of the previous head, or alternating the sex of the Head, or using ratios of sexes of all heads for an appropriate response, etc.]
        endif
    Endif
Endif
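
The specification in Figure 1 could be rendered in Python roughly as follows. This is a sketch under stated assumptions: the sex codes, the dictionary layout of a person record, and the helper `make_alternating_fallback` are hypothetical, and the fallback implements only one of the options the figure mentions (alternating the sex of the head).

```python
import itertools

MALE, FEMALE = 1, 2   # assumed codes


def make_alternating_fallback():
    """Fallback for ambiguous couples: alternate the head's sex on each call."""
    sexes = itertools.cycle([MALE, FEMALE])

    def fallback(head: dict, spouse: dict) -> None:
        head["sex"] = next(sexes)
        spouse["sex"] = FEMALE if head["sex"] == MALE else MALE

    return fallback


def fix_couple_sex(head: dict, spouse: dict, fallback) -> None:
    """Edit head and spouse in place; 'fertility' is None when blank."""
    if head["sex"] != spouse["sex"]:
        return                                   # nothing to correct
    head_has = head["fertility"] is not None
    spouse_has = spouse["fertility"] is not None
    if head_has and not spouse_has:
        head["sex"], spouse["sex"] = FEMALE, MALE   # only the head reports fertility
    elif spouse_has and not head_has:
        head["sex"], spouse["sex"] = MALE, FEMALE   # only the spouse reports fertility
    else:
        fallback(head, spouse)                   # both or neither report fertility
```

When exactly one partner has fertility information, no imputation is needed: the fertility data decide which sex to toggle.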


Hot Deck

Geographic considerations
Use of related items
Order of the items changes the matrices
Complexity of the imputation matrices


In developing hot decks

Imputation matrices – structure of the matrices
Standardized imputation matrices
Seeding the decks
Big, but not too big
Understanding what the matrix is doing
When the matrix is too small …
Occupation and industry!!
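
A seeded imputation matrix of the kind listed above might look like the following sketch. The two dimensions (sex and 15-year age group), the codes, and the single seed value are assumptions for illustration; a production deck would normally be seeded with plausible cell-specific values.

```python
from typing import Optional

MALE, FEMALE = 1, 2                      # assumed sex codes
N_AGE_GROUPS = 6                         # 15-year groups: 0-14, 15-29, ..., 75+


def age_group(age: int) -> int:
    """Map an age in years to a 15-year group index, capping at 75+."""
    return min(age // 15, N_AGE_GROUPS - 1)


class HotDeck:
    """Imputation matrix indexed by (sex, age group)."""

    def __init__(self, seed_value: int):
        # Seeding: every cell starts with a value so that a blank met
        # before any valid donor can still be filled.
        self.cells = {(sex, g): seed_value
                      for sex in (MALE, FEMALE) for g in range(N_AGE_GROUPS)}

    def edit(self, sex: int, age: int, value: Optional[int]) -> int:
        key = (sex, age_group(age))
        if value is None:
            return self.cells[key]       # donate the most recent valid value
        self.cells[key] = value          # valid response becomes the new donor
        return value
```

Adding dimensions makes donors resemble recipients more closely, but each cell is then refreshed less often, which is one reading of "big, but not too big".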


Aids to checking edits

1. Listings
2. Writing whole households before and after with changes
3. Frequency matrices
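
A listing summary can be produced by tallying the edit messages written during a run. The sketch below assumes each message is identified by a short code such as "P00-1". The percentages in Figures 4 and 5 appear consistent with each count divided by the final column (1,546 of 1,748,490 is about 0.1 percent), but treating that column as the denominator is an inference, not something the listings state.

```python
from collections import Counter


def listing_summary(messages: list[str],
                    denominator: int) -> list[tuple[str, int, float]]:
    """Return (message code, count, percent of denominator), sorted by code."""
    counts = Counter(messages)
    return [(code, n, round(100 * n / denominator, 1))
            for code, n in sorted(counts.items())]
```

An edit that fires far more often than expected is a prompt to re-read the specification before trusting its corrections.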


Figure 4. Example of a listing summary for Malawi 2008 Census [LISTING]

1718   336574    -   ******************************** ...                       -
1719   336574    -   ******* Age & Head ********* ...                           -
1720   336574    -   ******************************** ...                       -
1805     1546  0.1   *P00-1* Head is not first person, is %2d...          1748490
1823      877  0.1   *P00-2* No head of household, first person 14+...    1748490
1835       62  0.0   *P00-3* No head 14+, first person becomes head...    1748490
1850     5074  0.3   *P00-4* Too many heads of household - 1 ...          1748490
1860     5238  0.3   *P00-5* Remaining heads made other RELATIONSHI...    1748490
1874      939  0.1   *P00-6* After head edit, not one and only one ...    1748490
1889     2301  0.1   *P00-6a* Spouses too young made other relative...    1748490
1909     1062  0.1   *P00-6ax* Multiple spouses for unmarried head...     1748490
1911     1062  0.1   *P00-6ax* Multiple spouses for unmarried head...     1748490
1929       44  0.0   *P00-6a1* Crazy case where spouse is visitor a...    1748490
1949       89  0.0   *P00-6a3* Crazy case where spouse is visitor a...    1748490
1998       12  0.0   *P00-6a1* Extra spouses who are visitors...          1748490
2017     1483  0.1   *P00-6a2* Extra spouses not married...               1748490


Figure 5. Example of a listing summary for Lesotho 2006 Census [LISTING]

4388   21471    -   ...                                                        -
4389   21471    -   ******* Sisterhood Characteristics *********...            -
4390   21471    -   ...                                                        -
4401    1449  1.2   *G45-1* Total sisters out of range [%2d] illeg...     124839
4410    2897  2.3   *G45-2* Dead sisters out of range [%2d] illega...     124839
4419    3791  3.0   *G45-3* Pregnant sisters [%2d] illegal...             124839
4426    3895  3.1   *G45-4* At birth sisters [%2d] illegal...             124839
4433    4908  3.9   *G45-5* Week 6 sisters [%2d] illegal...               124839
4440     103  0.1   *G45-6* Sum of Dead Sisters [%2d][%2d][%2d] gr...     124839
4453       8  0.0   *G45-7* Sum of Dead Sisters [%2d][%2d][%2d] gr...     124839
4461     616  0.5   *G45-8* Dead Sisters [%2d] greater than total ...     124839


Figure 8. Example of a write listing for Ethiopia 2007 Census [WRITE]

BARCODE REGION ZONE WEREDA TOWN SUB_CITY SA KEBELE EA HHNO HUNO
-------------------------------------------------------------------------
PN RS RH SX AG RL MT ET DS 1 2 3 4 5 6 7 8 9 0 1 2 3 CS YR PR ZN MO FA LT SC HG WL RS LY ES MS MH FH MA FA MD FD LB
01 01 01 01 31 01 05 67 02 08 01 01 01 97 12 01 01 07 01
02 01 06 01 34 01 05 67 02 08 01 01 01 97 17 01 01 03 01
03 01 09 02 30 01 05 05 02 07 02 02 97 05 01 01 05 04 00 00 00 00 00 00 00
04 01 09 02 20 01 05 05 02 03 02 02 02 98 03 01 03 01 01 00 00 00 00 00 00
05 01 09 01 01 01 05 05 02 08 03 07 08 01

P18-3 No literacy , but schooling 97, so literate, PN = 3
P20-20 Unable to read and write 98 because never attended school , PN = 4
P16-1 Mother's vital status invalid = PN = 5
P17-1 Father's vital status invalid = PN = 5

PN RS RH SX AG RL MT ET DS 1 2 3 4 5 6 7 8 9 0 1 2 3 CS YR PR ZN MO FA LT SC HG WL RS LY ES MS MH FH MA FA MD FD LB
01 01 01 01 31 01 05 67 02 08 01 01 01 97 12 01 01 07 01
02 01 06 01 34 01 05 67 02 08 01 01 01 97 17 01 01 03 01
03 01 09 02 30 01 05 05 02 07 02 02 01 97 05 01 01 05 04 00 00 00 00 00 00 00
04 01 09 02 20 01 05 05 02 03 02 02 02 98 00 03 01 03 01 01 00 00 00 00 00 00
05 01 09 01 01 01 05 05 02 08 01 01 01 01


Figure 10. Example of a frequency distribution for Sudan 2008 Census [FREQUENCY]

Imputed Item Q18_ATTAINMENT: Education Attainment - all occurrences

Categories                   Frequency  CumFreq      %  Cum %  Net %  cNet %
 1 No Qualification                105      105    2.2    2.2    2.4     2.4
 2 Incomplete Primary             1564     1669   33.5   35.7   35.3    37.7
 3 Primary 4                       529     2198   11.3   47.0   11.9    49.6
 4 Primary 6                       492     2690   10.5   57.6   11.1    60.7
 5 Primary 8                       302     2992    6.5   64.0    6.8    67.5
 6 Junior 3                        251     3243    5.4   69.4    5.7    73.2
 7 Junior 4                         58     3301    1.2   70.7    1.3    74.5
 8 Secondary 3                      95     3396    2.0   72.7    2.1    76.6
 9 Secondary 4                       5     3401    0.1   72.8    0.1    76.7
10 Post Secondary Diploma            2     3403    0.0   72.8    0.0    76.8
11 University Degree               154     3557    3.3   76.1    3.5    80.3
12 Post Graduate Diploma            10     3567    0.2   76.3    0.2    80.5
13 Master                           52     3619    1.1   77.5    1.2    81.7
14 Ph.D                              1     3620    0.0   77.5    0.0    81.7
15 Khalwa                            1     3621    0.0   77.5    0.0    81.7
@17                                144     3765    3.1   80.6    3.2    85.0
@98                                667     4432   14.3   94.9   15.0   100.0
NotAppl                            240     4672    5.1  100.0
TOTAL                             4672     4672  100.0  100.0


Figure 11. Example of a frequency distribution for additional edit for Zambia 1990 Census [FREQUENCY]

Input: 1IN100.DAT    Program: ZAMHOUSE

ROOMS
-------------------------------------------------------------
Values     Number of                  Cum.
Imputed    Imputations    Percent     Percent
-------------------------------------------------------------
< 1            1,415        37.21       37.21
1              2,185        57.45       94.66
2                121         3.18       97.84
3                 22         0.58       98.42
4                 16         0.42       98.84
5                 23         0.60       99.45
6                 21         0.55      100.00
> 6                -            -           -
-------------------------------------
               3,803


Other considerations

Running the edit three times: seed, run, check

Saving original responses
Imputation flags
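
Saving original responses and setting imputation flags can be combined in one small helper. The sketch below is illustrative only: the `_orig` and `_flag` suffixes and the flag value 1 are naming assumptions, not a convention taken from any census package.

```python
def apply_edit(record: dict, field: str, new_value) -> dict:
    """Change one field, keeping the original response and setting a flag."""
    edited = dict(record)                                  # leave the raw record untouched
    # Save the response as first reported; later edits do not overwrite it.
    edited.setdefault(f"{field}_orig", record.get(field))
    edited[field] = new_value
    edited[f"{field}_flag"] = 1                            # 1 = value was changed by the edit
    return edited
```

With both pieces stored, analysts can tabulate edited and unedited values side by side, as in Table 1, or exclude imputed values altogether.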


Conclusions

Edits part of the series of census procedures

Usually more for aesthetics than technical enhancement

Hardware and software changing rapidly

The revolution continues!

