What is IPUMS-International?
• The IPUMS-International project is creating an integrated global database of over 150 censuses from at least 44 countries.
• It will be the world’s largest public-use population database, with multiple samples from each country enabling analyses across time and space.
• The microdata and accompanying documentation will be freely available for scholarly and educational research through a web-based data dissemination system.
The Problem
• A vast body of raw census microdata covering much of the world over the past four decades survives in machine-readable form.
• In most countries, these census microdata are either unavailable to researchers or difficult to obtain.
• These data are at constant risk of destruction because of technological obsolescence, physical aging of computer tapes, and loss of institutional memory and documentation.
Why it matters
• In the few countries where census microdata are readily available to researchers, they have become an indispensable part of social science infrastructure.
– In the journal Demography, the leading U.S. journal of population, census microdata are used three times as often as any other source for studies of the U.S. or Canada.
• No alternate source offers comparable sample sizes, chronological depth, or widespread availability across countries.
Advantages of Census Microdata Samples
• Large
– Many more cases than any alternative dataset
– Enable study of relatively small populations
– Allow analysis of effects of local conditions on behavior
• Long-term
– Data usually available for multiple decades
• Flexible
– Tabulations can be customized to research problem
– Multivariate analysis feasible
– Harmonization is possible, allowing analyses that cross borders and time periods
Cross-National Harmonization and Open Access: National Academy of Sciences recommendations
• “National and international funding agencies should establish mechanisms that facilitate the harmonization of data collected in different countries.”
• “Cross national studies conducted within a framework of comparable measurement can be a substantially more useful tool for policy analysis than studies of single countries.”
• “The scientific community, broadly construed, should have widespread and unconstrained access to the data.”
Source: Preparing for an Aging World: The Case for Cross-National Research (National Academy, 2001)
The Model: IPUMS-USA
• Project to harmonize U.S. Census microdata for the period 1850-2000
• 1992-1995: NSF-funded IPUMS project harmonized samples using composite codes, documented comparability; 250,000 transformations, 3,000 pages of printed documentation
• 1995-1999: Another NSF project funded an online data access system with integrated hypertext documentation
Success of IPUMS-USA
User-friendly access, harmonized codes, and integrated, comprehensive hypertext documentation led to a flood of historical census-based research:
• 12,000 users, 75,000 custom data extracts
• Currently distributing an average of 638 MB/hr, 24/7
• 1,300 publications and working papers
– IPUMS-based research is concentrated in the top U.S. journals: the most common venues are Demography, American Economic Review, Journal of Political Economy, American Sociological Review, Social Forces, and Quarterly Journal of Economics
IPUMS-International
• After 1960, most censuses around the world were tabulated by computer
• McCaa decided that the IPUMS model should be applied to other countries
• Began with a project for Colombia; then, in 1999, an NSF Infrastructure grant funded the addition of six more countries
• 2005-2009: new HSD grant to increase database to 44 countries
• NICHD is also assisting with funding
IPUMS-International samples: First release
Country        Census Year  % Sample  N of Persons  N of Households
Brazil         1960         5         3,001,400     313,300
               1970         5         4,953,800     1,022,200
               1980         5         5,870,500     1,343,800
               1991         5.8       8,522,700     2,012,300
               2000         6         10,136,000    2,652,400
China          1982         0.1       1,002,700     242,700
Colombia       1964         2         349,700       n.a.
               1973         10        1,988,800     349,900
               1985         10        2,643,100     571,000
               1993         10        3,213,700     788,000
France         1962         5         2,320,900     748,900
               1968         5         2,487,800     815,700
               1975         5         2,629,400     915,600
               1982         5         2,631,700     969,632
               1990         4.2       2,360,900     949,893
Kenya          1989         5         1,074,100     224,900
               1999         5         1,410,200     318,200
Mexico         1960         1.5       502,800       n.a.
               1970         1         483,400       98,300
               1990         10        8,118,000     1,648,000
               2000         10.6      10,099,200    2,312,000
United States  1960         1         1,799,900     579,200
               1970         1         2,029,700     744,500
               1980         5         11,336,500    4,711,000
               1990         5         12,500,500    5,528,000
               2000         5         14,095,000    6,185,000
Vietnam        1989         5         2,627,000     534,200
               1999         3         2,368,200     534,100
TOTAL                                 122,557,600   37,112,725
IPUMS-International Users
• Prospective users must sign confidentiality agreement and provide an abstract explaining need for the data
• Through 9/1/05 we had 980 applicants to use the database, of which 582 were approved (59 percent)
• Users represent 40 countries and 250 institutions, including many international organizations (e.g., ILO, WHO, World Bank, Inter-American Development Bank)
Early results
National Academy of Sciences panel (2005) used data from Colombia, Kenya, Mexico, and Vietnam to analyze changing outcomes such as schooling, work, fertility, and marriage as a function of age, gender, and household characteristics.
Early results
Cynthia Feliciano (2005) compared the education of immigrants to the United States with those who remained behind to understand patterns of selectivity
Other topics include:
• Changing living arrangements of the aged
• Concentration of mortality within families
• Impact of rainfall on health and economic welfare
• Female labor-force participation and educational attainment
• Regional inequality differentials
• Brain drain from developing countries
• Effects of emigration on labor markets
• Relationship between divorce and family composition
• Relationship between disease factors and education
• Relationship between educational attainment and cohort size
• Effect of NAFTA on educational attainment and school enrollment by region within Mexico
Number of countries requested by IPUMS-International users
(percent distribution)
1 country 39
2 countries 24
3 countries 10
4 countries 6
5-8 countries 20
Most users request multiple countries
IPUMS-International Tasks
• Inventory and preservation of data and documentation
• Processing
• Documentation (especially comparability)
• Dissemination: obtain licenses that allow us to disseminate data for educational and scholarly use, and set up secure web-based dissemination system
IPUMS-International Preservation Initiatives
• UN Demographic Center for Latin America (CELADE, Santiago, Chile): ~3,000 microdata tapes recovered, along with metadata (documentation)
Status of Data Acquisition
dark green = disseminating
medium green = data received
light green = negotiating
Current IPUMS-International Partners
                 Data Received or Agreement Signed
Current IPUMSI   Latin America        Europe           Asia, Africa, Other
Brazil           Argentina            Austria          Armenia
China            Bolivia              Bulgaria         Cambodia
Colombia         Chile                Belarus          Canada
France           Costa Rica           Czech Republic   Egypt
Kenya            Dominican Republic   Germany          Fiji
Mexico           Ecuador              Greece           Indonesia
United States    El Salvador          Hungary          Iraq
Vietnam          Guatemala            Ireland          Israel
                 Honduras             Netherlands      Malaysia
                 Nicaragua            Romania          Mongolia
                 Panama               Slovenia         Pakistan
                 Paraguay             Spain            Palestinian Authority
                 Peru                 United Kingdom   Philippines
                 Uruguay                               South Africa
                 Venezuela                             Tajikistan
                                                       Turkmenistan
Current funding for 44 countries by 2009
Next data release late spring 2006
Processing
1. Standardize format
2. Correct format errors
3. Draw samples
4. Add confidentiality protections
5. Harmonize codes
6. Edit and allocate missing or inconsistent data
7. Add standard constructed variables
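As a rough illustration of step 5 (harmonize codes), the sketch below maps sample-specific marital-status codes onto one shared set of labels. The sample names and raw code values are invented for this example and do not come from IPUMS documentation; the real project relies on its own composite coding schemes.

```python
# Minimal sketch of code harmonization via a lookup table.
# ASSUMPTION: the sample names ("census_a", "census_b") and the raw codes
# below are hypothetical, chosen only to show the idea.
SOURCE_TO_HARMONIZED = {
    ("census_a", "1"): "single",
    ("census_a", "2"): "married",
    ("census_a", "3"): "separated",
    ("census_b", "S"): "single",
    ("census_b", "C"): "married",
    ("census_b", "X"): "separated",
}


def harmonize_marst(sample, raw_code):
    """Translate a sample-specific code into the shared label.

    Codes with no mapping are flagged as "unknown" rather than guessed,
    so they can be reviewed in a later editing step.
    """
    return SOURCE_TO_HARMONIZED.get((sample, raw_code.strip()), "unknown")


records = [("census_a", "2"), ("census_b", "C"), ("census_b", "?")]
harmonized = [harmonize_marst(sample, code) for sample, code in records]
print(harmonized)
```

Keeping the mapping in a table rather than in branching logic makes each transformation easy to document and audit, which matters when the number of transformations is large.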
Constructed Variables: IPUMS Family Interrelationship Pointers
                                                         Spouse’s  Father’s  Mother’s
Pernum  Relationship   Age  Sex     Marst      Chborn    Location  Location  Location
1       head           53   female  separated  6         0         0         0
2       child          28   male    single     n/a       0         0         1
3       child          22   male    single     n/a       0         0         1
4       child          21   male    single     n/a       0         0         1
5       child          25   female  married    2         6         0         1
6       child-in-law   28   male    married    n/a       5         0         0
7       grandchild     3    male    single     n/a       0         6         5
8       grandchild     1    male    single     n/a       0         6         5
9       non-relative   32   female  separated  2         0         0         0
10      non-relative   10   male    single     n/a       0         0         9
11      non-relative   5    female  single     n/a       0         0         9
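To show why location pointers are useful, the sketch below encodes the example household from this slide and follows the mother pointers to attach a mother's characteristics to each child. The field names (`sploc`, `poploc`, `momloc`) are shorthand chosen here for the slide's Spouse's, Father's, and Mother's Location columns, and the pointer values are as read from the slide; a value of 0 means the relative is not present in the household.

```python
from dataclasses import dataclass


@dataclass
class Person:
    pernum: int        # person number within the household
    relationship: str  # relationship to household head
    age: int
    sex: str
    sploc: int         # spouse's person number (0 = none present)
    poploc: int        # father's person number (0 = none present)
    momloc: int        # mother's person number (0 = none present)


# Example household, pointer values as read from the slide.
HOUSEHOLD = [
    Person(1, "head", 53, "female", 0, 0, 0),
    Person(2, "child", 28, "male", 0, 0, 1),
    Person(3, "child", 22, "male", 0, 0, 1),
    Person(4, "child", 21, "male", 0, 0, 1),
    Person(5, "child", 25, "female", 6, 0, 1),
    Person(6, "child-in-law", 28, "male", 5, 0, 0),
    Person(7, "grandchild", 3, "male", 0, 6, 5),
    Person(8, "grandchild", 1, "male", 0, 6, 5),
    Person(9, "non-relative", 32, "female", 0, 0, 0),
    Person(10, "non-relative", 10, "male", 0, 0, 9),
    Person(11, "non-relative", 5, "female", 0, 0, 9),
]


def mothers_age_at_birth(household):
    """Map each person with a co-resident mother to (mother's age - own age)."""
    by_pernum = {p.pernum: p for p in household}
    return {
        p.pernum: by_pernum[p.momloc].age - p.age
        for p in household
        if p.momloc != 0
    }


def children_of(household, pernum):
    """List person numbers whose mother or father pointer is `pernum`."""
    return [p.pernum for p in household if pernum in (p.momloc, p.poploc)]


ages = mothers_age_at_birth(HOUSEHOLD)
print(ages)
print(children_of(HOUSEHOLD, 9))
```

Note that the pointers link persons 10 and 11 to person 9 even though all three are recorded only as non-relatives of the head, a family unit the relationship column alone would not reveal.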
Documentation
1. Translate codebooks, enumeration forms, and enumeration instructions into English
2. Standardize format and add XML tags
3. Write documentation identifying comparability problems across countries, and within countries, across time periods
4. Assemble and scan ancillary documentation (e.g. census maps, post-enumeration survey results, and additional information on post-enumeration processing).
Dissemination
• Uniform perpetual agreements with national statistical agencies allow us to disseminate anonymized microdata to researchers who agree to a web-based confidentiality agreement
• MPC staff assess research proposals for feasibility
• Disputes with agencies, if they arise, will be settled by the International Court of Arbitration in Paris
• Data dissemination occurs exclusively through the IPUMS-International web-based data access system