2014 SDC and CIC Annual Training Conference:
Accessing ACS PUMS Data
Tim Gilbert, U.S. Census Bureau
April 2, 2014
Outline
Fundamentals of PUMS Data
Geography and the PUMS
Accessing PUMS Data
Multiple Vintage Variables in PUMS
Working with multiple vintage variables
Considerations with PUMS
Resources
Summary Data and Microdata: What’s the Difference?
Summary data are predefined tables for specific geographic areas (states, counties, etc.)
In the ACS microdata, the basic unit is an individual housing unit or person
What are PUMS data?
Public Use: anonymized, downloadable
Microdata: records of individual people
Sample: a representative sample of the population
PUMS Overview
PUMS sample is a subsample of ACS interviews, one percent of all US households
PUMS is a “weighted” sample; weighting variables must be used in analysis
A set of two files - housing units and persons
ACS produces 1-, 3-, and 5-year PUMS files
Available as SAS files and CSV files, or via DataFerrett
Why Use PUMS?
Data needed for a tabulation or a specific universe not supported by standard ACS tables (e.g., population groups by single year of age)
Statistical analysis required to understand relationships between economic, demographic or housing variables (e.g., correlation analysis)
Can create new measures using multiple variables or other people in household (spouse’s occupation, same-sex couples, number of kids)
Types of PUMS Files Released
We release 3 new PUMS files every year:
1-year PUMS (example: 2012 1-year PUMS): December 2013
3-year PUMS (example: 2010-2012 3-year PUMS): February 2014
5-year PUMS (example: 2008-2012 5-year PUMS): March 2014
Modifications to Multiyear PUMS
Multiyear PUMS have the same cases and geography as their component 1-year files
How are multiyear PUMS different from single year?
Weights are produced using latest population estimate “vintages”
Dollar amounts are standardized
Why use the multiyear PUMS files?
For studying small groups, where more cases are needed
When analysis is also making use of multiyear summary data
Limited Geographic Detail
Geographic identifiers are region, division, state, Public Use Microdata Area (PUMA)
PUMAs can be used to identify geographic areas of 100,000+
PUMS is not designed for statistical analysis of small geographic areas
Public Use Microdata Area (PUMA)
Defined after each census by the states in coordination with the Census Bureau’s Geography Division
http://www.census.gov/geo/puma/puma2010.html
Redefined PUMAs for 2012 PUMS files
DY 2012 multiyear files have dual PUMA vintages
Large enough to meet disclosure avoidance requirements
PUMAs are identified by a five-digit number, unique within each state
PUMA Reference Maps
http://www.census.gov/geo/maps-data/maps/reference.html
Interactive PUMA Maps
http://tigerweb.geo.census.gov/tigerwebmain/tigerweb_main.html
American FactFinder
http://www.census.gov/acs/www/data_documentation/pums_data/
PUMS on FTP site
www2.census.gov
DataFerrett
http://dataferrett.census.gov/
What are multiple vintage variables?
2010-2012 3-Year PUMS and 2008-2012 5-Year PUMS contain variables with multiple vintages
Multiple vintage variables have differing sets of values for different years within the same multi-year file
Multiple Vintage Variables in PUMS
2010-2012 ACS 3-Year PUMS
2010 & 2011: ANC1P05, ANC2P05, CITWP05, LANP05, MARHYP05, MIGSP05, OCCP10, POBP05, POWSP05, RAC2P05, RAC3P05, SOCP10, YOEP05
2012: ANC1P12, ANC2P12, CITWP12, LANP12, MARHYP12, MIGSP12, OCCP12, POBP12, POWSP12, RAC2P12, RAC3P12, SOCP12, YOEP12
2008-2012 ACS 5-Year PUMS
2008 & 2009: ANC1P05, ANC2P05, CITWP05, LANP05, MARHYP05, MIGSP05, OCCP02, POBP05, POWSP05, RAC2P05, RAC3P05, SOCP00, YOEP05
2010 & 2011: ANC1P05, ANC2P05, CITWP05, LANP05, MARHYP05, MIGSP05, OCCP10, POBP05, POWSP05, RAC2P05, RAC3P05, SOCP10, YOEP05
2012: ANC1P12, ANC2P12, CITWP12, LANP12, MARHYP12, MIGSP12, OCCP12, POBP12, POWSP12, RAC2P12, RAC3P12, SOCP12, YOEP12
PUMS Documentation
PUMS ReadMe
List of variables with multiple vintages
PUMS Data Dictionary
Variable names, descriptions, and values
Accuracy of the PUMS
Information about working with multiple vintage variables
http://www.census.gov/acs/www/data_documentation/pums_documentation/
Using Multiple Vintage Variables
1. Verify variable has multiple vintages from PUMS ReadMe (example: Marital History)
2. Look up differences between vintages in data dictionary
3. Recode and combine vintages to create one variable
If MARHYP05 less than or equal to 1932
OR
If MARHYP12 equals 1932
THEN
MARHYP (derived variable) equals 1932
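The recode in step 3 can be sketched in Python. This is only an illustration under stated assumptions: the function name `harmonize_marhyp` is hypothetical, each record is assumed to carry a value in only one of the two vintage variables (the other is passed as `None`), and years above 1932 are assumed to mean the same thing in both vintages. Check the data dictionary before relying on that last point.

```python
from typing import Optional

# From the slide: values at or below 1932 in the older vintage and the
# value 1932 in the newer vintage both map to 1932 in the derived variable.
BOTTOM_CODE = 1932


def harmonize_marhyp(marhyp05: Optional[int],
                     marhyp12: Optional[int]) -> Optional[int]:
    """Combine MARHYP05 and MARHYP12 into one derived MARHYP value.

    Assumption: only one vintage is populated per record; the other is None.
    """
    if marhyp05 is not None:
        # Older vintage: collapse every year at or below 1932 to 1932.
        return max(marhyp05, BOTTOM_CODE)
    if marhyp12 is not None:
        # Newer vintage: assumed to be bottom-coded at 1932 already.
        return marhyp12
    # Neither vintage populated (for example, never married).
    return None


derived_old = harmonize_marhyp(1925, None)   # older-vintage record
derived_new = harmonize_marhyp(None, 1932)   # newer-vintage record
```

Both example records end up with the same derived value, so they can be tabulated together across the years of a multiyear file.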
Using Multiple Vintage PUMAs
1. Look up PUMA variable vintages in data dictionary
2. Look up PUMA in Missouri State Data Center’s MABLE/Geocorr12 at http://mcdc.missouri.edu/websas/geocorr12.html
3. Find corresponding PUMAs across vintages
4. Combine the PUMA vintages across years
If ST equals 26 and PUMA00 equals 03806
OR
If ST equals 26 and PUMA10 equals 03204
THEN
PUMA (derived variable) equals XXXXX
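One way to apply step 4 to many records is a small lookup table keyed by state, vintage, and PUMA code. The sketch below is illustrative only: the derived code `"A0001"` is a hypothetical label standing in for the slide's XXXXX, `derive_puma` is a made-up helper name, and each record is assumed to have only one of PUMA00 or PUMA10 populated. Codes are kept as strings so leading zeros survive.

```python
from typing import Dict, Optional, Tuple

# (state code, vintage variable, PUMA code) -> derived area code.
# The two source codes come from the slide; "A0001" is a hypothetical label.
CROSSWALK: Dict[Tuple[str, str, str], str] = {
    ("26", "PUMA00", "03806"): "A0001",
    ("26", "PUMA10", "03204"): "A0001",
}


def derive_puma(st: str,
                puma00: Optional[str],
                puma10: Optional[str]) -> Optional[str]:
    """Return the derived PUMA for a record, or None if it is not mapped.

    Assumption: only one of puma00 / puma10 is populated per record.
    """
    if puma00:
        key = (st, "PUMA00", puma00)
    elif puma10:
        key = (st, "PUMA10", puma10)
    else:
        return None
    return CROSSWALK.get(key)


area_2011 = derive_puma("26", "03806", None)   # record using the 2000 vintage
area_2012 = derive_puma("26", None, "03204")   # record using the 2010 vintage
```

Pairs of PUMAs that correspond across vintages would be added to the table after checking them in a geographic correspondence tool such as the one named in step 2.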
Analyzing PUMS Data
National level files must be concatenated (see PUMS ReadMe)
Use the SERIALNO variable to merge housing and person records to create a complete file (see PUMS ReadMe)
http://www.census.gov/acs/www/data_documentation/pums_documentation/
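The two steps above (concatenate the file parts, then merge on SERIALNO) can be sketched with the Python standard library. Everything here is a toy: the CSV text and its values are invented, the file parts are inline strings rather than real downloads, and ST, SPORDER, and AGEP are used only as illustrative columns.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text into a list of dicts (stand-in for reading one file part)."""
    return list(csv.DictReader(io.StringIO(text)))


# Toy stand-ins for two parts of a national file; all values are invented.
housing_parts = [
    "SERIALNO,ST\n1,26\n2,26\n",
    "SERIALNO,ST\n3,29\n",
]
person_parts = [
    "SERIALNO,SPORDER,AGEP\n1,1,34\n1,2,6\n2,1,71\n",
    "SERIALNO,SPORDER,AGEP\n3,1,45\n",
]

# Step 1: concatenate the parts of each record type.
housing = [row for part in housing_parts for row in read_rows(part)]
persons = [row for part in person_parts for row in read_rows(part)]

# Step 2: index housing records by SERIALNO, then attach each person's
# housing record to produce one complete row per person.
housing_by_serial = {row["SERIALNO"]: row for row in housing}
merged = [{**housing_by_serial[p["SERIALNO"]], **p} for p in persons]
```

The result has one row per person. Housing records with no matching person records do not appear in `merged`, so housing unit estimates are better made from the housing file itself.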
Types of PUMS Weights
PUMS household weights (wgtp) must be used to produce housing unit estimates
PUMS person weights (pwgtp) must be used to produce population estimates
PUMS replicate weights (wgtp1 – wgtp80 and pwgtp1 – pwgtp80) are used for calculating standard errors
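As a rough illustration of how the weights are used, an estimate is the sum of the weights of the records that fall in the universe of interest. The records and values below are invented, `weighted_total` is a hypothetical helper, and the lowercase column names follow the slide; capitalization in an actual file may differ.

```python
# Toy records; every value is invented for illustration.
housing = [
    {"SERIALNO": "1", "wgtp": 50},
    {"SERIALNO": "2", "wgtp": 70},
    {"SERIALNO": "3", "wgtp": 40},
]
persons = [
    {"SERIALNO": "1", "AGEP": 34, "pwgtp": 55},
    {"SERIALNO": "1", "AGEP": 6, "pwgtp": 60},
    {"SERIALNO": "2", "AGEP": 71, "pwgtp": 70},
    {"SERIALNO": "3", "AGEP": 45, "pwgtp": 40},
]


def weighted_total(records, weight, include=lambda record: True):
    """Sum the named weight over the records selected by `include`."""
    return sum(r[weight] for r in records if include(r))


# Housing unit estimates use the household weight.
housing_units = weighted_total(housing, "wgtp")

# Population estimates use the person weight.
population = weighted_total(persons, "pwgtp")
children = weighted_total(persons, "pwgtp", lambda r: r["AGEP"] < 18)
```

Counting unweighted records instead would describe the sample, not the population it represents.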
Estimating Variance with PUMS
Problem: PUMS is not a simple random sample
Stratified samples with complex weighting
Sample drawn at household level (i.e., not a simple random sample of individuals)
Solutions:
Use weighting variable and a “design factor”
Use weighting variable and 80 “replicate weights”
See Accuracy of the PUMS
http://www.census.gov/acs/www/data_documentation/pums_documentation/
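The replicate-weight approach can be sketched as follows: compute the estimate once with the full weight and once with each of the 80 replicate weights, then take the square root of 4/80 times the sum of squared differences. The records below are invented, `replicate_se` is a hypothetical helper, and the column names follow the slide's lowercase spelling. Confirm the formula against Accuracy of the PUMS for the file you are using.

```python
import math

N_REPLICATES = 80


def replicate_se(records, include, weight="pwgtp"):
    """Return (estimate, standard error) for a weighted total.

    The estimate uses the full weight; the standard error compares it with
    the same total computed under each replicate weight (pwgtp1..pwgtp80).
    """
    selected = [r for r in records if include(r)]
    estimate = sum(r[weight] for r in selected)
    squared_diffs = 0.0
    for i in range(1, N_REPLICATES + 1):
        replicate_estimate = sum(r[f"{weight}{i}"] for r in selected)
        squared_diffs += (replicate_estimate - estimate) ** 2
    return estimate, math.sqrt(4.0 * squared_diffs / N_REPLICATES)


def toy_person(agep, full_weight, replicate_weight):
    """Build an invented person record whose 80 replicate weights are equal."""
    record = {"AGEP": agep, "pwgtp": full_weight}
    for i in range(1, N_REPLICATES + 1):
        record[f"pwgtp{i}"] = replicate_weight
    return record


persons = [toy_person(10, 100, 110), toy_person(40, 50, 50)]
children, children_se = replicate_se(persons, lambda r: r["AGEP"] < 18)
```

Real replicate weights vary from one replicate to the next; the toy records use a constant value only so the arithmetic is easy to follow by hand.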
http://www.census.gov/acs/www/data_documentation/pums_documentation/
Accuracy of the PUMS
http://www.acsdatausers.org/
Contact Information
ACS/PRCS website: www.census.gov/acs
ACS User Support:
Questions?