Upload
antony-sims
View
228
Download
0
Embed Size (px)
DESCRIPTION
Coverage What is the target (theoretical) population? Does the statistical frame fully cover this population? What are the known deficiencies in the frame? Can these be addressed with other data? 3/16/2011 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved 3
Citation preview
INFO 4470/ILRLE 4470 Visualization Tools and Data
Quality
John M. Abowd and Lars VilhuberMarch 16, 2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
2
Outline
• Data quality issues• Coverage• Known biases• Missing data• Extended example using the Quarterly
Workforce Indicators
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
3
Coverage
• What is the target (theoretical) population?• Does the statistical frame fully cover this
population?• What are the known deficiencies in the
frame?• Can these be addressed with other data?
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
4
Biases and Sampling Error
• Bias: the statistical model does not estimate the quantity of interest in expected value
• Sampling error: the statistical model has imprecision in estimating the quantity of interest
• Non-sampling error: variation in the statistical model introduced by editing, imputation, non-random non-response, etc.
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
5
What are Missing Data?
• Random or systematic missing items or entities in a statistical sample or frame
• One of the most ubiquitous data quality problems known
• Cannot be ignored, or rather
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
6
How to code missing data?
• How about giving a zero, “0” for missing?• Note with a distinctive value such as:– -9 for non-responses– -8 for invalid responses– Or even “.”
• Statistical software like SAS, Stata and SPSS can then recognize those values as missing and treat them according to your instructions
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
7
How to Handle Missing Data
• It depends on the type of analysis:– Simple summary tabulations– Descriptive stats, including mean and standard
deviation– Correlation, relation between two or more
variables– Model to estimate the effects of characteristics
3/16/2011
Missing vs Non-missing
"There is no difference between cases with missing versus observed
data"
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
9
“Inference and Missing Data” Rubin (1976)
• MCAR – Missing Completely At Random– Rigorous and hard to achieve– Sometimes by design
• MAR – Missing At Random– Less Rigorous– Cannot be tested without the data which are
missing
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
10
Assume MAR?
• YES – then mechanism by which data are missing is IGNORABLE
• NO – then mechanism is not ignorable, requires more involved procedures
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
11
MAR - YES
Delete• Listwise
• Pairwise
Impute• Assign Mean
• Hot Deck
• Maximum Likelihood
• Multiple Imputation
3/16/2011
Visualization of Missing Data in the Quarterly Workforce Indicators
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
13
Data Sources (Quarterly Only)
• Quarterly Workforce Indicators (Census Bureau)
• Quarterly Census of Employment and Wages (BLS)
• Business Employment Dynamics (BLS)• Job Openings and Labor Turnover Survey• Adjustments by to JOLTS by Davis, Faberman,
Haltiwanger, and Rucker (2010)
3/16/2011
Quarterly Workforce Indicators I
• Flows are based on longitudinally linked (by employer and employee) Unemployment Insurance Wage Records
• Beginning-of-quarter employed if wage record with earnings > $1.00 in quarters t-1 and t (B)
• End-of-quarter employed if wage record with earnings > $1.00 in quarters t and t+1 (E)
• Accession if wage record in t but not t-1 (A)• Separation if wage record in t but not t+1 (S)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
15
Quarterly Workforce Indicators II
• Job creation if establishment has positive employment change from beginning to end of quarter (JC)
• Job destruction if establishment has negative employment change from beginning to end of quarter (JD), always stated as absolute value of change
• Demographic (age x gender), geographic (county, CBSA, WIB), NAICS (sector, sub-sector, industry group), and ownership (All, private)3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
16
Quarterly Census of Employment and Wages
• Stocks of employment measured as of the 12th day of the month for each month in the quarter for each establishment (no job level data)
• BLS uses month-3 employment to measure changes• Beginning of quarter employment is month-3
employment from quarter t-1• End of quarter employment is month-3 employment
for quarter t• Geographic (county, MSA), NAICS (sector, sub-sector,
industry group), and ownership (all, public, private)
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
17
Business Employment Dynamics
• Gross job gains (job creations) is the change in employment at an establishment between month-3 in quarter t-1 and month-3 in quarter t, if positive
• Gross job losses (job destructions) is the absolute value of the change in employment at an establishment between month-3 in quarter t-1 and month-3 in quarter t, if negative
• Limited geography (state), NAICS (sector) detail3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
18
Job Openings and Labor Turnover Survey
• Monthly survey of continuing establishments• Accessions measured as all new employment over
the course of the month (summed over the three months in the quarter here)
• Separations measured as quits, layoffs, discharges, and other over the course of the month (summed over the three months of the quarter here)
• Limited geography (national) and NAICS (sector) detail
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
19
QWI Coverage of the Private Workforce
Sub-period 1
Sub-period 2
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
20
Worker Reallocation Rate
• This rate is available in the QWI for 8 age groups, both genders, NAICS sector, state and quarter from 1990:Q1 to 2008:Q4 (more detail is available than we used)
2agsktagskt
agsktagsktagskt EB
SAWRR
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
21
Job Reallocation Rate
• This rate is available in the QWI for 8 age groups, both genders, NAICS sector, state and quarter from 1990:Q1 to 2008:Q4 (more detail is available than we used)
2agsktagskt
agsktagsktagskt EB
JDJCJRR
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
22
Excess Reallocation Rate
• ERR = WRR – JRR• The excess reallocation rate measures the extent
to which gross worker flows exceed the minimum required to service the gross job flows
• This has been very difficult to estimate nationally because there were no data collected on a consistent basis for all the component flows
• QWI solved that problem
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
23
Statistical Methodology
• Divide the analysis into two periods– 1993:Q1-2001:Q4 (early period, many states are completely
missing, 10 states complete)– 1999:Q1-2008:Q4 (later period, 37 states are complete)
• For each sub-period use a multiple imputation model to complete the missing data
• For the overlap period, use a ramped weight to compute the average implicate combining the two periods
• Use the standard multiple imputation formulae to combine implicates
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
24
National Estimates
• The combining formula for producing the national WRR is shown above (similar formulae apply to other rates)
WRRag+kt = 1MP M
`=1
2664P
8s
µB ( `)agsk t +E ( `)
agsk t2
¶WRR ( `)
agsk t
P8v
µB ( `)agvk t +E ( `)
agvk t2
¶
3775
agkstsk
agkstagkst
agtagt
agtagt
agtagtagt
WRREB
EB
EBSA
WRR
, 22
2
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
25
Implicate Combining Formulae I
V (`)hdWRRag+kt
i= 1
49P
8s
µB (`)agsk t +E (`)
agsk t2
¶³WRR (`)
agsk t ¡ dWRR ( `)ag+k t
´ 2
P8v
µB (`)agvk t +E ( `)
agvk t2
¶ :
B £WRRag+kt¤= 1
M ¡ 1P M
`=1
µdWRR (`)
ag+kt ¡ WRRag+kt
¶2
M
s ktagktag
agsktagsktagskt
agktEB
WRREB
MWRR
1
2
21
s ktagktag
agsktagsktagskt
agkt EB
WRREB
RRW
2
2
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
26
Implicate Combining Formulae II
T £WRRag+kt¤= 1
MP M
`=1V (`)hdWRRag+kt
i+ M +1
M B £WRRag+kt¤
MR £WRRag+kt¤= B [WRR ag+k t ]
T [WRR ag+k t ]
df £WRRag+kt¤= (M ¡ 1)
µ1+ 1
M +11MP M
`=1 V( `)£dWRR ag+k t
¤B [WRR ag+k t ]
¶2:
s ktagktag
agktagsktagsktagskt
agkt EB
RRWWRREB
RRWV
2
2491
2
M
agktagktagkt WRRRRWM
WRRB1
2
11
agkt
M
agktagkt WRRBMMRRWV
MWRRT
1
11
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
27
Implicate Combining Formulae III
T £WRRag+kt¤= 1
MP M
`=1V (`)hdWRRag+kt
i+ M +1
M B £WRRag+kt¤
MR £WRRag+kt¤= B [WRR ag+k t ]
T [WRR ag+k t ]
df £WRRag+kt¤= (M ¡ 1)
µ1+ 1
M +11MP M
`=1 V( `)£dWRR ag+k t
¤B [WRR ag+k t ]
¶2:
agkt
agktagkt
WRRTWRRBWRRMR
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
28
Results
• Comparison of WRR between QWI and JOLTS• Comparison of JRR between QWI and BED• Demographic detail• Industry detail
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
29
WRR: QWI v. JOLTS
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
30
WRR: QWI v. Adjusted JOLTS
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
31
JRR: QWI v. BED
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
32
JRR WRR ERR by Gender
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
33
ERR (Churning) by Age Group
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
34
Job Creation Rate
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
35
Job Destruction Rate
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
36
Job Reallocation Rate
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
37
Accession Rate
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
38
Separation Rate
3/16/2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
39
Worker Reallocation Rate
3/16/2011