Learning outcomes
• Greater knowledge of micro integration
• Identify the key steps in the micro integration process
• Appreciate some specific techniques which may be applied
in micro integration
What is micro integration?
“Micro integration involves matching data from statistical units
at an individual level, with the goal of compiling better
information than is possible when using the separate
sources.” – Bart Bakker
Why?
Survey sampling paradigm:
• Use as sampling frame
• Reduce sampling error
– Post-stratification
– Ratio/regression estimation
• Reduce non-response error
– Response enhancement during data collection
– Statistical adjustment after data collection
Why?
Beyond…
• Use as target data
– New statistics
– Reduce costs
• Harmonisation of statistics
• Small population groups
• Small area estimation
• Quality measures
Italian Integrated System of Statistical Registers (ISSR)
• Integrated System of Statistical Registers: a single logical environment to support the consistency of statistical production processes and improve outputs for users
• In particular, consistency in the “identification” and “estimation” of units and variables for the system as a whole
• New analyses starting from populations in registers
• Registers: Population Register, Territory Register, Business Register, Activity Register
Micro integration process
• Harmonisation of units
• Harmonisation of reference periods
• Completion of populations
• Harmonisation of variables
• Harmonisation of classifications
• Adjusting for measurement errors
• Adjusting for missing data
• Derivation of variables
• Overall consistency
Paul van der Laan (2000)
Harmonisation of units
Example of wages in LFS
• Labour Force Survey (LFS) – large panel survey of people
measuring the labour force.
• Statistical unit: person aged 15-74
• Variable: gross monthly earnings from the employee's main job
• Six countries integrate this variable from registers (DK, ES, NL, AT, SI, SE)
Norwegian wages integration into LFS
• Employers must report information on wages for all
workers to a central system every month
• Employees are identified by their personal identity number
• Statistical unit: work relation/job
Multiple job holders decide for
themselves which job is to be
considered as the main job. In doubtful
cases the main job should be the one
with the greatest number of hours
usually worked.
Harmonisation of units – main job
1. Compulsory service
2. Ordinary job
• Longest hours
• Held longest
• Greatest income
3. Central Tax Office for foreign affairs
• Greatest income
• Held longest
4. Freelancer
• Longest hours
• Held longest
• Greatest income
5. Other
[Figure: Labour Force Survey and the Norwegian jobs register]
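The priority rules for choosing a main job can be sketched in Python. This is a minimal sketch, assuming each job record carries a category code (1 to 5, as listed) plus the three tie-break variables; the `Job` class, its field names and `main_job` are hypothetical, not the jobs register's actual schema.

```python
from dataclasses import dataclass

# Hypothetical job record; field names are illustrative only.
@dataclass
class Job:
    job_id: str
    category: int      # 1 compulsory service, 2 ordinary job, 3 SFU, 4 freelancer, 5 other
    hours: float       # usual weekly hours
    months_held: int   # how long the job has been held
    income: float      # income in the reference period

# Tie-break criteria within each category, in the order listed on the slide.
TIE_BREAKS = {
    2: ("hours", "months_held", "income"),
    3: ("income", "months_held"),
    4: ("hours", "months_held", "income"),
}

def main_job(jobs):
    """Pick one main job: highest-priority category first, then the tie-breaks."""
    top_category = min(job.category for job in jobs)
    candidates = [job for job in jobs if job.category == top_category]
    criteria = TIE_BREAKS.get(top_category, ())
    # Tuples compare element by element, so later criteria only break ties.
    # With no criteria (categories 1 and 5), max() returns the first candidate.
    return max(candidates, key=lambda job: tuple(getattr(job, c) for c in criteria))

jobs = [
    Job("A", 2, 20, 24, 300_000),
    Job("B", 2, 30, 6, 200_000),
    Job("C", 4, 40, 60, 500_000),
]
print(main_job(jobs).job_id)  # B
```

The freelance job C has the most hours, but an ordinary job outranks it, and among the ordinary jobs B wins on hours.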
Harmonisation of reference periods
• Fortnightly payroll, monthly statistics
• Environmental monitoring and health outcomes
• Techniques: smoothing, splines
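The slide names smoothing and splines. The sketch below uses a simpler technique, pro-rata allocation by calendar days, to show the basic problem of moving fortnightly payroll amounts onto a monthly reference period. It assumes pay accrues evenly over each pay period; `apportion_to_months` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import date, timedelta

def apportion_to_months(pay_periods):
    """Split pay-period amounts across calendar months, pro rata by days.

    pay_periods: iterable of (start, end, amount), with end inclusive.
    Returns {(year, month): amount}.
    """
    monthly = defaultdict(float)
    for start, end, amount in pay_periods:
        n_days = (end - start).days + 1
        per_day = amount / n_days  # assumption: pay accrues evenly over the period
        day = start
        while day <= end:
            monthly[(day.year, day.month)] += per_day
            day += timedelta(days=1)
    return dict(monthly)

fortnights = [
    (date(2024, 1, 11), date(2024, 1, 24), 1400.0),
    (date(2024, 1, 25), date(2024, 2, 7), 1400.0),  # straddles two months
]
print(apportion_to_months(fortnights))  # {(2024, 1): 2100.0, (2024, 2): 700.0}
```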
Completion of populations
Consistent identifiers:

Central population register
Person ID   Household ID   Dwelling ID
001         1              H010557
002         1              H010557
003         2              H022588

Address and building register
Dwelling ID   Owner ID   Address
H010557       001        Akersveien 26
H022588       003        Akersveien 26

Inconsistent identifiers:

Central population register
Person ID   Household ID   Dwelling ID
001         1              H010557
002         1              H010755
003         3              H022589

Address and building register
Dwelling ID   Owner ID   Address
H010557       001        Akersveien 26
H010758       002        Akersveien 26
H022588       005        Akersveien 26

Three links cannot be resolved (?).
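Exact linkage on Dwelling ID shows why the second pair of registers is a problem. This is a minimal sketch using the rows above; `link_on_dwelling` is a hypothetical helper, and a real linkage would typically also use address and owner information.

```python
# (person ID, household ID, dwelling ID) from the inconsistent example
population = [
    ("001", "1", "H010557"),
    ("002", "1", "H010755"),
    ("003", "3", "H022589"),
]
# (dwelling ID, owner ID, address)
buildings = [
    ("H010557", "001", "Akersveien 26"),
    ("H010758", "002", "Akersveien 26"),
    ("H022588", "005", "Akersveien 26"),
]

def link_on_dwelling(population, buildings):
    """Exact match on dwelling ID; return links plus units left unlinked on each side."""
    by_dwelling = {row[0]: row for row in buildings}
    linked, unlinked_persons = [], []
    for person_id, _household_id, dwelling_id in population:
        if dwelling_id in by_dwelling:
            linked.append((person_id, by_dwelling[dwelling_id]))
        else:
            unlinked_persons.append(person_id)
    used = {dwelling[0] for _, dwelling in linked}
    unlinked_dwellings = [d for d in by_dwelling if d not in used]
    return linked, unlinked_persons, unlinked_dwellings

linked, unlinked_persons, unlinked_dwellings = link_on_dwelling(population, buildings)
print(unlinked_persons, unlinked_dwellings)  # ['002', '003'] ['H010758', 'H022588']
```

The unlinked units on both sides are the ones that need a further step, such as the nearest neighbour methods that follow.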
[Figure: Central population register and Address and building register; variables shown: household type, household size, average age of adults, average age of children, building type, address, living area; annotation: 85% (Norway 2011)]
(Single) Nearest Neighbour Imputation
• Select a donor unit that is considered similar to the unit requiring imputation, and donate the donor's response to that unit.
• Requires additional variables to calculate a distance matrix between donors and recipients.
(Single) Nearest Neighbour Linkage

Household ID   Average age of adults   Household size   Dwelling number
001            30                      2                ?
002            25                      2                H02033
003            45                      1                H01038
004            40                      3                H02078
005            55                      4                B01093

Euclidean distance on (average age of adults, household size):
        001     002     003     004     005
001     0
002     5       0
003     15.03   20.02   0
004     10.05   15.03   5.39    0
005     25.08   30.07   10.44   15.03   0

The nearest donor to household 001 is household 002 (distance 5), so 001 is assigned dwelling H02033.
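The distance matrix and donor choice above can be reproduced in a few lines of Python. This is a minimal sketch that uses the two variables exactly as in the table; the `households` dictionary, `distance` and `nearest_donor` are illustrative names.

```python
import math

# (average age of adults, household size, dwelling number); None = missing
households = {
    "001": (30, 2, None),
    "002": (25, 2, "H02033"),
    "003": (45, 1, "H01038"),
    "004": (40, 3, "H02078"),
    "005": (55, 4, "B01093"),
}

def distance(a, b):
    """Euclidean distance on (average age of adults, household size)."""
    return math.dist(households[a][:2], households[b][:2])

def nearest_donor(recipient):
    """Return the household with a known dwelling that is closest to the recipient."""
    donors = [h for h, (_, _, dwelling) in households.items() if dwelling is not None]
    return min(donors, key=lambda d: distance(recipient, d))

donor = nearest_donor("001")
print(donor, households[donor][2], distance("001", donor))  # 002 H02033 5.0
```

Because the two variables are on very different scales, age dominates the distance, as it does in the matrix above; in practice variables are often standardised first, although the slide does not specify this.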
Double Nearest Neighbour Linkage
[Figure: households (Central population register) linked to dwellings (Address and building register); variables shown: household type, household size, average age of adults, average age of children, building type, address, living area]
Enterprise surveys example
• The Labour Cost Survey (LCS), concerning structural statistics on earnings and on labour costs.
• The Continuing Vocational Training Survey (CVTS),
concerning statistics relating to vocational training in
enterprises.
• Register of enterprises
Harmonisation of variables
Register of enterprises
• All enterprises
• Number of employees
Labour Cost Survey (LCS)
• Survey of enterprises with >= 10 employees
• Number of employees includes homeworkers if there is an explicit agreement that the homeworker is remunerated on the basis of the work done and they are included on the payroll.
Continuing Vocational Training Survey (CVTS)
• Survey of enterprises with >= 10 employees
• Number of employees includes unpaid family workers
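To see why these definitions need harmonising, the same enterprise can be counted under each survey's rule. This sketch models only the two differences shown on the slide and, as an assumption for illustration only, treats the surveys as otherwise identical; `Worker`, `counts_in_lcs` and `counts_in_cvts` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Worker:
    kind: str  # "employee", "homeworker" or "unpaid_family"
    agreement_and_payroll: bool = False  # homeworkers only: paid on work done and on the payroll

def counts_in_lcs(w: Worker) -> bool:
    # LCS: homeworkers count only with an explicit agreement and when on the payroll
    if w.kind == "homeworker":
        return w.agreement_and_payroll
    return w.kind == "employee"

def counts_in_cvts(w: Worker) -> bool:
    # CVTS: unpaid family workers are included; otherwise assumed to match LCS
    return counts_in_lcs(w) or w.kind == "unpaid_family"

enterprise = [
    Worker("employee"), Worker("employee"), Worker("employee"),
    Worker("homeworker", agreement_and_payroll=True),
    Worker("homeworker"),
    Worker("unpaid_family"), Worker("unpaid_family"),
]
print(sum(map(counts_in_lcs, enterprise)), sum(map(counts_in_cvts, enterprise)))  # 4 6
```

Both values are labelled "number of employees", yet they differ because the definitions differ.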
Harmonisation of variables
Labour Cost Survey (LCS)
• Covers costs of CVT courses and other forms of training
• Includes costs associated with apprentices
Continuing Vocational Training Survey (CVTS)
• Covers costs of CVT courses only
• Excludes costs for apprentices
Variable names
                       Same name/label   Different name/label
Same definition        Same value        Same value
Different definition   Different value   Different value
The definition, not the name or label, determines whether values can be compared.
Harmonisation of classifications
Register of enterprises
• All enterprises
• Number of employees: exact number
Labour Cost Survey (LCS)
• Survey of enterprises with >= 10 employees
• Number of employees: 10-49, 50-249, 250-499, 500-999, 1000+
Continuing Vocational Training Survey (CVTS)
• Survey of enterprises with >= 10 employees
• Number of employees: 10-49, 50-249, 250+
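One way to harmonise these classifications is to map everything onto the coarsest common classes, here the CVTS classes, since the finer LCS classes nest inside them and the register's exact counts can be grouped. A minimal sketch; `common_size_class` and `LCS_TO_COMMON` are hypothetical names.

```python
def common_size_class(n_employees):
    """Map an exact employee count (register) to the common CVTS size classes."""
    if n_employees < 10:
        return None  # outside the scope of both surveys
    if n_employees < 50:
        return "10-49"
    if n_employees < 250:
        return "50-249"
    return "250+"

# The finer LCS classes nest inside the CVTS classes, so a lookup table suffices.
LCS_TO_COMMON = {
    "10-49": "10-49",
    "50-249": "50-249",
    "250-499": "250+",
    "500-999": "250+",
    "1000+": "250+",
}

print([common_size_class(n) for n in (8, 35, 120, 640)])  # [None, '10-49', '50-249', '250+']
```

Mapping to the coarsest classification loses the LCS detail above 250 employees; the alternative of keeping finer classes only works where every source provides them.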
Exercises – micro integration
• The harmonisation of variables and classifications is an
important part of micro integration.
1. Describe an example of when the harmonisation of variables or
classifications was done well in your organisation. What was the
outcome?
2. Do you have any examples of when this wasn’t done optimally?
What was the outcome?
Measurement error
• The difference between the actual value of a quantity and the value obtained by measuring it. Repeating the measurement and averaging reduces the random error (caused by the accuracy limit of the measuring instrument) but not the systematic error (caused by incorrect calibration of the measuring instrument).
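A small simulation illustrates the difference between the two error types: as the number of repeated measurements grows, the average converges on the true value plus the bias, not on the true value. The true value, bias and noise level below are arbitrary assumptions chosen for illustration.

```python
import random
import statistics

TRUE_VALUE = 100.0
BIAS = 2.0       # systematic error, e.g. an incorrectly calibrated instrument
NOISE_SD = 5.0   # random error from the instrument's accuracy limit

def measure(rng):
    """One measurement: true value + systematic error + random error."""
    return TRUE_VALUE + BIAS + rng.gauss(0, NOISE_SD)

rng = random.Random(1)
for n in (1, 10, 10_000):
    mean = statistics.fmean(measure(rng) for _ in range(n))
    # The remaining error shrinks towards BIAS (2.0), never towards 0.
    print(n, round(mean - TRUE_VALUE, 2))
```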
World estimates of maternal mortality ratio (MMR)
• MMR: number of maternal deaths per 100 000 live births
$$\text{MMR} = \frac{\text{Number of maternal deaths}}{\text{Number of live births}} \times 100\,000$$
• Proportion of deaths among women of reproductive age that are due to maternal causes
• Country and world region estimates
• Denominator (live births): UNPD estimates
Maternal mortality data sources
• Maternal mortality data can come from a variety of sources:
– Vital registration
– Household surveys (sisterhood method)
– Censuses
– Reproductive-age mortality studies (RAMOS)
– Verbal autopsy
Measurement errors in vital registration
Vital registration:
• Country/year-specific adjustment factors are used if available
• Otherwise, a default adjustment factor of 1.5 is used
Survey/census data:
• Under-reporting of maternal deaths (10% adjustment)
• Over-reporting when all pregnancy-related deaths are reported (10-15% adjustment)
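The MMR definition and these adjustments can be sketched as simple arithmetic. This is a simplification, not the WHO/Bayesian estimation method: it assumes each adjustment is a multiplicative factor on the reported death count (×1.5 by default for vital registration, +10% for survey under-reporting, −10% to −15% for pregnancy-related over-reporting). The function names and all inputs are hypothetical.

```python
VR_DEFAULT_FACTOR = 1.5  # used when no country/year-specific factor exists

def mmr(maternal_deaths, live_births):
    """Maternal deaths per 100 000 live births."""
    return maternal_deaths / live_births * 100_000

def vr_mmr(reported_deaths, live_births, country_factor=None):
    """Vital registration: country/year-specific factor if available, else 1.5."""
    factor = VR_DEFAULT_FACTOR if country_factor is None else country_factor
    return mmr(reported_deaths * factor, live_births)

def survey_mmr(reported_deaths, live_births, pregnancy_related=False, reduction=0.10):
    """Survey/census data, with the adjustment direction set by the type of error."""
    if pregnancy_related:
        # All deaths during pregnancy were reported: scale down (slide: 10-15%)
        deaths = reported_deaths * (1 - reduction)
    else:
        # Maternal deaths tend to be under-reported: scale up by 10%
        deaths = reported_deaths * 1.10
    return mmr(deaths, live_births)

# Hypothetical inputs, for illustration only
print(round(vr_mmr(120, 200_000), 1))       # 90.0
print(round(vr_mmr(120, 200_000, 1.2), 1))  # 72.0
print(round(survey_mmr(100, 100_000), 1))   # 110.0
```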
Imputation
• Many techniques
– Donor imputation: hot deck, cold deck, nearest neighbour, historic
– Explicit model: average, regression
• Single and multiple imputation
• Stochastic
• Restrictions
• Multivariate and univariate
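As an example of stochastic donor imputation, a random hot deck within imputation classes can be sketched as follows. It assumes records are dictionaries with a class variable and a target variable where `None` marks a missing value; `hot_deck_impute` and the field names are hypothetical.

```python
import random
from collections import defaultdict

def hot_deck_impute(records, class_key, target, seed=None):
    """Random hot deck: fill each missing target with a value drawn from a donor in the same class."""
    rng = random.Random(seed)
    donors = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            donors[rec[class_key]].append(rec[target])
    imputed = []
    for rec in records:
        new = dict(rec)
        pool = donors[new[class_key]]
        if new[target] is None and pool:  # left missing if the class has no donors
            new[target] = rng.choice(pool)
        imputed.append(new)
    return imputed

records = [
    {"region": "A", "income": 10},
    {"region": "A", "income": 20},
    {"region": "B", "income": 100},
    {"region": "A", "income": None},
    {"region": "B", "income": None},
    {"region": "C", "income": None},
]
print(hot_deck_impute(records, "region", "income", seed=1))
```

Because donors are drawn at random, rerunning with different seeds gives different completed datasets, which is the starting point for multiple imputation.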
Derivation of variables - employment
• Integration of:
– Microdata from a-ordningen (the monthly employer reporting system)
– Business register
– Central population register
– Unemployment register
– Compulsory military service register
– SFU – Central Tax Office for foreign affairs
– Illness – doctor register
– Education database
– Immigration data
Derivation of variables - employment
• Start and stop dates indicate the person is in work, and the person:
– received income in the current period, or
– is completing compulsory service, or
– received government welfare indicating temporary absence (sick pay, maternity leave, etc.), or
– is registered as temporarily laid off or on permitted leave (< 90 days), or
– started work very recently, or
– received income in both the previous month and the next month
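The derivation rule above can be expressed as a single boolean function. This is a minimal sketch: the `PersonMonth` fields are hypothetical stand-ins for the integrated register variables, and the 14-day threshold for a "very recent" start is an assumption, since the slide gives no number.

```python
from dataclasses import dataclass
from typing import Optional

RECENT_START_DAYS = 14  # hypothetical threshold; the slide only says "very recent"

@dataclass
class PersonMonth:
    job_spell_covers_period: bool         # start and stop dates indicate the person is in work
    income_current: bool = False
    compulsory_service: bool = False
    temporary_absence_benefit: bool = False  # sick pay, maternity leave etc.
    leave_days: Optional[int] = None      # registered temporary lay-off or permitted leave
    days_since_start: Optional[int] = None
    income_previous: bool = False
    income_next: bool = False

def is_employed(p: PersonMonth) -> bool:
    """Employed if the job spell covers the period and at least one condition holds."""
    if not p.job_spell_covers_period:
        return False
    return (
        p.income_current
        or p.compulsory_service
        or p.temporary_absence_benefit
        or (p.leave_days is not None and p.leave_days < 90)
        or (p.days_since_start is not None and p.days_since_start <= RECENT_START_DAYS)
        or (p.income_previous and p.income_next)
    )

print(is_employed(PersonMonth(True, leave_days=30)))   # True
print(is_employed(PersonMonth(True, leave_days=120)))  # False
```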
Overall consistency
• Obvious errors:
– Logical constraints
• Pregnant males
• Parts don’t add up to the total
– Unreasonable values
• Negative wages or number of employees
– Extreme values
• Probable errors:
– All jobs together should amount to < 160% workload
– Statistical distributions: regression controls, quartile methods
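These checks can be sketched as hard edits (obvious errors), soft edits (probable errors) and a quartile-based outlier test. The record layout is hypothetical, and the quartile method shown uses interquartile-range fences with k = 1.5, a common convention assumed here rather than one given on the slide.

```python
import math
import statistics

def hard_edit_failures(rec):
    """Obvious errors: rules that a valid record cannot break."""
    failures = []
    if rec["sex"] == "male" and rec["pregnant"]:
        failures.append("pregnant male")
    if not math.isclose(sum(rec["wage_parts"]), rec["total_wage"]):
        failures.append("parts do not add to total")
    if rec["total_wage"] < 0:
        failures.append("negative wage")
    return failures

def soft_edit_failures(rec, max_workload_pct=160):
    """Probable errors: flagged for review rather than rejected outright."""
    failures = []
    if rec["workload_pct"] >= max_workload_pct:
        failures.append(f"combined workload >= {max_workload_pct}%")
    return failures

def quartile_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

rec = {"sex": "male", "pregnant": True, "wage_parts": [100, 50],
       "total_wage": 160, "workload_pct": 170}
print(hard_edit_failures(rec))   # ['pregnant male', 'parts do not add to total']
print(soft_edit_failures(rec))   # ['combined workload >= 160%']
print(quartile_outliers([10, 11, 12, 13, 14, 15, 100]))  # [100]
```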
Exercises – Measurement error
Scenario:
The Rental Market Survey provides estimates of average monthly rents for rental housing by district. The statistical unit is the dwelling, while the people living in those dwellings are asked to provide monthly rental price details. It is a panel survey in which sampled units are asked every month for 13 months.
Your organisation has recently switched the Rental Market Survey from CATI to CAWI data collection. The new estimates show that rental prices have gone up significantly in the most recent period. You suspect this could be a measurement error caused by mode effects.
1. What technique(s) could you use to investigate and measure the measurement error (given unlimited resources)?
2. How could micro integration of the survey data with other data sources be used to investigate the measurement error and improve estimation?
References
Alkema, L., Zhang, S., Chou, D., et al. (2015) A Bayesian approach to the global estimation of maternal mortality.
Bakker, B. (2011) Micro integration. Statistics Netherlands, The Hague.
HLG-MOS (2017) A guide to data integration for Official Statistics (draft).
Van der Laan, P. (2000) Integrating administrative registers and household surveys. Netherlands Official Statistics, 2.
WHO (2015) Trends in maternal mortality: 1990 to 2015: estimates by WHO, UNICEF, UNFPA, World Bank Group and the United Nations Population Division. Geneva, Switzerland.
Working Group Labour Market Statistics (LAMAS) (2014) Future of the CVTS data collection. Document for item 3.4 of the agenda. Eurostat/F3/LAMAS/31/14.
Zhang, L.-C. (2012) Topics of statistical theory for register-based statistics and data integration. Statistica Neerlandica, 66, 41-63.
Zhang, L.-C. & Hendriks, C. (2012) Micro integration of register-based census data for dwelling and household. Work Session on Statistical Data Editing.