Editing the 2011 Census data with CANCEIS and options considered for 2016
Lyne Guertin
Census Data Processing and Estimation Section
Social Survey Methods Division, Methodology Branch, Statistics Canada
UNECE, April 28-30, 2014
UNECE 2014, Statistics Canada • Statistique Canada
Outline
1. Overview of CANCEIS
2. Recent improvements to CANCEIS and to the 2011 E&I strategy
3. Options considered for 2016
1. Overview of CANCEIS (CANadian Census Edit and Imputation System)
CANCEIS users
Domestic users (other than the Census)
• National Household Survey
• Canadian Income Survey
• Survey on Financial Security
• Survey of Household Spending
• Longitudinal and International Study of Adults
Other countries (current users, past users, or countries exploring CANCEIS)
• Argentina, Australia, Brazil, Israel, Italy, Japan, New Zealand, Peru, Switzerland, UK, USA
CSPA initiative (Common Statistical Production Architecture)
• CANCEIS was selected for a pilot with New Zealand to test its portability.
Imputation methods available
Deterministic imputation
Donor imputation
• Based upon the principles of
– minimum change
– preserving the distribution of the data
New Imputation Methodology (NIM)
Developed by Mike Bankier in the 1990s
A. Apply edits
• Search for invalid values, missing values and inconsistencies
• Classify records as Passed or Failed
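As an illustration of step A, the sketch below checks each record for missing values, invalid values and edit failures, then splits the file into Passed and Failed records. It assumes that an edit is written as a set of conditions that together identify an inconsistent combination (a record fails the edit when all of its conditions are true). The variables, valid-value domains and edits are hypothetical, and this is not the CANCEIS decision logic table syntax.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Condition = Callable[[Record], bool]

# Hypothetical valid-value domains for three illustrative variables.
DOMAINS = {
    "age": range(0, 121),
    "marital_status": {"single", "married", "widowed", "divorced"},
    "relationship": {"person1", "spouse", "child"},
}

# Hypothetical edits: a record FAILS an edit when ALL of its conditions are true.
EDITS: List[Tuple[str, List[Condition]]] = [
    ("married under age 15", [lambda r: r["age"] < 15,
                              lambda r: r["marital_status"] == "married"]),
    ("spouse under age 15", [lambda r: r["relationship"] == "spouse",
                             lambda r: r["age"] < 15]),
]


def failed_checks(record: Record) -> List[str]:
    """List why a record fails: missing values, invalid values, or failed edits."""
    reasons = []
    for var, domain in DOMAINS.items():
        value = record.get(var)
        if value is None:
            reasons.append(f"missing {var}")
        elif value not in domain:
            reasons.append(f"invalid {var}")
    if reasons:
        return reasons  # consistency edits are only evaluated on valid values
    for name, conditions in EDITS:
        if all(cond(record) for cond in conditions):
            reasons.append(name)
    return reasons


def classify(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Split records into (passed, failed)."""
    passed: List[Record] = []
    failed: List[Record] = []
    for r in records:
        (failed if failed_checks(r) else passed).append(r)
    return passed, failed


records = [
    {"age": 40, "marital_status": "married", "relationship": "person1"},
    {"age": 12, "marital_status": "married", "relationship": "spouse"},
    {"age": None, "marital_status": "single", "relationship": "child"},
]
passed, failed = classify(records)
print(len(passed), len(failed))   # 1 2
print(failed_checks(records[1]))  # ['married under age 15', 'spouse under age 15']
```

Evaluating the consistency edits only after the validity checks avoids comparing a missing or invalid value against a numeric threshold.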
New Imputation Methodology (NIM) (cont’d)
B. Perform donor imputation
• Step 1: establish a list of the best donors (i.e. the passed records that most resemble the failed record)
• Step 2: find the best imputation actions for these donors
• Step 3: select one of these imputation actions at random
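A simplified sketch of steps 1 to 3 follows, for illustration only. The distance (a count of differing matching fields), the number of donors kept, and the rule defining the "best" imputation actions (fewest fields changed, across all donors) are simplifying assumptions; the actual NIM distance measures and selection probabilities are not reproduced here. The `passes` function is a hypothetical stand-in for the edits applied in step A.

```python
import itertools
import random
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, object]


def distance(failed: Record, donor: Record, matching: List[str]) -> int:
    """Simplified distance: number of matching fields on which the records differ."""
    return sum(failed[f] != donor[f] for f in matching)


def nim_impute(failed: Record, passed: List[Record], matching: List[str],
               passes: Callable[[Record], bool], n_donors: int = 3,
               rng: Optional[random.Random] = None) -> Record:
    if rng is None:
        rng = random.Random()
    # Step 1: keep the n_donors passed records closest to the failed record.
    donors = sorted(passed, key=lambda d: distance(failed, d, matching))[:n_donors]

    # Step 2: for each donor, find the smallest sets of fields that, when copied
    # from the donor, make the failed record pass the edits (minimum change).
    actions: List[Tuple[int, Record]] = []
    for donor in donors:
        diff = [f for f in failed if failed[f] != donor[f]]
        for k in range(len(diff) + 1):
            found = False
            for fields in itertools.combinations(diff, k):
                candidate = {**failed, **{f: donor[f] for f in fields}}
                if passes(candidate):
                    actions.append((k, candidate))
                    found = True
            if found:
                break  # keep only this donor's minimum-change actions
    if not actions:
        raise ValueError("no donor yields a record that passes the edits")

    # Step 3: among the actions changing the fewest fields, pick one at random.
    fewest = min(k for k, _ in actions)
    return rng.choice([c for k, c in actions if k == fewest])


def passes(r: Record) -> bool:
    # Hypothetical edits: no missing values, and nobody under 15 is married.
    if any(v is None for v in r.values()):
        return False
    return not (r["age"] < 15 and r["marital_status"] == "married")


failed = {"relationship": "child", "age": 12, "marital_status": "married"}
passed = [
    {"relationship": "child", "age": 10, "marital_status": "single"},
    {"relationship": "child", "age": 12, "marital_status": "single"},
    {"relationship": "spouse", "age": 45, "marital_status": "married"},
]
print(nim_impute(failed, passed, ["relationship", "age"], passes, n_donors=2,
                 rng=random.Random(1)))
# {'relationship': 'child', 'age': 12, 'marital_status': 'single'}
```

Both retained donors lead to the same one-field change here, so the random draw in step 3 does not affect the result; with more varied donors it would choose among equally small changes.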
Advantages of this methodology
Offers a practical solution to an operational problem
Allows edits to be simplified
• Uses the minimum set of changes in relation to the donor chosen
Computationally efficient
Can deal with non-linear edits
Data-driven imputation
CANCEIS Features
Handles categorical, numerical and alphanumeric variables
Handles large numbers of edits and large data files
Portable, flexible and efficient
Fully parameterized, so easy to customize
• Ten different distance functions for finding the best donors, covering different types of variables
Distance Measure for Potential Donors
$$D_{fp} = \sum_i w_i \, D_i(V_{fi}, V_{pi})$$
summed over all paired fields $i$, where
• $V_{fi}$ is the value of matching variable $i$ for the failed record;
• $V_{pi}$ is the value of matching variable $i$ for the passed record;
• $w_i$ is the weight of variable $i$ ($w_i \ge 0$);
• $D_i$ is the distance function chosen for variable $i$ ($0 \le D_i \le 1$).
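The distance formula translates directly into code. The two distance functions below (exact match for categorical variables, and a capped, scaled absolute difference for numeric ones) are illustrative choices that respect 0 ≤ D_i ≤ 1; they are not claimed to be among the ten CANCEIS distance functions, and the field names and weights are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

DistanceFn = Callable[[object, object], float]


def categorical(a: object, b: object) -> float:
    """0 if the two categories agree, 1 otherwise."""
    return 0.0 if a == b else 1.0


def scaled_numeric(span: float) -> DistanceFn:
    """|a - b| / span, capped at 1 so that 0 <= D_i <= 1."""
    def d(a: float, b: float) -> float:
        return min(abs(a - b) / span, 1.0)
    return d


def donor_distance(failed: Dict[str, object], passed: Dict[str, object],
                   spec: List[Tuple[str, float, DistanceFn]]) -> float:
    """D_fp = sum over matching variables i of w_i * D_i(V_fi, V_pi)."""
    return sum(w * d(failed[i], passed[i]) for i, w, d in spec)


# Hypothetical matching variables, weights (w_i >= 0) and distance functions.
spec = [("sex", 1.0, categorical),
        ("age", 2.0, scaled_numeric(100.0)),
        ("province", 0.5, categorical)]
failed = {"sex": "F", "age": 34, "province": "ON"}
passed = {"sex": "F", "age": 59, "province": "QC"}
print(donor_distance(failed, passed, spec))  # 1.0
```

Here sex contributes 0, age contributes 2.0 × 25/100 = 0.5 and province contributes 0.5 × 1 = 0.5, so D_fp = 1.0. Raising a weight w_i makes disagreement on that variable count more heavily when ranking potential donors.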
CANCEIS System Components
Inputs
• Data
• Data Dictionary
• System Parameters
• Decision Logic Tables
CANCEIS Components
• Donor Imputation
• Deterministic Imputation
Outputs
• Imputed Data
• Reports & Logs
2. Recent improvements to CANCEIS and to the 2011 E&I strategy
Improvements
For 2011, CANCEIS was rewritten in C# (C-sharp) in a .NET environment
• Easier to maintain
• Improved efficiency (lower processing time)
• Increased stability
Improvements (cont’d)
Multi-threading is now possible in donor imputation
• Enables multiple failed units to be processed at the same time
• Increases performance and reduces processing time
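Because each failed unit is imputed independently against the same pool of donors, the work can be spread across threads. A minimal sketch with Python's `concurrent.futures.ThreadPoolExecutor` follows; `impute_one` is a hypothetical placeholder for the full donor search. Seeding the random generator per unit is an assumption made here (not a documented CANCEIS behaviour) so that results are reproducible no matter which thread processes which unit.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Record = Dict[str, object]


def impute_one(job: Tuple[int, Record, List[Record]]) -> Record:
    """Hypothetical placeholder for the donor search of one failed unit."""
    unit_id, failed, donors = job
    rng = random.Random(unit_id)  # per-unit seed: result independent of thread scheduling
    donor = rng.choice(donors)
    # Fill only the missing values from the chosen donor.
    return {k: donor[k] if v is None else v for k, v in failed.items()}


def impute_all(failed_units: List[Record], donors: List[Record],
               workers: int = 4) -> List[Record]:
    jobs = [(i, unit, donors) for i, unit in enumerate(failed_units)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, whatever order the threads finish in.
        return list(pool.map(impute_one, jobs))


donors = [{"age": 30, "sex": "F"}, {"age": 52, "sex": "M"}, {"age": 17, "sex": "F"}]
failed_units = [{"age": None, "sex": "F"}, {"age": 41, "sex": None}] * 50
print(impute_all(failed_units[:2], donors))
```

In CPython, threads speed up mainly work that releases the global interpreter lock; for pure-Python, CPU-bound donor searches a `ProcessPoolExecutor` is the usual alternative with the same interface.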
Improvements (cont’d)
CANCEIS is more user-friendly
• Before: could handle only .txt files (inputs and outputs)
• Now: also handles data dictionaries in Excel and creates summary reports in HTML
Improvements (cont’d)
Increased content and level of detail in the logs
• Facilitates troubleshooting
• Makes it easier to validate the intended strategy for each module
New features added
Additional flexibility in specifying imputation parameters
New parameter specifying that the staged search does not stop until an excellent donor is found
• The search continues if the target quality is not reached
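The effect of the new parameter can be pictured with a small sketch. It assumes that each stage searches a wider donor pool and that, by default, the search stops as soon as an acceptable donor has been found; with the new option it keeps going until an excellent donor is found or the stages run out. The names `acceptable`, `excellent` and `until_excellent` are hypothetical and are not CANCEIS parameter names.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def staged_search(failed: T, stages: Sequence[Sequence[T]],
                  dist: Callable[[T, T], float], acceptable: float,
                  excellent: float, until_excellent: bool = False
                  ) -> Tuple[Optional[T], float, int]:
    """Return (best donor, its distance, number of stages searched)."""
    best: Optional[T] = None
    best_d = float("inf")
    searched = 0
    for pool in stages:
        searched += 1
        for donor in pool:
            d = dist(failed, donor)
            if d < best_d:
                best, best_d = donor, d
        if best_d <= excellent:
            break  # target quality reached: stop in both modes
        if best_d <= acceptable and not until_excellent:
            break  # default behaviour: an acceptable donor ends the search
    return best, best_d, searched


def age_gap(a: int, b: int) -> int:
    # Toy distance: records reduced to an age, compared by absolute difference.
    return abs(a - b)


stages = [[50, 41], [33], [30]]
print(staged_search(30, stages, age_gap, acceptable=15, excellent=1))
# (41, 11, 1)
print(staged_search(30, stages, age_gap, acceptable=15, excellent=1,
                    until_excellent=True))
# (30, 0, 3)
```

The trade-off is processing time: continuing the search examines more donor pools, in exchange for a closer donor when one exists.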
Modification to the 2011 E&I strategy
Group these five processes into one ethnocultural process:
• Place of birth of parents
• Immigration status
• Aboriginal status
• Citizenship
• Visible minorities
Modification to the 2011 E&I strategy (cont’d)
Goals:
• Increase data coherence between processes by using a single donor to impute all variables
• Reduce manual fixes after E&I
Challenge: managing a large number of edits and a large volume of data
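The coherence argument can be seen on a toy example. With one donor per variable, each imputed value is plausible on its own, but the combination may violate an edit that spans processes; with a single donor for the whole group, the imputed values come from one real, consistent record. The variables, categories and the edit below are hypothetical simplifications.

```python
from typing import Dict

Record = Dict[str, str]
GROUP = ["citizenship", "immigrant_status"]  # hypothetical subset of grouped variables


def coherent(r: Record) -> bool:
    """Hypothetical cross-process edit: a citizen by birth is not an immigrant."""
    return not (r["citizenship"] == "by birth" and r["immigrant_status"] == "immigrant")


def impute_separately(failed: Record, donor_for: Dict[str, Record]) -> Record:
    """One donor per variable, as when each variable has its own process."""
    return {**failed, **{v: donor_for[v][v] for v in GROUP}}


def impute_jointly(failed: Record, donor: Record) -> Record:
    """A single donor supplies every variable in the group."""
    return {**failed, **{v: donor[v] for v in GROUP}}


donor_a = {"citizenship": "by birth", "immigrant_status": "non-immigrant"}
donor_b = {"citizenship": "by naturalization", "immigrant_status": "immigrant"}
failed = {"age": "34", "citizenship": "", "immigrant_status": ""}  # "" = missing

separate = impute_separately(failed, {"citizenship": donor_a, "immigrant_status": donor_b})
joint = impute_jointly(failed, donor_a)
print(coherent(separate), coherent(joint))  # False True
```

The separate imputation needs a later manual fix; the joint one does not, which mirrors the goal of reducing manual fixes after E&I, at the cost of one process carrying all the edits of the group.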
3. Options considered for 2016
Planning the E&I strategy for 2016
• Evaluating the use of administrative data as an alternative source of data
• Exploring whether the language processes (mother tongue, home language, official language) could be grouped
• Exploring whether steps within processes could be grouped
• Exploring whether processes could be run in parallel
Goals: improve quality and reduce processing time
Continue improving CANCEIS to serve the future requirements of the Census
• Research and development is ongoing, done by programmers and methodologists
CANCEIS v5.2 to be released by December 2014
• Allows DLTs (Decision Logic Tables) and System Parameters in Excel
• Revisited content of inputs and outputs
• Standardized naming convention
• Improved default values of parameters
Will offer the CANVERT conversion tool
• Ensures a smooth transition from v5.1 to v5.2
Updated documentation will be provided
• Basic User Guide (with two simple examples and the basic features)
• Comprehensive User Guide (with more examples and all features)
Thank you for your attention!
For more information, please contact:
Lyne Guertin (1-613-951-4543)