Editing the 2011 Census data with CANCEIS and options considered for 2016

Lyne Guertin, Census Data Processing and Estimation Section, Social Survey Methods Division, Methodology Branch, Statistics Canada

UNECE, April 28-30, 2014

Statistics Canada • Statistique Canada


Page 1

Editing the 2011 Census data with CANCEIS and options considered for 2016

Lyne Guertin

Census Data Processing and Estimation Section

Social Survey Methods Division, Methodology Branch, Statistics Canada

UNECE April 28-30, 2014

Page 2

Outline

1. Overview of CANCEIS

2. Recent improvements to CANCEIS and to the 2011 E&I strategy

3. Options considered for 2016

Page 3

1. Overview of CANCEIS (CANadian Census Edit and Imputation System)

Page 4

Page 5

CANCEIS users

Domestic Users (other than Census)

• National Household Survey

• Canadian Income Survey

• Survey on Financial Security

• Survey of Household Spending

• Longitudinal and International Study of Adults

Page 6

Other countries (users, past users, or exploring CANCEIS)

Argentina, Australia, Brazil, Israel, Italy, Japan, New Zealand, Peru, Switzerland, UK, USA

CSPA initiative (Common Statistical Processing Architecture)

• Targeted CANCEIS in a pilot with New Zealand to test portability.

Page 7

Imputation methods available

Deterministic imputation

Donor imputation

• Based upon the principles of

– minimum change

– preserving distribution of the data

Page 8

New Imputation Methodology (NIM)

Developed by Mike Bankier in the 1990s

A. Apply edits

• Search for invalid values, missing values and inconsistencies

• Classify records as Passed or Failed
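The edit step can be illustrated with a minimal Python sketch. The field names, valid values and the single consistency rule below are hypothetical and are not taken from CANCEIS or its decision logic tables; the sketch only shows records being checked for missing values, invalid values and inconsistencies, and then classified as Passed or Failed.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[object]]

# Hypothetical valid values for each field (illustration only).
VALID_VALUES = {
    "age": range(0, 122),
    "marital_status": {"single", "married", "widowed", "divorced"},
}

# Hypothetical consistency edits: each rule returns True when the record is in conflict.
CONSISTENCY_EDITS: Dict[str, Callable[[Record], bool]] = {
    "married_under_15": lambda r: r["marital_status"] == "married" and r["age"] < 15,
}


def failed_edits(record: Record) -> List[str]:
    """Return the names of all checks the record fails (an empty list means Passed)."""
    failures = []
    for field, valid in VALID_VALUES.items():
        value = record.get(field)
        if value is None:
            failures.append(f"missing:{field}")
        elif value not in valid:
            failures.append(f"invalid:{field}")
    # Consistency edits are only evaluated when every field is present and valid.
    if not failures:
        failures += [name for name, rule in CONSISTENCY_EDITS.items() if rule(record)]
    return failures


def classify(record: Record) -> str:
    """Classify a record as 'Passed' or 'Failed'."""
    return "Failed" if failed_edits(record) else "Passed"


records = [
    {"age": 34, "marital_status": "married"},   # clean record
    {"age": 9, "marital_status": "married"},    # inconsistency
    {"age": None, "marital_status": "single"},  # missing value
    {"age": 240, "marital_status": "single"},   # invalid value
]
for rec in records:
    print(classify(rec), failed_edits(rec))
```

Passed records form the pool of potential donors; Failed records go on to imputation.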

Page 9

New Imputation Methodology (NIM) (cont’d)

B. Perform donor imputation

• Step 1: establish a list of the best donors (i.e. those that most resemble the failed record)

• Step 2: find the best imputation actions for these donors

• Step 3: select an imputation action at random
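The three steps above can be sketched in Python as follows. This is a simplified illustration and not the CANCEIS implementation: the distance is a plain count of differing fields, an "imputation action" is taken to be the smallest set of donor values that makes the failed record pass the edits, and the function names (`mismatch_distance`, `minimal_actions`, `impute`) and the `n_donors` parameter are hypothetical. How the actual system scores imputation actions is not described in these slides.

```python
import itertools
import random
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]


def mismatch_distance(a: Record, b: Record) -> int:
    """Number of fields on which two records differ (a stand-in distance)."""
    return sum(a[f] != b[f] for f in a)


def minimal_actions(failed: Record, donor: Record,
                    passes: Callable[[Record], bool]) -> List[Record]:
    """Smallest sets of donor values that, copied into `failed`, make it pass."""
    differing = [f for f in failed if failed[f] != donor[f]]
    for size in range(1, len(differing) + 1):
        found = []
        for fields in itertools.combinations(differing, size):
            candidate = {**failed, **{f: donor[f] for f in fields}}
            if passes(candidate):
                found.append(candidate)
        if found:  # stop at the first size that works: minimum change
            return found
    return []


def impute(failed: Record, donors: List[Record],
           passes: Callable[[Record], bool], n_donors: int = 2,
           rng: Optional[random.Random] = None) -> Record:
    rng = rng or random.Random()
    # Step 1: keep the donors that most resemble the failed record.
    nearest = sorted(donors, key=lambda d: mismatch_distance(failed, d))[:n_donors]
    # Step 2: collect the minimum-change imputation actions for these donors.
    actions = [a for d in nearest for a in minimal_actions(failed, d, passes)]
    if not actions:
        raise ValueError("no donor yields a record that passes the edits")
    smallest = min(mismatch_distance(failed, a) for a in actions)
    best = [a for a in actions if mismatch_distance(failed, a) == smallest]
    # Step 3: select one of the best actions at random.
    return rng.choice(best)


# Hypothetical edit: a married person must be at least 15 years old.
def passes(r: Record) -> bool:
    return not (r["marital_status"] == "married" and r["age"] < 15)


failed = {"age": 9, "marital_status": "married", "sex": "F"}
donors = [
    {"age": 9, "marital_status": "single", "sex": "F"},
    {"age": 41, "marital_status": "married", "sex": "F"},
    {"age": 70, "marital_status": "widowed", "sex": "M"},
]
print(impute(failed, donors, passes, rng=random.Random(0)))
```

In this example two actions change a single field (age or marital status), so either may be chosen; drawing among equally good actions is what helps preserve the distribution of the data.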

Page 10

Advantages of this methodology

Offers a practical solution to an operational problem

Allows simplification of edits: use minimum set in relation to the donor chosen

Computationally efficient

Can deal with non-linear edits

Data-driven imputation

Page 11

CANCEIS Features

Categorical, numerical and alphanumeric variables

Large numbers of edits & large data files

Portable, flexible & efficient

All parameterized, easy to customize

• Ten different distance functions to find best donors, which cover different types of variables

Page 12

Distance Measure for Potential Donors

$$D_{fp} = \sum_i w_i \, D_i(V_{fi}, V_{pi})$$

over all paired fields (i), where

• $V_{fi}$ is the value of matching variable i for the failed record;

• $V_{pi}$ is the value of matching variable i for the passed record;

• $w_i$ is the weight of variable i ($w_i \ge 0$);

• $D_i$ is the distance function chosen for variable i ($0 \le D_i \le 1$).
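As a worked illustration, the distance between a failed record f and a passed record p, that is, the sum over matching variables i of w_i times D_i(V_fi, V_pi), can be computed as in the Python sketch below. The two per-variable distance functions, the variable names and the weights are assumptions chosen for the example; they are not taken from the ten distance functions shipped with CANCEIS.

```python
from typing import Callable, Dict, Tuple

Record = Dict[str, object]
DistanceFn = Callable[[object, object], float]


def exact_match(vf: object, vp: object) -> float:
    """D_i for categorical variables: 0 if the values agree, 1 otherwise."""
    return 0.0 if vf == vp else 1.0


def scaled_abs_diff(max_diff: float) -> DistanceFn:
    """D_i for numeric variables: absolute difference scaled and capped to [0, 1]."""
    def distance(vf: float, vp: float) -> float:
        return min(abs(vf - vp) / max_diff, 1.0)
    return distance


# Hypothetical matching variables: name -> (weight w_i >= 0, distance function D_i).
MATCHING: Dict[str, Tuple[float, DistanceFn]] = {
    "age": (2.0, scaled_abs_diff(10.0)),
    "sex": (1.0, exact_match),
    "marital_status": (1.0, exact_match),
}


def record_distance(failed: Record, passed: Record,
                    matching: Dict[str, Tuple[float, DistanceFn]] = MATCHING) -> float:
    """Weighted sum of per-variable distances between a failed and a passed record."""
    return sum(w * d(failed[v], passed[v]) for v, (w, d) in matching.items())


failed = {"age": 30, "sex": "F", "marital_status": "married"}
passed = {"age": 35, "sex": "F", "marital_status": "single"}
print(record_distance(failed, passed))  # 2.0 * 0.5 + 1.0 * 0.0 + 1.0 * 1.0 = 2.0
```

Because each D_i lies between 0 and 1, the weights alone control how much each matching variable contributes to the choice of donors.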

Page 13

CANCEIS System Components

Inputs: Data, Data Dictionary, System Parameters, Decision Logic Tables

CANCEIS Components: Donor Imputation, Deterministic Imputation

Outputs: Imputed Data, Reports & Logs

Page 14

2. Recent improvements to CANCEIS and to the 2011 E&I strategy

Page 15

Improvements

For 2011, CANCEIS was rewritten in C# (C-sharp) in a .NET environment

• Easier to maintain

• Improved efficiency (lower processing time)

• Increased stability

Page 16

Improvements (cont’d)

Multi-threading now possible in donor imputation

• Enables processing of multiple failed units at one time

• Increases performance and reduces processing time
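The idea of processing several failed units at one time can be sketched as below. CANCEIS itself is written in C#, so this Python version only illustrates the structure: each failed unit is imputed independently against a shared, read-only donor pool. The `impute_unit` rule (copy missing fields from the donor that agrees on the most reported fields) and the function names are hypothetical. In standard CPython, threads do not speed up CPU-bound work, so the sketch shows the division of work rather than the performance gain.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import Dict, List, Optional

Record = Dict[str, Optional[object]]


def impute_unit(failed: Record, donors: List[Record]) -> Record:
    """Fill missing (None) fields from the donor that agrees on the most reported fields."""
    reported = [f for f, v in failed.items() if v is not None]
    donor = min(donors, key=lambda d: sum(d[f] != failed[f] for f in reported))
    return {f: donor[f] if v is None else v for f, v in failed.items()}


def impute_all(failed_units: List[Record], donors: List[Record],
               max_workers: int = 4) -> List[Record]:
    # Each failed unit only reads the shared donor list, so units can be handed
    # to worker threads independently; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(partial(impute_unit, donors=donors), failed_units))


donors = [{"age": 30, "sex": "F"}, {"age": 60, "sex": "M"}]
failed_units = [{"age": 30, "sex": None}, {"age": None, "sex": "M"}]
print(impute_all(failed_units, donors))
```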

Page 17

Improvements (cont’d)

CANCEIS is more user friendly

• Before: could handle only .txt files (inputs/outputs)

• Now: also handles data dictionaries in Excel and creates summary reports in HTML

Page 18

Improvements (cont’d)

Increased content and level of detail in the logs

• Facilitate troubleshooting

• Facilitate validating desired strategy for each module

Page 19

New features added

Additional flexibility in specifying imputation parameters

New parameter to specify that the staged search will not stop until an excellent donor is found

• Continue to search if the target quality is not reached
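One way to picture a staged search that continues until the target quality is reached is the Python sketch below. It rests on several assumptions that the slides do not state: a "stage" is modelled as a successive pool of candidate donors, an "excellent" donor is one whose distance is at or below a `target` threshold, and without the new option the search is assumed to stop at the first stage that yields any donor. The names `staged_search`, `target` and `require_target` are hypothetical and are not CANCEIS parameter names.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Record = Dict[str, object]


def staged_search(failed: Record,
                  stages: Sequence[Sequence[Record]],
                  distance: Callable[[Record, Record], float],
                  target: float,
                  require_target: bool = True) -> Optional[Tuple[Record, float]]:
    """Search donor pools stage by stage; return (best donor, distance) or None."""
    best: Optional[Tuple[Record, float]] = None
    for pool in stages:
        for donor in pool:
            d = distance(failed, donor)
            if best is None or d < best[1]:
                best = (donor, d)
        # Stop once the target quality is reached. If require_target is False,
        # stop as soon as any donor has been found (assumed earlier behaviour).
        if best is not None and (best[1] <= target or not require_target):
            break
    return best


def age_gap(a: Record, b: Record) -> float:
    return abs(a["age"] - b["age"])


failed = {"age": 30}
stages = [[{"age": 50}], [{"age": 31}], [{"age": 30}]]
print(staged_search(failed, stages, age_gap, target=2))
print(staged_search(failed, stages, age_gap, target=2, require_target=False))
```

With `require_target=True` the search moves past the first stage because its best donor is too far away, and stops at the second stage. If no stage reaches the target, the best donor seen so far is returned.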

Page 20

Modification to the 2011 E&I strategy

Group these five processes

• Place of birth of parents

• Immigration status

• Aboriginal status

• Citizenship

• Visible minorities

into one ethnocultural process

Page 21

Modification to the 2011 E&I strategy (cont’d)

Goals:

• Increase data coherence between processes by using one single donor to impute all variables

• Reduce manual fixes after E&I

Challenge: manage lots of edits & data

Page 22

3. Options considered for 2016

Page 23

Planning E&I strategy for 2016

• Evaluating the use of administrative data as an alternative source of data

• Exploring if the language processes could be grouped (mother tongue, home language, official language)

• Exploring if steps within processes could be grouped

• Exploring if processes could be run in parallel

Goals: improve quality, reduce processing time

Page 24

Continue improving CANCEIS to serve future requirements of the Census

• Research and development ongoing, done by programmers and methodologists

CANCEIS v5.2 to be released by Dec. 2014

• Allowing Decision Logic Tables (DLTs) and System Parameters in Excel

• Revisited contents of Inputs & Outputs

• Standardized naming convention

• Improvements to default values of parameters

Page 25

Will offer the CANVERT conversion tool

• Ensures smooth transition from v5.1 to v5.2

Updated documentation will be provided

• Basic User Guide (with two simple examples and basic features)

• Comprehensive User Guide (with more examples, and all features)

Page 26

Thank you for your attention!

For more information, please contact:

Lyne Guertin (1-613-951-4543)

[email protected]