The Application for Statistical Processing at SURS. Andreja Smukavec, SURS; Rudi Seljak, SURS. UNECE Statistical Data Confidentiality Work Session, Helsinki, 5–7 October 2015


The Application for Statistical Processing at SURS

Andreja Smukavec, SURS
Rudi Seljak, SURS

UNECE Statistical Data Confidentiality Work Session
Helsinki, 5–7 October 2015


Old system

• Stove-pipe oriented production
  – Ad-hoc solutions were developed for each particular survey
• The survey methodologists' drive for improvement was crucial
  – "Our data are not confidential"
• Process metadata were not organized
  – Difficulties when a survey methodologist resigns


Renovation

• An internal project started in 2012
  – IT, General Methodology and subject-matter specialists
  – Build a global solution appropriate for most of the surveys
  – Solution which covers most of the parts of statistical production:
    • Data validation
    • Data editing and imputation
    • Aggregation and standard error estimation
    • Statistical disclosure control for tabular data
    • Tabulation


Renewed system

• Generalised metadata-driven application
  – Database of process metadata
    • MS Access -> ORACLE
    • For each survey instance
  – General SAS code
  – GUI for process metadata
  – Different microdata environments allowed, just some basic rules for the structure of microdata databases
    • Ad hoc SAS program for preparation of microdata
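
The metadata-driven idea on this slide can be sketched roughly as follows: one general program reads the rules stored as process metadata for a given survey instance and applies them to that survey's microdata. This is only an illustration in Python, not the SAS code used at SURS; the field names (`survey_instance`, `rule_id`, `condition`), the sample rules and the sample records are all hypothetical.

```python
# Hypothetical process metadata. In a real system these rows would be
# stored in a database and keyed by survey instance.
PROCESS_METADATA = [
    {"survey_instance": "DEMO_2015", "rule_id": "V1",
     "condition": lambda r: r["turnover"] >= 0},
    {"survey_instance": "DEMO_2015", "rule_id": "V2",
     "condition": lambda r: r["employees"] > 0},
    {"survey_instance": "OTHER_2015", "rule_id": "V9",
     "condition": lambda r: r["employees"] <= 1000},
]


def validate(survey_instance, microdata, metadata=PROCESS_METADATA):
    """General validation step: apply every rule registered for the
    given survey instance to every record, and report the failures
    as (record index, rule id) pairs."""
    rules = [m for m in metadata if m["survey_instance"] == survey_instance]
    failures = []
    for index, record in enumerate(microdata):
        for rule in rules:
            if not rule["condition"](record):
                failures.append((index, rule["rule_id"]))
    return failures


# Hypothetical microdata, prepared beforehand by a survey-specific step.
records = [
    {"turnover": 100, "employees": 5},
    {"turnover": -1, "employees": 2},
    {"turnover": 50, "employees": 0},
]
demo_failures = validate("DEMO_2015", records)
```

The point of the design is that changing a rule means editing a metadata row, while the general program stays the same for every survey.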


Schematic presentation of the renewed system

[Diagram. Labels shown: different microdata databases; ad-hoc program; general SAS program; database of process metadata; application for management; metadata repository; data on tables and variables; different kinds of output.]


Tabular data protection

1. Calculation of primary sensitivity for seven types of statistics: number, total, share, ratio, average…
  – Threshold, p%-rule, (n,k)-dominance rule
  – "Holding rule" + sampling weights
  – Zeroes are treated as unsafe
2. Secondary suppression applied in case of sensitive statistics (number and total)
  – SAS-Tool (Excel file with metadata, Tau-Argus, SAS macros)
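
The three primary sensitivity rules named above can be illustrated with their usual textbook definitions. The sketch below is in Python rather than SAS and is not the SURS implementation: the parameter values (`min_contributors=3`, `p=10`, `n=1`, `k=75`) are assumptions chosen for the example, contributions are assumed to be non-negative, and the holding rule and sampling weights are left out.

```python
def threshold_unsafe(contributions, min_contributors=3):
    """Threshold rule: the cell has too few contributors."""
    return len(contributions) < min_contributors


def p_percent_unsafe(contributions, p=10.0):
    """p%-rule: after removing the two largest contributions, what is
    left is smaller than p percent of the largest contribution."""
    xs = sorted(contributions, reverse=True)
    if not xs:
        return False
    largest = xs[0]
    second = xs[1] if len(xs) > 1 else 0.0
    remainder = sum(xs) - largest - second
    return remainder < (p / 100.0) * largest


def dominance_unsafe(contributions, n=1, k=75.0):
    """(n,k)-dominance rule: the n largest contributions together
    exceed k percent of the cell total."""
    xs = sorted(contributions, reverse=True)
    total = sum(xs)
    if total <= 0:
        return False
    return sum(xs[:n]) > (k / 100.0) * total


def violated_rules(contributions):
    """Names of the rules that flag the cell as primary sensitive."""
    checks = [
        ("threshold", threshold_unsafe),
        ("p%", p_percent_unsafe),
        ("(n,k)-dominance", dominance_unsafe),
    ]
    return [name for name, check in checks if check(contributions)]


dominated_cell = violated_rules([90, 5, 3, 2])
balanced_cell = violated_rules([30, 30, 20, 20])
two_unit_cell = violated_rules([50, 50])
```

A cell would typically be treated as primary sensitive if any one of the configured rules flags it; which rules and parameters apply to a given survey is a matter for the process metadata.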


Tabular data protection

• Results for each survey instance saved in the database with statistics (ORACLE)
  – Statuses for lower precision
  – Confidentiality flags for the type of primary and secondary suppression
• 3 types of tabulation (codelists)
  – Excel format (the most user-friendly)
  – plain text format (.tab, .hrc) for Tau-Argus
  – plain text format (.csv) for PX-Edit (SURS’s publication tool)


Tabulation & Tabular Data Protection

[Diagram. Labels shown: different microdata databases; ad-hoc program; general SAS program (shown twice); database of process metadata (shown twice); calculation of statistics; tabulation; tabular protection; database with statistics; output tables.]


Parameters for SDC in MetaSOP


Tabulation in MetaSOP


Processing in MetaSOP


Example of 3-dimensional table

After aggregation

CC_SI / Dim_2   Dim_3=TOT       Dim_3=F      Dim_3=O
TOT   TOT       1209943548      1.09E+09     1.23E+08
TOT   1         37700934.42     35625442     2075493
TOT   11        47110694.48     46417660     693034.1
TOT   2         733763444.2     6.62E+08     71456295
TOT   21        517712620.1     4.8E+08      37489998
TOT   22        161044502.5     1.1E+08      50837088
TOT   23        37903335.85     37783060     120275.8
TOT   24        343495995.1     2.86E+08     57438583
11    TOT       59283130.99     56199883     3083248
11    1         64428657.15     62453677     1974980
11    11        21989840.69     21609892     379948.2
11    2         69502173.33     67377101     2125073
11    21        13959568.67     13959569     -
11    22        338148.7639     338148.8     z
11    23        7911125.122     7911125      -
11    24        27886089.54     26016025     1870064
12    TOT       215349659.2     2.04E+08     11792968
12    1         5993635.356     5993635      -
12    11        2035728.954     2035729      -
12    2         55635358.28     54430511     1204847
12    21        146242216.3     1.43E+08     2783876
12    22        4164502.417     3872003      292499.2
12    23        38774447.75     34931862     3842585
12    24        42332750.72     37447112     4885639
21    TOT       176972728       1.76E+08     1323998
21    1         2248602.352     2248602      z
21    11        166013.5624     166013.6     z
21    2         372993785.9     3.69E+08     4134769
21    21        418831917.8     4.08E+08     10337323
21    22        29411096.08     29411096     z
21    23        56581.5975      56581.6      z
21    24        88244091.34     86483431     1760660

After use of SAS-Tool

CC_SI / Dim_2   Dim_3=TOT       Dim_3=F      Dim_3=O
TOT   TOT       1209943548      1.09E+09     1.23E+08
TOT   1         37700934.42     35625442     2075493
TOT   11        47110694.48     46417660     693034.1
TOT   2         733763444.2     6.62E+08     71456295
TOT   21        517712620.1     4.8E+08      37489998
TOT   22        161044502.5     1.1E+08      50837088
TOT   23        37903335.85     37783060     120275.8
TOT   24        343495995.1     2.86E+08     57438583
11    TOT       59283130.99     56199883     3083248
11    1         64428657.15     z            z
11    11        21989840.69     z            z
11    2         69502173.33     z            z
11    21        13959568.67     13959569     -
11    22        338148.763      z            z
11    23        7911125.122     7911125      -
11    24        27886089.54     z            z
12    TOT       215349659.2     2.04E+08     11792968
12    1         5993635.356     5993635      -
12    11        2035728.954     2035729      -
12    2         55635358.28     54430511     1204847
12    21        146242216.3     1.43E+08     2783876
12    22        4164502.417     z            z
12    23        38774447.75     z            z
12    24        42332750.72     z            z
21    TOT       176972728       1.76E+08     1323998
21    1         z               z            z
21    11        z               z            z
21    2         z               z            z
21    21        418831917.8     4.08E+08     10337323
21    22        29411096.08     z            z
21    23        z               z            z
21    24        88244091.34     z            z
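
In the example above, rows that carry a single z after aggregation (for instance the O cell of row 11 / 22) carry further z cells once the SAS-Tool has been applied. One plausible reading is the usual argument for secondary suppression: in an additive table, a lone suppressed cell can be recomputed from the published total and the remaining cells. The Python sketch below illustrates only that arithmetic with hypothetical figures; it does not reproduce how Tau-Argus or the SAS-Tool choose the secondary suppressions.

```python
def recover_single_suppressed(total, cells):
    """Return the value of the suppressed cell if exactly one cell in an
    additive row is suppressed (None); otherwise return an empty dict.

    `cells` maps a cell label to its published value, or to None when
    the cell is suppressed. Labels and figures here are hypothetical."""
    suppressed = [label for label, value in cells.items() if value is None]
    if len(suppressed) != 1:
        return {}
    known = sum(value for value in cells.values() if value is not None)
    return {suppressed[0]: total - known}


# Only one cell suppressed: it can be recomputed by subtraction.
leaky_row = recover_single_suppressed(100, {"F": 80, "O": None})

# A second cell suppressed in the same row: subtraction no longer works.
protected_row = recover_single_suppressed(100, {"F": None, "O": None})
```

The same reasoning applies along every dimension of the table, which is why the choice of secondary suppressions is left to a dedicated tool.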


New organization

• Old system:
  – Every survey had its own programmer and its own general methodologist
• Renewed system:
  – A general methodologist and an IT expert (the "support team") help the subject-matter specialist to
    • insert and edit the process metadata (except for SDC) in the application
    • run particular parts of the statistical process


Advantages

• The subject-matter personnel's skills improve (higher quality of data)
• The process metadata can be changed easily and the procedure can be repeated in a short time (flexibility)
• The rules for data processing are gathered in one place (transparency)


Drawbacks

• High risk of syntax errors when metadata expressions are inserted
• Subject-matter personnel have to learn some new skills (SAS expressions)
• An error during execution can cause problems if the support team is busy or not available


Challenges for the future

• Introduce the application successfully into production
  – Subject-matter specialists adjusting to the changes
  – Building a qualified support team
• Adding new functionalities
  – Indices
  – Secondary suppression for other types of statistics
  – GUI instead of the Excel file for the SAS-Tool


Thank you for your attention.