Statistical Disclosure Control (SDC) at SURS

Andreja Smukavec, General Methodology and Standards Sector

Why is confidentiality protection needed?

• One of the fundamental principles of official statistics is that the information provided by data suppliers is strictly confidential and is used only for statistical purposes.

• Legislation places a legal obligation on NSIs (national statistical institutes) to protect the confidentiality of the data supplied to them.

• Data suppliers must be able to trust the NSI to preserve the confidentiality of individual information; this trust leads to better quality of the collected data.

National legislation
• National Statistics Act
– Data are published in aggregated form.
– Data may be published individually if:
  • written consent of the reporting units is obtained;
  • data are collected from public data collections;
  • data are published in such a way that the reporting units cannot be directly identified.
– The Office or authorized producers shall transmit individual data to users on the basis of a written application.
• Other legislation
– Personal Data Protection Act;
– …

European legislation

• European Regulation (EC) No 223/2009
– General definitions;
– Chapter 5: Statistical Confidentiality.
• Access to confidential data for scientific purposes
• European Statistics Code of Practice
– Principle 5: The confidentiality of the information that data providers supply, and its use only for statistical purposes, are absolutely guaranteed.

What does SDC cover at SURS?
• Tabular data protection
– Publication
– Eurostat and other institutions
– Users' requests
• Microdata protection
– Preparation of public-use files and scientific-use files
– Checking rules set up by Eurostat
• Output checking

Tabular data protection

• Tables contain aggregated data:
– Magnitude tables: the sum of a quantitative variable over respondents, where respondents are grouped by categorical variables.
– Frequency tables: the number of respondents, where respondents are grouped by categorical variables.
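The difference between the two table types can be illustrated with a minimal Python sketch. The respondent records, the variables `region`, `activity` and `turnover`, and the function `build_tables` are hypothetical; the sketch only shows that the same grouping by categorical variables yields a frequency table (counts) and a magnitude table (sums).

```python
from collections import defaultdict

# Hypothetical respondent records: (region, activity, turnover).
# Region and activity are the categorical (spanning) variables;
# turnover is the quantitative variable summed in the magnitude table.
records = [
    ("East", "Manufacturing", 120.0),
    ("East", "Manufacturing", 80.0),
    ("East", "Trade", 40.0),
    ("West", "Manufacturing", 300.0),
    ("West", "Trade", 25.0),
    ("West", "Trade", 35.0),
]


def build_tables(rows):
    """Return (frequency, magnitude) tables keyed by (region, activity)."""
    frequency = defaultdict(int)      # number of respondents per cell
    magnitude = defaultdict(float)    # sum of turnover per cell
    for region, activity, turnover in rows:
        frequency[(region, activity)] += 1
        magnitude[(region, activity)] += turnover
    return dict(frequency), dict(magnitude)


freq, mag = build_tables(records)
for cell in sorted(freq):
    print(cell, freq[cell], mag[cell])
```

Each printed line is one table cell; for example, the cell ('West', 'Trade') has 2 respondents and a turnover sum of 60.0.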

Tabular data protection at SURS

• Method: cell suppression
– Post-tabular method
– Non-perturbative method: published values are not altered, but less information is available
– Implemented in the Tau-Argus software (CASC project)
– Ensures that the interval of possible values for each sensitive cell is sufficiently large

Tabular data protection

Cell Suppression

• Sensitivity rules define unsafe cells:
– Threshold rule: the number of respondents in a cell is below a certain threshold value.
– Concentration rules: one or two respondents are dominant.
– Group disclosure: all respondents in a cell have the same value for a sensitive variable.
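A minimal sketch of how the three sensitivity rules could be checked for a single cell, assuming a threshold of 3 respondents and an (n, k) dominance rule with n = 2 and k = 85%. These parameter values and the function name `sensitivity_reasons` are hypothetical; the slides do not state the values used at SURS, and Tau-Argus configures these rules in its own way.

```python
# Hypothetical parameter values; the slides do not give the values used at SURS.
MIN_RESPONDENTS = 3     # threshold rule: fewer respondents -> unsafe
DOMINANCE_N = 2         # (n, k) rule: consider the n largest contributions
DOMINANCE_K = 85.0      # ... unsafe if together they exceed k% of the cell total


def sensitivity_reasons(contributions, sensitive_values=None):
    """Return the rules that flag a cell as unsafe (an empty list means safe).

    contributions: each respondent's value of the quantitative variable.
    sensitive_values: optional categories of a sensitive variable, one per
    respondent, used for the group disclosure check.
    """
    reasons = []
    if len(contributions) < MIN_RESPONDENTS:
        reasons.append("threshold")
    total = sum(contributions)
    if total > 0:
        largest = sorted(contributions, reverse=True)[:DOMINANCE_N]
        if 100.0 * sum(largest) / total > DOMINANCE_K:
            reasons.append("dominance")
    # Group disclosure: every respondent shares one value of the sensitive variable.
    if sensitive_values and len(set(sensitive_values)) == 1:
        reasons.append("group disclosure")
    return reasons


print(sensitivity_reasons([500, 300, 20, 10, 5]))   # -> ['dominance']
```

Here the two largest respondents contribute about 96% of the cell total, so the cell is unsafe even though it has five respondents.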

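Cell Suppression
• Secondary suppression
– Needed because the tables contain totals: without it, a suppressed cell could be derived from a published total and the other cells in its row or column. The feasibility interval for each unsafe cell has to be wide enough.
– This is an optimisation problem, so an LP solver is used (Xpress, CPLEX).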
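Why secondary suppression is needed, and what a feasibility interval is, can be shown on a toy example. The sketch below is not the Tau-Argus algorithm, which solves the general case with an LP solver; it only handles one hypothetical pattern in which four suppressed cells form a 2x2 block, where the interval can be written down directly. Function names and numbers are illustrative.

```python
def recover_single(row_total, published_cells):
    """If only one cell in a row is suppressed, the row total reveals it."""
    return row_total - sum(published_cells)


def feasibility_interval_2x2(r1, r2, c1, c2):
    """Feasibility interval of cell a when four cells a, b, c, d are suppressed.

    Hypothetical layout (rows and columns of the suppressed block only):
        a  b  | r1      r1 = a + b, r2 = c + d, c1 = a + c, c2 = b + d
        c  d  | r2      (each derived from a published total minus the
        c1 c2           published cells in that row or column)
    For non-negative cells the solutions are a = t, b = r1 - t, c = c1 - t,
    d = r2 - c1 + t, so t ranges over [max(0, c1 - r2), min(r1, c1)].
    """
    if r1 + r2 != c1 + c2:
        raise ValueError("row and column sums of the suppressed block differ")
    return max(0, c1 - r2), min(r1, c1)


# Row with total 100 and published cells 30, 45, 20: suppressing only the
# sensitive cell is useless, because it can be recomputed exactly.
print(recover_single(100, [30, 45, 20]))            # -> 5
# Hidden block a=5, b=20, c=30, d=45 gives r1=25, r2=75, c1=35, c2=65.
print(feasibility_interval_2x2(25, 75, 35, 65))     # -> (0, 25)
```

After the three additional (secondary) suppressions, an intruder only learns that the sensitive value lies between 0 and 25. Whether such an interval counts as wide enough depends on a separate protection-level parameter, which is not specified here.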

Cell Suppression - Publication

Microdata protection

• Microdata are de-identified records of individual units (enterprises, persons, households).
– They contain no direct identifiers (ID numbers, tax numbers, name + address…).
• Microdata files are available to researchers in the secure room and via remote access.

Microdata protection: Scientific-use file (SUF)

• Prepared for researchers
• A signed contract is required
• Usually sent on CD with a password; the file has to be destroyed after use
• More information (variables) available
• Protected only against unintentional (spontaneous) disclosure

Microdata protection: Public-use file (PUF)

• Publicly available, or available after registration
• Less information (variables) available
• Not all microdata protection methods are usable (some would make the file too complex for typical users)
• Protected against intentional disclosure

Microdata protection

• The goal of microdata protection is to produce a safe microdata file, where:
– the disclosure risk is low;
– analyses done on the safe file give results that are close or equal to the results of analyses done on the original data.

Microdata protection methods used at SURS

• Modifying the original microdata file by:
– non-perturbative methods:
  • global recoding;
  • top and bottom coding;
  • local suppression (not very usable for PUFs);
– some perturbative methods:
  • microaggregation;
  • rounding.
• Software packages: Mu-Argus and R.
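The listed modifications can be sketched as small Python functions. These are simplified, hypothetical illustrations (10-year age bands, a fixed group size k = 3, rounding base 5), not the implementations in Mu-Argus or in R packages; production microaggregation usually relies on multivariate heuristics such as MDAV rather than the univariate fixed-size grouping shown here.

```python
def global_recode_age(age, width=10):
    """Global recoding: replace exact age by a band, e.g. 37 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def top_bottom_code(value, bottom, top):
    """Top and bottom coding: values outside [bottom, top] are set to the bound."""
    return min(max(value, bottom), top)


def microaggregate(values, k=3):
    """Univariate microaggregation with fixed group size k (simplified sketch).

    Values are sorted and split into groups of k consecutive values (the last
    group absorbs any remainder, so every group has at least k members); each
    value is replaced by its group mean. Results keep the original order.
    """
    if len(values) < k:
        raise ValueError("need at least k values")
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    n_groups = len(values) // k
    for g in range(n_groups):
        start = g * k
        end = len(values) if g == n_groups - 1 else start + k
        idx = order[start:end]
        mean = sum(values[i] for i in idx) / len(idx)
        for i in idx:
            result[i] = mean
    return result


def round_to_base(value, base=5):
    """Rounding: nearest multiple of base (exact ties follow Python's round)."""
    return base * round(value / base)


print(global_recode_age(37))                       # -> 30-39
print(top_bottom_code(250000, 0, 100000))          # -> 100000
print(microaggregate([10, 2, 7, 30, 5, 40, 8]))
print(round_to_base(13))                           # -> 15
```

In the microaggregation call, 2, 5 and 7 are replaced by their mean 14/3 and the remaining four values by 22.0, so no value stands alone while the column total (102) is preserved.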

Labour Force Survey - PUF

• Prepared for Social Data Archives (DwB project).

• We applied Eurostat's rules for creating the SUF, and then created the PUF by sampling (one third of the original sample).

• We did not use local suppression.
• The quality of the statistics used as parameters for the sampling is ensured; other statistics should be used with caution.
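One way such a one-third subsample could be drawn is sketched below, assuming a stratified design: about one third of the records is kept in each stratum and the weights are scaled up accordingly. The stratum variable, the weight adjustment and the function name `subsample_one_third` are assumptions for illustration; the slides only state that the PUF contains one third of the original sample.

```python
import random
from collections import defaultdict


def subsample_one_third(records, stratum_of, weight_of, seed=12345):
    """Keep about one third of the records in each stratum (hypothetical sketch).

    Returns (record, adjusted_weight) pairs. Each kept record's weight is
    multiplied by n_h / m_h, where n_h is the stratum's size in the full sample
    and m_h the number of records kept, so the weighted count of the stratum is
    preserved exactly when weights are equal within the stratum, and in
    expectation otherwise.
    """
    rng = random.Random(seed)                      # fixed seed: reproducible file
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)
    subsample = []
    for key in sorted(strata):
        members = strata[key]
        m = max(1, round(len(members) / 3))        # at least one record per stratum
        factor = len(members) / m
        for rec in rng.sample(members, m):         # simple random sample within stratum
            subsample.append((rec, weight_of(rec) * factor))
    return subsample


# Hypothetical usage: 9 records in region A, 6 in region B.
sample = [{"region": "A", "w": 100.0}] * 9 + [{"region": "B", "w": 50.0}] * 6
puf = subsample_one_third(sample, lambda r: r["region"], lambda r: r["w"])
print(len(puf))                                    # -> 5
```

Stratifying on the variables whose totals must stay reliable is one design choice that keeps those estimates stable; estimates for other variables lose precision because two thirds of the records are dropped.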

Output checking

1. Researchers fill out our form after finishing their work.

2. An e-mail is sent to our common e-mail address [email protected].

3. One of the SDC methodologists checks the output. If the output is disclosive or the form is filled in incorrectly, the researchers are contacted for additional information or asked to correct the output.

4. After the SDC methodologist approves the dissemination, the output is sent to the researcher by e-mail.

Rules for output checking

• Rule-of-thumb model:
– Threshold N: all tabular and similar output should be based on at least N units.
– Dominance rule: the analysis should not be done on groups with a dominant unit.
– Maximum and minimum are usually not released (except when they refer to more than one unit).
– The 100% percentile (the maximum) is usually not released.
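The rule-of-thumb checks can be expressed as a small helper that a methodologist might run on the units behind one output cell. The values N = 10 and a 50% dominance share, as well as the function names, are hypothetical; the slides leave N and the definition of "dominant" unspecified.

```python
# Hypothetical parameters: the slides do not state SURS's values.
THRESHOLD_N = 10
DOMINANCE_SHARE = 0.5   # a unit contributing more than half of the total


def check_output_cell(unit_values, threshold_n=THRESHOLD_N,
                      dominance_share=DOMINANCE_SHARE):
    """Apply rule-of-thumb checks to the units behind one output cell.

    Returns a list of problems; an empty list means the cell may be released.
    """
    problems = []
    if len(unit_values) < threshold_n:
        problems.append(f"fewer than {threshold_n} units")
    total = sum(abs(v) for v in unit_values)
    if total > 0 and max(abs(v) for v in unit_values) / total > dominance_share:
        problems.append("dominant unit")
    return problems


def can_release_extreme(unit_values, extreme):
    """Max/min (100% / 0% percentile) only if more than one unit has that value."""
    return sum(1 for v in unit_values if v == extreme) > 1


print(check_output_cell([100.0] + [1.0] * 11))     # -> ['dominant unit']
print(can_release_extreme([3, 7, 7], 7))           # -> True
```

The dominance check uses absolute values so that negative contributions (for example, losses) cannot hide a dominant unit.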

Thank you for your attention!

[email protected]