Eurostat Statistical Disclosure Control

Eurostat Statistical Disclosure Control. Presented by Peter-Paul de Wolf, Statistics Netherlands (CBS)

Eurostat

Statistical Disclosure Control

Presented by

• Peter-Paul de Wolf
• Statistics Netherlands (CBS)

Content

• Introduction
• What’s the problem?
– Specific for business statistics
• Formalising the problem
• What to do?
– Methods
– Software
• Summary

Introduction

• General definition of confidential data:

Data that cannot be published “as is”

» By law (e.g. statistical law)
» Sensitive data (what’s sensitive?)
» Respondent considers it confidential
» …

Introduction

• Physical protection
– Entrance
– Network
• Legal protection
– Oath
• Statistical Disclosure Control
– Protection of statistical output

What’s the problem?

Statistical output

• Microdata
– Rarely released for business data
– Obvious: each record represents a single respondent
• Tabular data
– In business data often magnitude tables
– Sometimes frequency tables
– But: aggregated data?!
• Cell value itself not sensitive:
– All contributions are equal (1)
• Spanning variables
– Identifying, e.g. NACE, Region
– Sensitive, e.g. “environmental offence” (illegal dumping of waste, illegal fishing, oil spills, …)

What’s the problem (frequency table)

Example: number of ship-owners

Region    Environmental offence
          Yes    No    Total
…
A           9     0        9
…

What’s the problem (frequency table)

Example: number of ship-owners

Region    Environmental offence
          Yes    No    Total
…
B          14     2       16
…

What’s the problem (frequency table)

Example: number of ship-owners

Region    Environmental offence
          Yes    No    Total
…
C           1     1        2
…
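One possible reading of the three examples above is that each row lets someone conclude that specific ship-owners committed an offence, even though only counts are published. The sketch below encodes that reading; the function name and the `max_coalition` parameter are illustrative assumptions, not part of the presentation.

```python
def group_disclosure(yes: int, no: int, max_coalition: int = 2) -> bool:
    """Flag a (Yes, No) row of the frequency table as risky.

    Assumed reading: if no == 0, anyone who knows a ship-owner is in the
    region learns that owner offended. If no is small, the non-offenders
    (alone or pooling knowledge, up to max_coalition of them) know that
    every *other* owner in the region offended.
    """
    if yes == 0:
        return False  # nobody has the sensitive attribute, nothing to learn
    return no <= max_coalition


# Rows from the slides: (Yes, No)
regions = {"A": (9, 0), "B": (14, 2), "C": (1, 1)}
flags = {name: group_disclosure(y, n) for name, (y, n) in regions.items()}
```

With the hypothetical default `max_coalition=2`, all three regions are flagged; a row such as (14, 10) would not be.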

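What’s the problem (magnitude table)

Turnover (10⁶ €) of instrument producing companies

Region      A            B            C            Total
Harps       58 (151)     47 (123)     36 (98)      141 (372)
Organs      71 (16)      124 (21)     24 (9)       219 (46)
Pianos      92 (5)       157 (2)      59 (1)       308 (8)
Other       800 (302)    934 (362)    651 (287)    2385 (951)
Total       1021 (474)   1262 (508)   770 (395)    3053 (1377)

Each cell: turnover, with a second figure in parentheses (apparently the number of contributing companies).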

?

Formalising the problem

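Suppose cell (Piano, A) consists of

Company X: 81 × 10⁶ €
Company Y: 5 × 10⁶ €
Other three: 2 × 10⁶ € each
Total: 92 × 10⁶ €

Company Y knows its own contribution, so it can bound Company X from above: 92 – 5 = 87, which is within 7.4% of X’s true value of 81!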
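The arithmetic on this slide can be replayed in a few lines. This is a minimal sketch using the slide's figures; the function and variable names are illustrative.

```python
# Contributions to cell (Piano, A), in 10^6 EUR, as given on the slide.
contributions = {"X": 81, "Y": 5, "other_1": 2, "other_2": 2, "other_3": 2}


def upper_bound_on_others(contributions: dict, intruder: str) -> int:
    """Published cell total minus the intruder's own (known) contribution."""
    total = sum(contributions.values())
    return total - contributions[intruder]


bound = upper_bound_on_others(contributions, "Y")  # Y's estimate of X
relative_error = (bound - contributions["X"]) / contributions["X"]
print(f"Y's upper bound on X: {bound}, off by {relative_error:.1%}")
```

The relative error equals the combined turnover of the three small companies divided by X's turnover, which is the quantity the p%-rule compares against p.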

Formalising the problem

General, objective rules needed

• Threshold rule
• Dominance rule or (n,k)-rule
• p%-rule

The p%-rule is favoured over the (n,k)-rule; it implies a minimum of 3 contributors per cell.
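The three rules can be sketched as small predicates that return True when a cell is unsafe. The formulations below are the commonly used textbook ones, and the parameter values (minimum of 3 contributors, n=1 and k=75, p=10) are assumptions chosen for illustration; the slides do not give values.

```python
def threshold_rule(contribs, min_contributors=3):
    """Unsafe if the cell has fewer than min_contributors contributors."""
    return len(contribs) < min_contributors


def dominance_rule(contribs, n=1, k=75.0):
    """(n,k)-rule: unsafe if the n largest contributors exceed k% of the total."""
    top_n = sorted(contribs, reverse=True)[:n]
    return sum(top_n) > k / 100 * sum(contribs)


def p_percent_rule(contribs, p=10.0):
    """p%-rule: unsafe if the second-largest contributor can estimate the
    largest to within p%, i.e. total - x1 - x2 < p% of x1."""
    if not contribs:
        return False
    ordered = sorted(contribs, reverse=True)
    remainder = sum(ordered[2:])  # everything except the two largest
    return remainder < p / 100 * ordered[0]


piano_a = [81, 5, 2, 2, 2]  # cell (Piano, A) from the slides, in 10^6 EUR
results = {
    "threshold": threshold_rule(piano_a),
    "dominance": dominance_rule(piano_a),
    "p_percent": p_percent_rule(piano_a),
}
```

For this cell the remainder is 6 against a limit of 8.1, so the p%-rule flags it at p=10 but would not at p=5. With only one or two contributors the remainder is zero, so such cells are always flagged, which is why the rule implies at least 3 contributors.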

What to do?

• Redesign table
– Combine rows/columns
– Define different categories
• Rounding
• Add noise
• Cell suppression
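As a small illustration of the rounding option, the sketch below applies conventional rounding to a base of 10. The base and the helper name are assumptions; the values are the Harps row of the deck's cell suppression table.

```python
def round_to_base(value: int, base: int) -> int:
    """Round a non-negative integer to the nearest multiple of base
    (exact halves round up)."""
    return (value + base // 2) // base * base


harps = [58, 47, 36, 89]  # regions A-D; the true row total is 230
rounded_cells = [round_to_base(v, 10) for v in harps]
rounded_total = round_to_base(sum(harps), 10)

print(rounded_cells, "sum:", sum(rounded_cells), "published total:", rounded_total)
```

Rounding each cell independently means the rounded cells need not add up to the rounded total, a trade-off to keep in mind when choosing this method.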

Cell suppression

Region      A       B       C       D       Total
Harps       58      47      36      89      230
Organs      71      124     24      31      250
Pianos      92      157     59      28      336
Other       800     934     651     742     3127
Total       1021    1262    770     890     3943
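Suppressing a single cell is not enough when row and column totals are published, because the cell can be recomputed by subtraction. The sketch below checks which suppressed cells can be recovered exactly this way; the suppression patterns used are hypothetical, and the check assumes all margins are published.

```python
CELLS = [
    [58, 47, 36, 89],      # Harps
    [71, 124, 24, 31],     # Organs
    [92, 157, 59, 28],     # Pianos
    [800, 934, 651, 742],  # Other
]


def recover_exactly(cells, suppressed):
    """Return the suppressed (row, col) cells that follow exactly from the
    published cells and margins, by repeatedly solving any row or column
    that has a single unknown."""
    n_rows, n_cols = len(cells), len(cells[0])
    row_totals = [sum(row) for row in cells]
    col_totals = [sum(col) for col in zip(*cells)]
    known = {(i, j): cells[i][j] for i in range(n_rows)
             for j in range(n_cols) if (i, j) not in suppressed}
    progress = True
    while progress:
        progress = False
        for i in range(n_rows):
            missing = [j for j in range(n_cols) if (i, j) not in known]
            if len(missing) == 1:
                j = missing[0]
                known[(i, j)] = row_totals[i] - sum(
                    known[(i, c)] for c in range(n_cols) if c != j)
                progress = True
        for j in range(n_cols):
            missing = [i for i in range(n_rows) if (i, j) not in known]
            if len(missing) == 1:
                i = missing[0]
                known[(i, j)] = col_totals[j] - sum(
                    known[(r, j)] for r in range(n_rows) if r != i)
                progress = True
    return {pos: known[pos] for pos in suppressed if pos in known}


single = recover_exactly(CELLS, {(2, 0)})                       # only (Pianos, A)
block = recover_exactly(CELLS, {(1, 0), (1, 1), (2, 0), (2, 1)})  # 2x2 block
```

A lone suppression of (Pianos, A) is recovered as 92, while a 2×2 block of suppressions resists this simple attack. The check covers exact recovery only; it does not compute the intervals an intruder could still derive for the suppressed cells.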

Cell suppression

[The same table is repeated on four further slides with cells marked X: 3 marks, then 6, then 9 and 9. The positions of the marks are not recoverable.]

Software

Latest version can be found on

http://neon.vb.cbs.nl/casc

New Open Source version available end 2014

Contact/info

• Glossary, handbook, project info
– http://neon.vb.cbs.nl/casc

• Wiley book

[email protected]