WP. 46 Providing access to data and making microdata safe, experiences of the ONS Jane Longhurst Paul Jackson ONS

The Statistical Disclosure Control Problem

[Figure: Disclosure Risk plotted against Data Utility, marking Original Data, Accessed Data and No data, with a Maximum Tolerable Risk threshold]
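The problem on this slide can be read as a constrained choice: among candidate releases, take the one with the highest data utility whose disclosure risk stays at or below the maximum tolerable risk. A minimal Python sketch, assuming each option has already been given numeric risk and utility scores; the option names, scores and the 0.35 threshold are hypothetical, not ONS values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReleaseOption:
    name: str
    risk: float     # assumed disclosure-risk score, 0 (none) to 1 (certain)
    utility: float  # assumed data-utility score, 0 (none) to 1 (original data)


def choose_release(options, max_tolerable_risk) -> Optional[ReleaseOption]:
    """Return the most useful option at or below the risk threshold,
    or None if nothing is acceptable (the 'No data' corner)."""
    acceptable = [o for o in options if o.risk <= max_tolerable_risk]
    if not acceptable:
        return None
    return max(acceptable, key=lambda o: o.utility)


# Hypothetical candidate releases of one dataset.
options = [
    ReleaseOption("original data", risk=0.9, utility=1.0),
    ReleaseOption("recoded data", risk=0.3, utility=0.7),
    ReleaseOption("heavily recoded data", risk=0.1, utility=0.4),
]
print(choose_release(options, max_tolerable_risk=0.35).name)  # recoded data
```

Lowering the threshold moves the choice towards less useful releases; below the lowest-risk option the sketch returns None.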

Protecting and Providing Access to Microdata

Legal Issues

Policy Issues

The Data

Risk Assessment

Risk Management

Output

Test and Evaluate

Legal Issues

Legal Context

• No general statistics act

• No comprehensive business register

• No population register

• Registrations of Births, Marriages and Deaths are public, including cause of death

• A system of common law

• An Information Commissioner: a champion of both privacy and access to information, with court powers

• Data Protection / Human Rights

• Freedom of Information

Legal Issues

Legal Context (continued)

• Business surveys have statutory protection

– But ONS has lawful authority to disclose identified business survey data to any central government department for any purpose, and to any local authority for its planning purposes.

Legal Issues

Legal Context (continued)

• Census records have statutory protection

– But ONS has lawful authority to disclose personal census information to any person for statistical purposes.

Legal Issues

Legal Context (continued)

• Household survey records are protected by the civil “common law duty of confidence”.

– But ONS has lawful authority to disclose identifying household survey data to any person where there is informed consent.

– And ONS survey pledges obtain consent for disclosures of ‘detailed but anonymised data’ to any genuine researcher.

Legal Issues

Legal Context

This extraordinary authority to disclose identifying microdata to certain persons, departments and authorities only postpones the real issue:

– Access needs to be managed (MRP)

– When ONS is not the one applying the SDC standards to outputs, someone else has to.

– Therefore usable standards and guidance are essential.

Legal Issues

Legal Context

• So, when ONS has so many options, how does it decide:

– i) who should have controlled access, and under what conditions, and

– ii) what the outputs of ONS or other users should look like?

So we need policy.

Policy Issues

So we need policy:

• National Statistics Code of Practice for the GSS

• Protocol for data access and confidentiality
– A Confidentiality Guarantee: National Statistics are guaranteed not to be likely to identify an individual, assuming an intruder is prepared to use a proportionate amount of time, effort and expertise.

• Departmental policy
– Variations according to data source type, risk analysis and management, methodology, and access / release options

Risk Assessment

• One element of disclosure risk comes from records that are unique both in the sample and in a known population

• Several approaches to assessing the disclosure risk in microdata:
– Disclosure risk scenarios
– Variable checklist
– Quantitative risk measures
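Finding the records that are unique in the sample is the mechanical first step: group records on their combination of key variables and keep the singleton groups. A minimal sketch; the variable names and records are hypothetical, and sample uniqueness alone says nothing about whether the record is also unique in the population.

```python
from collections import Counter


def sample_uniques(records, key_vars):
    """Return indices of records whose combination of key-variable values
    occurs exactly once in the sample."""
    keys = [tuple(r[v] for v in key_vars) for r in records]
    counts = Counter(keys)
    return [i for i, k in enumerate(keys) if counts[k] == 1]


# Hypothetical survey records.
sample = [
    {"age_band": "30-39", "sex": "F", "region": "North", "occupation": "teacher"},
    {"age_band": "30-39", "sex": "F", "region": "North", "occupation": "nurse"},
    {"age_band": "40-49", "sex": "M", "region": "North", "occupation": "teacher"},
    {"age_band": "30-39", "sex": "F", "region": "South", "occupation": "teacher"},
]
print(sample_uniques(sample, ["age_band", "sex", "region"]))                # [2, 3]
print(sample_uniques(sample, ["age_band", "sex", "region", "occupation"]))  # [0, 1, 2, 3]
```

Adding one more key variable (here, detailed occupation) makes every record a sample unique, which is why the checklist treats detail of occupation as a risk factor.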

Disclosure Risk Scenarios

• Identify possible situations in which disclosure could occur

• Make assumptions about the intruder's prior knowledge and the information available to them, e.g. a private database, a journalist, a nosy neighbour

• Identify key variables, i.e. indirectly identifying variables

• Use this process to decide what needs to be protected against
– can be complex
– requires discussion and judgement

SDC Checklist for Microdata Release

• Level of geography
• Ethnic classification
• Detail of occupation
• Visible variables
• Traceable variables
• Survey design
• Dissemination

Quantitative Risk Assessment

•Recognised need for quantitative risk measures

• Research project initiated

• Need for individual and global risk measures

• The problem for sample microdata is that the population counts are unknown parameters

• Different methods for estimating the disclosure risk measures:
– Heuristics
– Probabilistic models

Probabilistic Modelling

•Estimate the disclosure risk based on natural assumptions about the distribution of the population

•Provides linked estimates of individual and global risk measures

• Research focused on:
– Model selection techniques
– Robustness of estimates
– Goodness-of-fit criteria

•Tested on ONS social surveys
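One common probabilistic model, shown here only as an illustration since the slides do not say which model ONS uses, assumes the population count F in a key-variable cell is Poisson with mean lambda and that each unit is sampled independently with probability pi. The unsampled count is then Poisson with mean mu = (1 - pi) * lambda, so for a sample unique the individual risk is r = E[1/F | f = 1] = (1 - exp(-mu)) / mu, and summing r over all sample uniques gives a linked global measure (the expected number of correct matches). In practice lambda would be estimated, for example from a log-linear model fitted to the sample; here it is supplied directly, and the values are hypothetical.

```python
import math


def individual_risk(lam, pi):
    """E[1/F | f = 1] for a sample unique, under the assumed model:
    population count F ~ Poisson(lam), each unit sampled with probability pi."""
    if not 0.0 < pi <= 1.0:
        raise ValueError("sampling fraction must be in (0, 1]")
    mu = (1.0 - pi) * lam  # expected number of unsampled units sharing the key
    if mu == 0.0:
        return 1.0  # nobody else shares the key, so a match is certainly correct
    return -math.expm1(-mu) / mu  # (1 - exp(-mu)) / mu, computed stably


def global_risk(lams, pi):
    """Expected number of correct matches: the sum of individual risks."""
    return sum(individual_risk(lam, pi) for lam in lams)


# Hypothetical fitted cell means for three sample uniques in a 1% sample.
lams = [5.0, 50.0, 500.0]
for lam in lams:
    print(lam, round(individual_risk(lam, 0.01), 4))
print("global risk:", round(global_risk(lams, 0.01), 4))
```

Because the global measure is built from the same per-record quantities, the individual and global estimates stay consistent with each other, which is the "linked" property the slide mentions.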

Heuristics

• The DIS/SUDA method consists of two elements:
– DIS: file-level assessment of risk
– SUDA: grades and orders records within a file according to level of risk

• Provides the contribution of each variable, and each variable value, to the risk

• Implemented by ONS for the 2001 Census Samples of Anonymised Records (SAR)
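SUDA works from minimal sample uniques (MSUs): the smallest sets of key variables on which a record is unique in the file. The sketch below finds MSUs by brute force and tallies how often each variable appears in one, as a crude stand-in for the variable contributions mentioned above. It is not the published SUDA algorithm, and it reproduces neither SUDA's record scores nor the DIS file-level measure; the records and variable names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def minimal_sample_uniques(records, key_vars, max_size=3):
    """For each record, list the minimal subsets of key variables on which it is
    unique in the file. A subset is minimal if no proper subset is already unique."""
    subsets = [s for size in range(1, max_size + 1) for s in combinations(key_vars, size)]
    counts = {s: Counter(tuple(r[v] for v in s) for r in records) for s in subsets}
    result = []
    for r in records:
        msus = []
        for s in subsets:  # ordered by increasing size
            if counts[s][tuple(r[v] for v in s)] != 1:
                continue
            # A superset of an MSU already found is unique but not minimal.
            if any(set(m) <= set(s) for m in msus):
                continue
            msus.append(s)
        result.append(msus)
    return result


def variable_contributions(msus_per_record):
    """Count how often each key variable appears in an MSU across the file."""
    tally = Counter()
    for msus in msus_per_record:
        for m in msus:
            tally.update(m)
    return tally


records = [
    {"age": "30-39", "sex": "F", "region": "North"},
    {"age": "30-39", "sex": "F", "region": "North"},
    {"age": "40-49", "sex": "M", "region": "North"},
    {"age": "30-39", "sex": "M", "region": "South"},
]
msus = minimal_sample_uniques(records, ["age", "sex", "region"])
for i, m in enumerate(msus):
    print(i, m)
print(variable_contributions(msus))
```

Records with more, and smaller, MSUs are riskier; the brute-force search grows quickly with the number of key variables, which is one reason dedicated SUDA software exists.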

Evaluation of Quantitative Risk Measures

•Simulate sample surveys from Census data

• Compare risk measures with true risk
• Practical considerations
• How to set thresholds
• Incorporate risk measures into the MRP decision process
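The evaluation idea can be sketched as a small simulation: treat a full file as the "census", draw repeated samples from it, compute the true global risk of each sample from the known population counts, and compare it with an estimate made from the sample alone. In this sketch the population is synthetic and uniform (a hypothetical stand-in for Census data), and the estimate is a deliberately naive plug-in (lambda = 1 / pi for every sample unique) fed into the Poisson formula; a fitted model would replace it in practice.

```python
import math
import random
from collections import Counter


def make_population(n, seed=1):
    """Hypothetical 'census': each person reduced to a tuple of key-variable codes."""
    rng = random.Random(seed)
    return [(rng.randrange(5), rng.randrange(10), rng.randrange(50)) for _ in range(n)]


def evaluate(population, pi, seed=0):
    """Draw one Bernoulli sample and return (sample uniques, true risk, naive estimate)."""
    if not 0.0 < pi < 1.0:
        raise ValueError("sampling fraction must be in (0, 1)")
    rng = random.Random(seed)
    pop_counts = Counter(population)
    sample_counts = Counter(k for k in population if rng.random() < pi)
    uniques = [k for k, f in sample_counts.items() if f == 1]
    # True risk: a sample unique whose population cell has F members is
    # matched correctly with probability 1/F if the intruder picks at random.
    true_risk = sum(1.0 / pop_counts[k] for k in uniques)
    # Naive estimate: lambda_hat = f / pi = 1 / pi, then (1 - exp(-mu)) / mu.
    mu = (1.0 - pi) / pi
    est_risk = len(uniques) * (-math.expm1(-mu) / mu)
    return len(uniques), true_risk, est_risk


population = make_population(20_000)
for trial in range(3):
    n_uniques, true_risk, est_risk = evaluate(population, pi=0.01, seed=trial)
    print(f"trial {trial}: {n_uniques} sample uniques, "
          f"true {true_risk:.2f}, naive estimate {est_risk:.2f}")
```

The naive estimate ignores that sample uniques tend to come from small population cells, so comparing it with the true risk across trials shows why better-founded risk models, and agreed thresholds, are needed.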

SDC for Microdata

• Perturbative methods:
– Record swapping
– Adding noise

• Non-perturbative methods:
– Recoding
– Suppression
– Sub-sampling
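Recoding, the method ONS mainly uses, replaces detailed values with broader categories across the whole file. A minimal sketch of global recoding, assuming single-year age is banded with a top code and detailed areas are collapsed to regions; the band width, top code and area mapping are hypothetical choices, not ONS settings.

```python
def recode_age(age, width=10, top=80):
    """Global recoding: replace single years of age by bands, top-coding at `top`."""
    if age >= top:
        return f"{top}+"
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


# Hypothetical many-to-one mapping from detailed to broader geography.
AREA_TO_REGION = {
    "Leeds": "Yorkshire and The Humber",
    "York": "Yorkshire and The Humber",
    "Cardiff": "Wales",
    "Swansea": "Wales",
}


def recode_records(records):
    """Apply the same recodes to every record so categories are consistent file-wide."""
    return [{**r, "age": recode_age(r["age"]), "area": AREA_TO_REGION[r["area"]]}
            for r in records]


records = [{"age": 37, "area": "Leeds"}, {"age": 86, "area": "Cardiff"}]
print(recode_records(records))
# [{'age': '30-39', 'area': 'Yorkshire and The Humber'}, {'age': '80+', 'area': 'Wales'}]
```

Unlike the perturbative methods, recoding never makes a value false, only less detailed, which keeps the released data truthful at the cost of some utility.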

• Mixed strategies
• ONS mainly implements recoding
• PRAM (Post-Randomisation Method) implemented for the 2001 Census SAR
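PRAM perturbs a categorical variable by passing each record's value through a Markov transition matrix: a value is kept with some probability and otherwise changed to another category. A minimal sketch; the categories and the 0.8 retention probability are hypothetical, and the parameters used for the 2001 Census SAR are not given here.

```python
import math
import random


def pram(values, categories, transition, seed=None):
    """Post-randomise one categorical variable.

    transition[i][j] is the probability that true category categories[i]
    is released as categories[j]; each row must sum to 1.
    """
    for row in transition:
        if not math.isclose(sum(row), 1.0):
            raise ValueError("each row of the transition matrix must sum to 1")
    rng = random.Random(seed)
    position = {c: i for i, c in enumerate(categories)}
    return [rng.choices(categories, weights=transition[position[v]], k=1)[0]
            for v in values]


# Hypothetical example: keep the true category with probability 0.8.
categories = ["single", "married", "divorced"]
transition = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
true_values = ["married", "single", "married", "divorced"]
print(pram(true_values, categories, transition, seed=42))
```

An intruder can no longer be sure that a matching value is the true one, while most values, and hence most aggregate structure, are left unchanged.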

Access Options - SPECIALISTS

Data Laboratory

• Only government can use identifying business microdata

• Identified census data is high risk

• Hence the on-site laboratory and the employment contracts

• Only safe data can leave the laboratory

• Approximately 150 users per year

Access Options - GOVERNMENT

Access Agreements in central and local government.

• The UK is a devolved statistical system

• ONS discloses identifying survey microdata to other government departments for statistics and research purposes
– Users are professionals like us, subject to the same Code of Practice and the same laws
– We don’t screen for research validity
– We don’t check outputs

• Approximately 300 disclosures of confidential micro-data every year

• No known breaches of confidentiality.

Access Options - RESEARCH

For academic researchers, the UK Data Archive (UKDA)

• If it didn’t exist, we’d have to invent it.

• All ONS household survey datasets are deposited with UKDA
– Year of birth, regional geography, all other variables (limited coding)
– Some large households removed

• Academic researchers and government departments can download a dataset upon signing a user license; this takes about an hour.
– This year, 16,600 downloads have taken place. Each can have up to 10 users in the institution.
– ONS does not screen the license applications
– ONS does not vet the research proposals
– ONS does not check outputs
– In place for 30 years now
– No known instance of wrongful identification

Access Options

The UK Data Archive (continued)

But this is not enough.

• So ONS has now created the ‘Special License’
– Month of birth
– Local authority geography
– All households
– Access is still by downloading the data

• ONS does check each Special License application
– But only for data needs, not for research validity
– And we still don’t check any outputs

Access Options - PUBLIC

For the Public, Freedom of Information

– ONS can only withhold microdata where its disclosure to an applicant would be likely to result in:
• A breach of any law it was collected under
• An actionable breach of confidence
• A breach of a data protection principle

– The Scottish Information Commissioner has instructed the Scottish Health Service to disclose to an applicant the counts of leukaemia in children under 14 by ward (average ward population approx. 4,000)
• The table was all ones and zeros: effectively microdata, and ‘safe’.

Access Options

Are ONS access options and practices reasonable?

• They follow the constructs used by the Courts and Information Commissioners, in that policies are written in plain English

• Licensed academic users are, in 30 years of experience, not intruders. They are trusted colleagues – and like us they can make mistakes sometimes.

• Other civil service professionals are not intruders: they are as reliable and trustworthy as we are. They too have professional codes of conduct, ethics, and moral principles.

• All statisticians and researchers need clear rules, and should be trusted to follow them.

OUTPUTS

• Whatever your access privileges
• Whatever your research topic
• Whoever you are
• Outputs must be protected to the same standards
• The best research is carried out when the richest microdata is made available to those who can be trusted to apply these standards to outputs
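A usable output standard can be as simple as a rule that any analyst, inside or outside ONS, applies before releasing a table. A minimal sketch of one such rule, a minimum cell frequency, plus a check for tables made up only of zeros and ones (the "effectively microdata" case in the Freedom of Information example). The threshold and ward counts are hypothetical; the presentation does not state which rule ONS applies.

```python
def check_output_table(cells, threshold):
    """Return cells whose count is positive but below `threshold`.
    Zero cells are not flagged here, which is a simplification."""
    return {key: n for key, n in cells.items() if 0 < n < threshold}


def looks_like_microdata(cells):
    """True if every cell is 0 or 1, so each non-zero cell describes one person."""
    return all(n in (0, 1) for n in cells.values())


# Hypothetical counts by ward.
table = {"Ward A": 0, "Ward B": 1, "Ward C": 12, "Ward D": 3}
print(check_output_table(table, threshold=5))  # {'Ward B': 1, 'Ward D': 3}
print(looks_like_microdata(table))             # False
```

Flagged cells would then be suppressed or the table recoded to broader categories before release.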