Linking, selecting cut-offs, and examining quality in the Integrated Data Infrastructure (IDI)
Laura O’Sullivan
Statistics New Zealand
laura.o’[email protected]
IAOS Vietnam October 2014
Outline
The Integrated Data Infrastructure (IDI)
Terminology
IDI linking
• Near-exact and non-exact
• Selecting cut-offs
• Quality
• Clerical review
Linking at Statistics New Zealand and at the Australian Bureau of Statistics
Integrated Data Infrastructure (IDI)
[Figure: the IDI, with person-centred data linked to the following sources]
• Business data
• Education
• Tax
• Migration & movements
• Student loans & allowances
• Benefits
• Health & safety
• Justice
• Families & households
Terminology
Data integration (aka Record linkage)
Deterministic linking
Probabilistic linking (Fellegi-Sunter theory)
Weights
• Represent the probability that two records are from the same person
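As a rough illustration of how weights are usually built under Fellegi-Sunter theory: each compared field adds log2(m/u) when it agrees and log2((1-m)/(1-u)) when it disagrees, where m is the probability of agreement among true matches and u is the probability of agreement among non-matches. The field names and the m and u values below are made-up assumptions for the sketch, not IDI parameters.

```python
import math

# Hypothetical (m, u) probabilities per field; illustrative values only.
M_U = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.005),
    "date_of_birth": (0.98, 0.001),
}

def field_weight(field, agrees):
    """Agreement or disagreement weight for one field (log base 2)."""
    m, u = M_U[field]
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))

def total_weight(agreement):
    """Sum the field weights for one record pair.

    `agreement` maps each field name to True (agrees) or False (disagrees).
    """
    return sum(field_weight(f, a) for f, a in agreement.items())

all_agree = total_weight(
    {"first_name": True, "last_name": True, "date_of_birth": True}
)
name_differs = total_weight(
    {"first_name": True, "last_name": False, "date_of_birth": True}
)
```

A pair that agrees on every field gets a higher total weight than one that disagrees on last name, which is what lets a cut-off separate likely matches from likely non-matches.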
Cut-offs
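A minimal sketch of how a cut-off turns a weight into a decision. It assumes a single cut-off by default; Fellegi-Sunter theory also allows a second, lower threshold, with pairs in between sent to clerical review. The function name, parameter names, and numeric values are hypothetical.

```python
def classify(weight, cutoff, lower_cutoff=None):
    """Classify a record pair by its total weight.

    Pairs at or above `cutoff` are linked. If `lower_cutoff` is given,
    pairs between the two thresholds are flagged for clerical review.
    """
    if weight >= cutoff:
        return "link"
    if lower_cutoff is not None and weight >= lower_cutoff:
        return "review"
    return "non-link"

decisions = [classify(w, cutoff=10.0, lower_cutoff=5.0) for w in (12.0, 8.0, 3.0)]
```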
Quality
            True matches       Non matches
Linked      True positives     False positives
Unlinked    False negatives    True negatives
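The four cells can be turned into summary rates. The sketch below assumes the false positive rate is measured as the share of links that are not true matches, and the false negative rate as the share of true matches left unlinked; other definitions exist, and the counts used are invented.

```python
def link_quality(tp, fp, fn):
    """Summary rates from true positives, false positives, false negatives."""
    links = tp + fp          # everything that was linked
    true_matches = tp + fn   # everything that should have been linked
    return {
        "precision": tp / links,
        "false_positive_rate": fp / links,
        "recall": tp / true_matches,
        "false_negative_rate": fn / true_matches,
    }

# Invented counts for illustration only.
rates = link_quality(tp=90, fp=10, fn=30)
```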
Near-exact and non-exact
First name and Last name agreement
Data   Insert    Delete   Replace   Double    Single    Swap     Append   Truncate
A      Robert    Robert   Robert    Robert    Robbert   Robert   Kat      Katie
B      Robiert   Robrt    Rovert    Roobert   Robert    Robret   Katie    Kat
Date of birth agreement
Data   Replace      Swap         Transpose
A      04/08/1982   02/08/1982   02/08/1982
B      04/02/1982   20/08/1982   08/02/1982
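One way to quantify how close such pairs are is an edit distance that counts inserts, deletes, replacements, and adjacent swaps (optimal string alignment distance). This is a generic sketch, not necessarily the comparison used in the IDI; the helper for detecting a transposed day and month is likewise hypothetical.

```python
def osa_distance(a, b):
    """Optimal string alignment distance between strings a and b."""
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete
                d[i][j - 1] + 1,         # insert
                d[i - 1][j - 1] + cost,  # replace (or match)
            )
            # Adjacent characters swapped, e.g. "er" vs "re".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]

def day_month_transposed(a, b):
    """True if two dd/mm/yyyy dates differ only by swapped day and month."""
    day_a, month_a, year_a = a.split("/")
    day_b, month_b, year_b = b.split("/")
    return (year_a == year_b and day_a == month_b
            and month_a == day_b and day_a != month_a)
```

For the name pairs in the table, every typo except Append and Truncate is a single edit; Kat and Katie differ by two characters. A rule such as "distance of at most one" would therefore treat nicknames differently from typos.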
Selecting the cut-off
Quality in the IDI
False positive rates
• Sample from non-exact links
• Assume near-exact links are true matches
• Use proportional sampling
Non-exact rates
• Monitoring
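The false positive estimate described above can be sketched as follows. The strata (labelled here as hypothetical weight bands), the counts, and the function names are all assumptions; the only ideas taken from the slide are sampling from non-exact links in proportion to stratum size and treating near-exact links as true matches.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Split a clerical-review sample across strata in proportion to size.

    Rounding means the allocations may not sum exactly to `sample_size`.
    """
    total = sum(stratum_sizes.values())
    return {s: round(sample_size * n / total) for s, n in stratum_sizes.items()}

def estimate_false_positive_rate(n_near_exact, n_non_exact,
                                 sample_false, sample_size):
    """Estimate the share of all links that are false positives.

    Near-exact links are assumed to be true matches, so false links are
    estimated only from the reviewed sample of non-exact links.
    """
    estimated_false = n_non_exact * sample_false / sample_size
    return estimated_false / (n_near_exact + n_non_exact)

# Invented figures for illustration only.
allocation = proportional_allocation(
    {"band_low": 600, "band_mid": 300, "band_high": 100}, sample_size=50
)
fp_rate = estimate_false_positive_rate(
    n_near_exact=9000, n_non_exact=1000, sample_false=5, sample_size=100
)
```

Because the sample is allocated proportionally, the simple sample proportion of false links can be scaled up to all non-exact links without separate stratum weights.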
Clerical review
Dataset   First names   Last names   Date of birth   Sex
A         Mary Louise   Brown        04/11/1984      2
B         Mary Lou      Hughes       04/11/1984      2
A link with two first names matching and different last name
Dataset   Identifier   First names   Last names   Date of birth   Sex
A         12345        Owen          Keyes        06/01/1951      1
B         12345        -             -            06/01/1951      1
A link with unique identifiers and missing name information in one dataset
Dataset   First names     Last names   Date of birth   Sex
A         Holly Jessica   Gordon       01/05/1940      2
B         Holly           -            01/05/1940      2
A link with missing name information and without unique identifiers
Statistics New Zealand and the Australian Bureau of Statistics
Statistics New Zealand
• Census to the Post-enumeration survey (PES)
• Linking the longitudinal census
Australian Bureau of Statistics
• Linking projects using name and address
• Census data enhancement project