Linking, selecting cut-offs, and examining quality in the Integrated Data Infrastructure (IDI) Laura O’Sullivan Statistics New Zealand laura.o’[email protected] IAOS Vietnam October 2014


Page 1: Linking, selecting cut-offs, and examining quality in the Integrated Data Infrastructure (IDI)

Linking, selecting cut-offs, and examining quality in the Integrated Data Infrastructure (IDI)

Laura O’Sullivan
Statistics New Zealand
laura.o’[email protected]

IAOS Vietnam October 2014

Page 2: Linking, selecting cut-offs, and examining quality in the Integrated Data Infrastructure (IDI)

Outline

The Integrated Data Infrastructure (IDI)

Terminology

IDI linking
• Near-exact and non-exact
• Selecting cut-offs
• Quality
• Clerical review

Linking at Statistics New Zealand and at the Australian Bureau of Statistics


Page 4: Linking, selecting cut-offs, and examining quality in the Integrated Data Infrastructure (IDI)

Terminology

Data integration (aka Record linkage)

Deterministic linking

Probabilistic linking (Fellegi-Sunter theory)

Weights: represent the probability that two records are from the same person
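As a rough illustration of how weights arise in Fellegi-Sunter linking: each compared field contributes a log likelihood ratio, positive when the field agrees and negative when it disagrees, and the contributions are summed for the record pair. Strictly, the total is a log likelihood ratio that rises with the probability of a true match rather than a probability itself. The field names and the m and u values below are invented for illustration and are not IDI parameters.

```python
import math

# Hypothetical m/u probabilities per field (illustrative values, not IDI parameters).
# m = P(field agrees | the two records are a true match)
# u = P(field agrees | the two records are not a match)
M_U = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.005),
    "date_of_birth": (0.98, 0.001),
    "sex": (0.99, 0.5),
}


def field_weight(m, u, agrees):
    """Log2 likelihood ratio contributed by one field comparison."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def pair_weight(agreement):
    """Sum the field weights for one record pair; `agreement` maps field -> bool."""
    return sum(field_weight(*M_U[field], agrees)
               for field, agrees in agreement.items())


# A pair agreeing on every field scores higher than one with a different last name.
all_agree = pair_weight({field: True for field in M_U})
name_differs = pair_weight({"first_name": True, "last_name": False,
                            "date_of_birth": True, "sex": True})
print(round(all_agree, 2), round(name_differs, 2))
```

A field such as sex, which agrees by chance half the time, adds little weight on agreement, while a rarely coinciding field such as date of birth adds a lot.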

4

Page 5: Linking, selecting cut-offs, and examining quality in the Integrated Data Infrastructure (IDI)

Cut-offs
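A minimal sketch of how cut-offs are commonly applied in probabilistic linking: pairs with a weight at or above an upper cut-off are linked, pairs below a lower cut-off are not linked, and pairs in between are candidates for clerical review. Whether the IDI uses one cut-off or two is not stated here, so the function name, the two-threshold design, and all numeric values are assumptions.

```python
def classify_pair(weight, lower_cutoff, upper_cutoff):
    """Classify a record pair by its total weight using two cut-offs."""
    if lower_cutoff > upper_cutoff:
        raise ValueError("lower_cutoff must not exceed upper_cutoff")
    if weight >= upper_cutoff:
        return "link"
    if weight < lower_cutoff:
        return "non-link"
    return "clerical review"


# Hypothetical pair weights and cut-off values, chosen only for illustration.
weights = [31.2, 18.5, 12.0, 4.7, -6.3]
decisions = [classify_pair(w, lower_cutoff=10.0, upper_cutoff=20.0)
             for w in weights]
print(list(zip(weights, decisions)))
```

Setting both cut-offs to the same value reduces this to a single threshold with no review band.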


Page 6: Linking, selecting cut-offs, and examining quality in the Integrated Data Infrastructure (IDI)

Quality


            True matches       Non matches
Linked      True positives     False positives
Unlinked    False negatives    True negatives
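The four cells above can be turned into summary rates. The sketch below assumes the usual record-linkage definitions: the false positive rate is the share of links that are not true matches, and the false negative rate is the share of true matches that were left unlinked. Note that this false positive rate differs from the classification-style FP / (FP + TN). The counts are invented for illustration.

```python
def link_quality(tp, fp, fn):
    """Summarise linkage quality from true positive, false positive and false negative counts."""
    linked = tp + fp          # every pair that was linked
    true_matches = tp + fn    # every pair that should have been linked
    if linked == 0 or true_matches == 0:
        raise ValueError("need at least one link and one true match")
    return {
        "false_positive_rate": fp / linked,        # share of links that are wrong
        "false_negative_rate": fn / true_matches,  # share of true matches missed
        "precision": tp / linked,
        "recall": tp / true_matches,
    }


# Hypothetical counts, not IDI results.
quality = link_quality(tp=9500, fp=100, fn=400)
for name, value in quality.items():
    print(f"{name}: {value:.4f}")
```

True negatives are not needed for these rates, which is convenient because the number of non-matching pairs is usually very large and rarely counted.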

Page 7: Linking, selecting cut-offs, and examining quality in the Integrated Data Infrastructure (IDI)

Near-exact and non-exact

First name and Last name agreement

Data  Insert   Delete  Replace  Double   Single   Swap    Append  Truncate
A     Robert   Robert  Robert   Robert   Robbert  Robert  Kat     Katie
B     Robiert  Robrt   Rovert   Roobert  Robert   Robret  Katie   Kat

Date of birth agreement

Data  Replace     Swap        Transpose
A     04/08/1982  02/08/1982  02/08/1982
B     04/02/1982  20/08/1982  08/02/1982
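One way to see why most of the pairs above count as near agreement is to measure how many single-character edits separate them. The sketch below uses the optimal string alignment distance (insert, delete, replace, adjacent swap). This comparator is an assumption chosen for illustration; the slides do not say which string comparison the IDI uses.

```python
def osa_distance(a, b):
    """Optimal string alignment distance: insert, delete, replace, adjacent swap."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # delete
                          d[i][j - 1] + 1,          # insert
                          d[i - 1][j - 1] + cost)   # replace or match
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


# Pairs taken from the name and date of birth examples above.
pairs = {
    "Insert": ("Robert", "Robiert"),
    "Delete": ("Robert", "Robrt"),
    "Replace": ("Robert", "Rovert"),
    "Swap": ("Robert", "Robret"),
    "Append": ("Kat", "Katie"),
    "Date replace": ("04/08/1982", "04/02/1982"),
    "Date swap": ("02/08/1982", "20/08/1982"),
    "Date transpose": ("02/08/1982", "08/02/1982"),
}
distances = {label: osa_distance(a, b) for label, (a, b) in pairs.items()}
print(distances)
```

Most of the single typing errors sit one edit apart, while the append case and the day and month transposition sit two edits apart, so a comparator built only on edit distance would need a separate rule to treat those as near agreement.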

Page 8: Linking, selecting cut-offs, and examining quality in the Integrated Data Infrastructure (IDI)

Selecting the cut-off


Page 9: Linking, selecting cut-offs, and examining quality in the Integrated Data Infrastructure (IDI)

Quality in the IDI

False positive rates
• Sample from non-exact links
• Assume near-exact links are true matches
• Use proportional sampling

Non-exact rates
• Monitoring
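The false positive rate approach above could be sketched as follows: split a clerical review sample across groups of non-exact links in proportion to group size, scale the false links found in each group back up, and divide by all links, with near-exact links contributing no errors. Grouping by weight band, the function names, and every count are assumptions for illustration, not the IDI procedure or results.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Split sample_size across strata in proportion to their sizes.

    Rounding means the allocations may not sum exactly to sample_size.
    """
    total = sum(stratum_sizes.values())
    return {name: round(sample_size * n / total)
            for name, n in stratum_sizes.items()}


def estimate_false_positive_rate(near_exact_links, stratum_sizes,
                                 reviewed, false_found):
    """Estimate the overall false positive rate, treating near-exact links as true."""
    estimated_false = sum(
        stratum_sizes[s] * false_found[s] / reviewed[s]
        for s in stratum_sizes if reviewed[s] > 0
    )
    total_links = near_exact_links + sum(stratum_sizes.values())
    return estimated_false / total_links


# Hypothetical counts of non-exact links per weight band.
non_exact = {"high weight": 6000, "medium weight": 3000, "low weight": 1000}
allocation = proportional_allocation(non_exact, sample_size=500)

# Hypothetical clerical review outcome: false links found in each sample.
false_found = {"high weight": 3, "medium weight": 6, "low weight": 10}
rate = estimate_false_positive_rate(
    near_exact_links=90000,
    stratum_sizes=non_exact,
    reviewed=allocation,
    false_found=false_found,
)
print(allocation, round(rate, 4))
```

Because near-exact links are assumed to be true matches, any errors among them would make the real rate higher than this estimate.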


Page 10: Linking, selecting cut-offs, and examining quality in the Integrated Data Infrastructure (IDI)

Clerical review


Dataset  First names  Last names  Date of birth  Sex
A        Mary Louise  Brown       04/11/1984     2
B        Mary Lou     Hughes      04/11/1984     2

A link with two first names matching and different last name

Dataset  Identifier  First names  Last names  Date of birth  Sex
A        12345       Owen         Keyes       06/01/1951     1
B        12345       -            -           06/01/1951     1

A link with unique identifiers and missing name information in one dataset

Dataset  First names    Last names  Date of birth  Sex
A        Holly Jessica  Gordon      01/05/1940     2
B        Holly                      01/05/1940     2

A link with missing name information and without unique identifiers

Page 11: Linking, selecting cut-offs, and examining quality in the Integrated Data Infrastructure (IDI)

Statistics New Zealand and the Australian Bureau of Statistics

Statistics New Zealand
• Census to the Post-enumeration survey (PES)
• Linking the longitudinal census

Australian Bureau of Statistics
• Linking projects using name and address
• Census data enhancement project


Page 12: Linking, selecting cut-offs, and examining quality in the Integrated Data Infrastructure (IDI)

Thank you for listening

Questions

laura.o’[email protected]
