ALSPAC Record Linkage to External Databases
Andy Boyd
ALSPAC, Social Medicine
University of Bristol
The data sources and processes involved
• The processes involved in linkage projects
• Overview of ALSPAC’s existing data linkage projects
• National Pupil DB & Geographic linkage as examples
• Data Availability & Linkage Problems
Processes involved in linkage projects
• Find the contact
• Ethics – informed consent and/or Section 60 support
• Data Security
• HM Revenue & Customs
• Creating a linkage data set
• Data QC checks
• Identifiers
• Formats and data ‘normalisation’
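The 'normalisation' step can be illustrated with a minimal Python sketch. The function names and the specific cleaning rules (upper-casing, stripping accents and punctuation, re-spacing postcodes) are assumptions for illustration and are not taken from ALSPAC's own procedures.

```python
import re
import unicodedata


def normalise_name(name: str) -> str:
    """Upper-case a name, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Keep letters, spaces and hyphens; drop apostrophes and other symbols.
    kept = re.sub(r"[^A-Za-z -]", "", ascii_only)
    # Treat hyphens as word breaks, then collapse runs of whitespace.
    return " ".join(kept.replace("-", " ").upper().split())


def normalise_postcode(postcode: str) -> str:
    """Upper-case a UK postcode and write it as 'OUTWARD INWARD'."""
    compact = re.sub(r"\s+", "", postcode).upper()
    if len(compact) < 5:
        # Too short to be a full postcode; return it unsplit for manual QC.
        return compact
    # The inward code is always the last three characters.
    return f"{compact[:-3]} {compact[-3:]}"
```

Applying the same rules to both data sets before linkage means that differences in case, spacing or punctuation do not cause avoidable mismatches.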
Processes involved in linkage projects cont…
• Who links the data?
– One of the two parties or an independent 3rd party
• Processing the data
• Anonymity vs sufficient data for research
– Ages in months & years
– First half of postcode
– Recode unusual outcomes into wider categories
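The three anonymisation steps listed above can be sketched in Python as follows. The function names and the `min_count=5` threshold for "unusual" outcomes are illustrative assumptions, not values from the presentation.

```python
from collections import Counter
from datetime import date


def age_in_months(dob: date, on: date) -> int:
    """Completed months between date of birth and an event date."""
    months = (on.year - dob.year) * 12 + (on.month - dob.month)
    if on.day < dob.day:
        months -= 1  # the current month is not yet complete
    return months


def first_half_of_postcode(postcode: str) -> str:
    """Return the outward code, i.e. everything before the last three characters."""
    compact = "".join(postcode.split()).upper()
    return compact[:-3]


def recode_rare(values, min_count=5, other="Other"):
    """Replace categories seen fewer than min_count times with a wider label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]
```

Releasing an age in months instead of a date of birth, and an outward code instead of a full postcode, keeps enough detail for most analyses while making individuals harder to identify.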
Major External Databases
Health related datasets
• Office for National Statistics (ONS) Tracing
– Cancer Registry & GRO
• NSTS (NHS Strategic Tracing Service)
• Electronic antenatal & birth records
• PCT data (Exeter DB, My Quest)*
Non health Datasets
• National Pupil Database (DCSF, DIUS*, UCAS*)
• ALSPAC Schools Collection
• G.I.S Datasets (Geographic Information Systems)
• DWP*
• Home Office*
* Linkage currently being investigated
National Pupil Database
• Maintained by Dept. for Children, Schools & Families (DCSF)
• Covers all state maintained schools in England
• Census was annual, now taken at 3 time points
• Data at school and pupil level
• Key data include:
– Exam results
– Attendance
– Pupil demographics (including address, ethnicity, Free School Meals, Special Educational Needs)
– School characteristics (pupil numbers, staff pupil ratios)
NPD – How we did it
• 3rd party conducted match – The Fischer Trust – independent charity
• Provided data on the eligible cohort
• ALSPAC & DCSF provided the following linkage variables:
– Surname, Forename, Familiar name
– Date of Birth, Gender
– Postcode, Previous Postcode & Postcode accuracy flag
– Current School (from ALSPAC data collection)
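How the linkage variables above could be combined is sketched below in Python. This is not the third party's actual matching method, which was not documented; it is a simple deterministic match under assumed rules (exact match on surname, a given name, date of birth and gender, with postcode used only to break ties). The `Person` class and `link` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Person:
    record_id: str
    surname: str
    forename: str
    familiar_name: str
    dob: str  # ISO date string, e.g. "1992-01-05"
    gender: str
    postcode: str
    previous_postcode: str = ""


def _key(surname: str, given: str, dob: str, gender: str):
    return (surname.strip().upper(), given.strip().upper(), dob, gender.strip().upper())


def link(cohort: List[Person], external: List[Person]) -> Dict[str, Optional[str]]:
    """Map each cohort record_id to one external record_id, or None if unlinked."""
    index: Dict[tuple, List[Person]] = {}
    for rec in external:
        index.setdefault(_key(rec.surname, rec.forename, rec.dob, rec.gender), []).append(rec)

    links: Dict[str, Optional[str]] = {}
    for person in cohort:
        candidates: List[Person] = []
        # Try the forename first, then the familiar name.
        for given in (person.forename, person.familiar_name):
            if given:
                candidates = index.get(_key(person.surname, given, person.dob, person.gender), [])
            if candidates:
                break
        if len(candidates) > 1:
            # Break ties on current or previous postcode.
            postcodes = {pc for pc in (person.postcode, person.previous_postcode) if pc}
            candidates = [c for c in candidates if c.postcode in postcodes]
        # Accept only an unambiguous single candidate.
        links[person.record_id] = candidates[0].record_id if len(candidates) == 1 else None
    return links
```

Records left as `None` would need clerical review or a looser (for example probabilistic) match, which this sketch does not attempt.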
NPD - Details
• ALSPAC Cohort covers three academic years
• We hold data on all young people (YPs) across these three years – approx. 600,000 cases a year
• Figures based on eligible cohort: 17,671 linked (86%)
• Majority of unlinked cases thought to be in private education (will be in NPD from KS4)
NPD - Advantages
• Covers all English state schools
• Good match rate for eligible cohort
• Regular updates
• Access to ‘confidential’ variables
• PLUG workshops provide good opportunities to discuss data and solutions to problems
NPD - Problems
• Central ID QC issues (a few duplicates)
• Only applies to English state maintained schools until KS4, then re-link – extra costs and bias until then
• Data collection methods/standards vary from school to school
• Documentation (lack of)
• Size of raw data, time consuming to process
• Fixed time point census, doesn’t record all school movements (especially annual census)
G.I.S Data
• Spatial data held at many geographic levels
• Geographies range in scale from 0.1 metres to regional/national data
• Tied together via postcode or grid reference as central ID
• Key data include:
– NSPD (was All Fields Postcode Directory) - geo linking database
– Deprivation & Socio Economic indices (IMD, Townsend, Acorn)
– Census data
G.I.S – How we link cases to data
• Master file of Postcodes
• Postcodes linked to grid reference
• Grid references of various scales
• PCs/GridRef mapped to:
– Electoral geographies
– Census geographies
• Ethics:
– We don’t generally identify residence at PC or equivalent level
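The chain from postcode to grid reference to wider geography can be sketched in Python. The lookup tables, the postcode `ZZ1 1AA`, the coordinates and the geography codes are all made up for illustration, and mapping through a 1 km grid square is an assumed simplification rather than ALSPAC's documented method.

```python
# Hypothetical lookup tables; real ones would come from a postcode directory.
POSTCODE_TO_GRID = {
    "ZZ1 1AA": (358712, 172934),  # made-up easting/northing in metres
}
GRID_1KM_TO_GEOGRAPHY = {
    (358000, 172000): {"electoral_ward": "WARD_A", "census_area": "AREA_001"},
}


def coarsen(grid_ref, resolution_m):
    """Round an (easting, northing) pair down to a coarser grid resolution."""
    easting, northing = grid_ref
    return (
        easting // resolution_m * resolution_m,
        northing // resolution_m * resolution_m,
    )


def geographies_for(postcode):
    """Return only the broad geographies for a postcode, never the grid reference."""
    key = " ".join(postcode.upper().split())
    grid_ref = POSTCODE_TO_GRID.get(key)
    if grid_ref is None:
        return None
    return GRID_1KM_TO_GEOGRAPHY.get(coarsen(grid_ref, 1000))
```

Returning only the electoral and census codes reflects the ethics point above: the precise location is used internally for the lookup but is not passed on.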
Ordnance Survey – The National Grid
G.I.S - Details
• 50,000 ALSPAC address points, associated with a date range which can then be linked to ALSPAC data collection
• Linkage examples:
– Indices of multiple deprivation
– Travel from home to school patterns
– Cancer rates and residential distance from power lines
The geographic relation between household income and polluting factories – FoE 1999
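Linking an address point to a data collection date amounts to finding the address whose date range contains that date. A minimal Python sketch follows; the `AddressSpell` type and the half-open interval convention (start inclusive, end exclusive) are assumptions for illustration.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class AddressSpell(NamedTuple):
    address_id: str
    start: date
    end: Optional[date]  # None means the address is still current


def address_at(spells: List[AddressSpell], when: date) -> Optional[str]:
    """Return the address_id in effect on a given date, or None if there is a gap."""
    for spell in spells:
        if spell.start <= when and (spell.end is None or when < spell.end):
            return spell.address_id
    return None
```

A `None` result corresponds to a gap in the address record, which has to be handled explicitly in any analysis.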
G.I.S advantages
• Many data sets in public domain (or available through ‘athens’)
• Many geographies are broad enough to not identify cohort members
• National picture (some exclude Scotland)
G.I.S Problems
• Shifting geographies across time points
• Royal Mail change postcodes
• Postcode not precise enough in some cases
• Postcode boundaries are not coterminous with other geographic boundaries
Accuracy issues with analysis at postcode level
Address level Postcode level
Data Availability & Linkage Problems
• Cohort Data
• GIS Data
• GIS Ethics
Linkage problems with the cohort data
• Missing data
– Especially problematic for the cases who didn’t enrol in the original recruitment
– Partners
– 69 cases with no known birth outcome
– Gaps in the address data
• However…
– ONS matched 99.7% of mothers, so we have their old & new NHS numbers and cleaned data (original recruitment cases only)
Linkage problems we encounter
• Many of the early records are paper based or in varied formats.
• Quality Control – ONS data returned to us with 37 incorrect ALSPAC IDs
• Unknown methods – No documentation from ONS or Fischer regarding the quality of the match
• Lack of uniqueness in the ID (either duplicates or multiple IDs per case)
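The ID problems listed above (incorrect IDs in returned data, duplicates, multiple IDs per case) can be screened for with a simple check. The Python sketch below is illustrative; the function name and the shape of the input (pairs of cohort ID and external ID) are assumptions.

```python
from collections import defaultdict


def id_qc(sent_ids, returned_pairs):
    """Screen returned (cohort_id, external_id) pairs for common ID problems."""
    known = set(sent_ids)
    external_by_cohort = defaultdict(set)
    cohort_by_external = defaultdict(set)
    for cohort_id, external_id in returned_pairs:
        external_by_cohort[cohort_id].add(external_id)
        cohort_by_external[external_id].add(cohort_id)

    return {
        # IDs that came back but were never sent out
        "unknown_ids": sorted(c for c in external_by_cohort if c not in known),
        # one cohort member linked to several external IDs
        "multiple_ids_per_case": {
            c: sorted(e) for c, e in external_by_cohort.items() if len(e) > 1
        },
        # one external ID shared by several cohort members
        "duplicate_external_ids": {
            e: sorted(c) for e, c in cohort_by_external.items() if len(c) > 1
        },
    }
```

Running such a check on every file returned by a linkage partner gives a basic safeguard when the partner provides no documentation on match quality.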
GIS Data Availability
• Collected as administrative resource
• Not yet cleaned, documented and presented to usual ALSPAC standards
• Initiatives under way to validate and fill gaps in record
• Schools GIS data in the main not processed
• Aim to build into standard ALSPAC resource
GIS Ethics
• Postcode level or greater accuracy treated as a personal identifier
• Research proposals to use these data need ALSPAC Law & Ethics Approval
• Broader geographical data can be released in normal manner
• A two-stage process is used to collect and process precise data
GIS Ethics
Step 1 – Postcodes (or full address) provided to researcher with unique collection ID with no other data attached
Step 2 – Researcher attaches their data and returns file to ALSPAC
Step 3 – ID converted to the appropriate collaborator ID, postcode data removed
Step 4 – Requested ALSPAC data added to the file and data sent to the researcher
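The four steps above can be sketched in Python to show how the postcode and the research data never travel together under the same ID. The function names, the random collection IDs and the example field names are hypothetical; this is an illustration of the idea, not ALSPAC's implementation.

```python
import secrets


def step1_release(postcode_by_cohort_id):
    """Step 1: release postcodes under one-off collection IDs, with no other data."""
    key = {}  # collection_id -> cohort_id, kept by the data holder only
    release = []
    for cohort_id, postcode in postcode_by_cohort_id.items():
        collection_id = secrets.token_hex(8)
        key[collection_id] = cohort_id
        release.append({"collection_id": collection_id, "postcode": postcode})
    return release, key


def steps3_and_4(returned_rows, key, collaborator_id_by_cohort_id, data_by_cohort_id):
    """Steps 3 and 4: swap IDs, drop the postcode, attach the requested data."""
    output = []
    for row in returned_rows:
        cohort_id = key[row["collection_id"]]
        # Keep only what the researcher attached in step 2.
        cleaned = {
            k: v for k, v in row.items() if k not in ("collection_id", "postcode")
        }
        cleaned["collaborator_id"] = collaborator_id_by_cohort_id[cohort_id]
        cleaned.update(data_by_cohort_id.get(cohort_id, {}))
        output.append(cleaned)
    return output
```

Step 2 happens on the researcher's side: they add their derived variables (for example a distance measure) to each released row and send the file back.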
Andy Boyd
A.W.Boyd@Bristol.ac.uk