Research Outputs for small areas 2017: analysis and findings
This SlideShare highlights factors that help explain the differences between administrative data-based population estimates (Research Outputs) and official population estimates at Output Area (OA) level.
Please note that these Research Outputs are
NOT official statistics on the population
Administrative Data Census – Background
• The Administrative Data Census (ADC) Project is working to
assess whether the Government-stated ambition that “censuses
after 2021 will be conducted using other sources of data” can be
met.
• The work being undertaken by ADC is to produce population
estimates, household estimates and population and housing
characteristics using administrative data, to meet demands for
improved population statistics and as a possible alternative to
the census.
Research Outputs – Background
• In November 2016, the ADC Research Outputs publication included population estimates by five-year age group and sex at Lower Layer Super Output Area (LSOA) level. ADC have also previously published population estimates for local authorities.
• To create a Statistical Population Dataset (SPD), we anonymously matched individual records across multiple data sources into a single, coherent dataset that forms the basis for estimating the population.
Research Outputs – Background
• Now we are producing administrative data population estimates at Output Area (OA) level using the same method.
• This analysis is based on comparisons of SPD V2.0 2016
estimates with official Small Area Population Estimates 2016
(SAPE), and SPD V2.0 2011 estimates with the 2011 OA level
census estimates.
• SAPE is the annual release of population estimates for
geographies below local authority (LA) level for England and
Wales, including Output Areas (OAs).
• We have also published detailed methodological information on
the production of these Research Outputs.
Useful links
• Administrative Data Research Outputs – Current release.
• Information on our methodology used to produce the Research
Outputs.
• Although the examples given here explain some of the differences in the estimates, we need local knowledge and further research to help improve the performance of the SPD estimates at OA level. We would welcome your feedback – please email us at:
Constructing an SPD to produce the population estimates
Statistical Population Dataset – SPD V2.0
[Diagram: records from the sources below are matched into the SPD, which produces the SPD population estimates.]
• NHS Patient Register (PR)
• DWP/HMRC Customer Information System (CIS)
• HESA data (students)
• English/Welsh School Census
• DWP Benefits data – used to resolve address conflicts
• Patient Demographic Service (PDS) “address moves” – used to resolve address conflicts
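The SPD is built from people matched across administrative sources, and later slides note that the estimate "is based on those people included in two admin sources". The sketch below illustrates that inclusion test in simplified form. The source labels, the `LinkedPerson` type and the plain "at least two sources" rule are illustrative assumptions; the actual SPD V2.0 rules, including address-conflict resolution, are described in the published methodology and are more detailed.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable

# Hypothetical labels for the matched sources shown in the diagram.
PR, CIS, HESA, SCHOOL_CENSUS = "PR", "CIS", "HESA", "SCHOOL_CENSUS"


@dataclass(frozen=True)
class LinkedPerson:
    """An anonymised person after record linkage (hypothetical structure)."""
    person_id: str
    sources: FrozenSet[str]  # sources this person was matched on


def in_spd(person: LinkedPerson, min_sources: int = 2) -> bool:
    """Simplified rule (assumption): include a person matched on at least
    `min_sources` distinct administrative sources."""
    return len(person.sources) >= min_sources


def spd_count(people: Iterable[LinkedPerson]) -> int:
    """Count the people who pass the simplified inclusion rule."""
    return sum(1 for p in people if in_spd(p))


people = [
    LinkedPerson("a", frozenset({PR, CIS})),            # typical adult
    LinkedPerson("b", frozenset({CIS})),                # e.g. uses an on-base doctor, not on PR
    LinkedPerson("c", frozenset({PR, SCHOOL_CENSUS})),  # pupil in the School Census
    LinkedPerson("d", frozenset({PR})),                 # pupil outside the School Census, not on CIS
]
print(spd_count(people))  # 2
```

Under this rule, people seen on only one source (persons "b" and "d") drop out, which is the mechanism behind several of the undercounts discussed later.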
What is an OA?
• An Output Area (OA) is a geographic area forming part of a geographic
hierarchy designed to improve the reporting of small-area statistics in England
and Wales.
• OAs nest within LSOAs, which in turn fit within Middle Layer Super Output Area (MSOA) and LA boundaries.
• OAs are designed to have a population size of between 100 and 625 people at
the time of the 2011 Census and a minimum of 40 households.
• LSOAs are designed to have a population size of between 1,300 and 3,000 people at the time of the 2011 Census, or approximately 1,200 households.
• MSOAs are designed to have a population size of between 3,000 and 15,000 people at the time of the 2011 Census, or approximately 6,000 households.
• The total number of OAs is 171,372 in England and 10,036 in Wales (a total of
181,408).
• OAs are useful for detailed analysis of population and can also be aggregated to
produce statistics for bespoke geographies that users require.
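Because OAs nest inside larger geographies, any area built from whole OAs can be estimated by summing OA counts through a lookup table. A minimal sketch, assuming a simple dictionary lookup; the OA codes, area names and counts below are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def aggregate_oas(oa_counts: Dict[str, int], oa_to_area: Dict[str, str]) -> Dict[str, int]:
    """Sum OA populations into larger areas using an OA -> area lookup.

    Raises KeyError if an OA has no entry in the lookup, so that unmatched
    OAs are not silently dropped from the totals.
    """
    totals: Dict[str, int] = defaultdict(int)
    for oa, count in oa_counts.items():
        totals[oa_to_area[oa]] += count
    return dict(totals)


# Hypothetical OA codes and a bespoke two-area geography.
counts = {"OA_1": 310, "OA_2": 275, "OA_3": 420}
lookup = {"OA_1": "AREA_X", "OA_2": "AREA_X", "OA_3": "AREA_Y"}
print(aggregate_oas(counts, lookup))  # {'AREA_X': 585, 'AREA_Y': 420}
```

The same function works for LSOAs, MSOAs or custom areas; only the lookup changes.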
OA analysis vs local authority analysis
• Advantages of OA analysis:
  – Increased granularity improves our understanding of issues that may influence the accuracy of SPD methods
  – Differences between the SPD population and official estimates can be explained by specific issues that may be obscured at LA level
  – OA analysis can explain differences noted at higher geographic levels
  – OA level data can be aggregated to produce population data for custom geographies
• Disadvantages of OA analysis:
  – Scale – there are over 181,000 OAs in England and Wales, and splitting these by age and sex makes for very large datasets
  – Data quality – OA level estimates are not National Statistics, and the characteristic data at this level have associated reliability issues. This makes comparisons of age/sex at this level difficult to interpret
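To give a sense of the scale point above, the arithmetic below counts the cells in one set of OA estimates split by age and sex. The 19 age groups are an assumption for illustration; the exact age breakdown used in the Research Outputs may differ. Each SPD-versus-official comparison needs two such sets.

```python
# OA counts from the "What is an OA?" slide.
N_OAS = 171_372 + 10_036  # England + Wales
N_SEXES = 2
# Assumption: five-year age groups 0-4, 5-9, ..., 85-89 plus 90+ (19 groups).
N_AGE_GROUPS = 19


def n_cells(n_areas: int, n_age_groups: int = N_AGE_GROUPS, n_sexes: int = N_SEXES) -> int:
    """Number of area x age group x sex cells in one set of estimates."""
    return n_areas * n_age_groups * n_sexes


print(N_OAS)           # 181408
print(n_cells(N_OAS))  # 6893504
```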
Initial analysis: detecting differences
• Analysis was carried out at OA level for total, male and female populations.
• Of the 181,000 OAs, only about 23,000 had a difference of more than +/-10% between SPD V2.0 2011 and census estimates.
• We identified the OAs with the largest differences between SPD and census estimates, flagging extreme outliers (based on the 5th and 95th percentiles) for further investigation.
• This found just 516 extreme outliers in male populations, 570 in female populations and 634 in total populations.
• We investigated this minority of extreme OAs to understand why
these differences were occurring.
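The comparison above rests on a percentage difference per OA and the share of OAs within a tolerance. A minimal sketch, assuming the difference is expressed relative to the official (census) estimate; the slides do not state the denominator, and the exact percentile-based rule for flagging extreme outliers is not given here, so it is not reproduced. The function names and input pairs are hypothetical.

```python
from typing import Iterable


def pct_difference(spd: float, official: float) -> float:
    """SPD minus official estimate, as a percentage of the official estimate (assumed denominator)."""
    if official <= 0:
        raise ValueError("official estimate must be positive")
    return 100.0 * (spd - official) / official


def share_within(diffs: Iterable[float], tolerance: float) -> float:
    """Proportion of OAs whose absolute % difference is at most `tolerance`."""
    values = list(diffs)
    if not values:
        raise ValueError("no differences supplied")
    return sum(1 for d in values if abs(d) <= tolerance) / len(values)


# Hypothetical (SPD, census) pairs for four OAs.
pairs = [(110, 100), (95, 100), (300, 250), (200, 200)]
diffs = [pct_difference(s, c) for s, c in pairs]
print(diffs)                    # [10.0, -5.0, 20.0, 0.0]
print(share_within(diffs, 10))  # 0.75
```

Negative values correspond to "SPD estimate lower than census estimate" and positive values to "SPD estimate higher".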
Distribution of SPD V2.0 2011 estimates vs census estimates for 2011
[Chart: OA distribution of difference between SPD estimates and census estimates 2011. Vertical axis: proportion of OAs in SPD (0% to 18%). Legend: SPD estimate lower than census estimate; SPD estimate higher than census estimate.]
For approximately 87% of OAs the difference between the census and SPD estimates was within +/-10%, and for 70% of OAs the difference was within +/-5%.
Extreme OAs were of the most
interest, as they are the most
different from official estimates.
SPD differences may be higher or
lower than the official estimate.
Initial analysis: geography of differences
• Several factors were identified that explain some of the differences detected, but we cannot explain everything – local intelligence will help us to understand them better!
• These were primarily areas with “institutional” populations: prisons,
military bases and private boarding schools. Additional data
sources for specific special populations could help resolve these
discrepancies.
• In subsequent slides, differences between census estimates and SPDs are shown for males and females. Blue represents a difference in male populations, while red represents female populations. Purple shows differences present in both male and female populations.
• Light shades highlight areas where the SPD population is lower than
the census, while dark shades are higher SPD populations than the
census.
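The colour key above can be summarised as a small lookup. The helper name is hypothetical, and the sketch assumes that when both male and female populations are flagged, the differences run in the same direction, so a single `spd_higher` flag sets the shade.

```python
from typing import Optional


def map_colour(male_extreme: bool, female_extreme: bool, spd_higher: bool) -> Optional[str]:
    """Map colour for an OA under the slides' key (hypothetical helper).

    Hue: blue = male difference, red = female difference, purple = both.
    Shade: light = SPD lower than census, dark = SPD higher than census.
    Assumes both sexes differ in the same direction when both are flagged.
    """
    if male_extreme and female_extreme:
        hue = "purple"
    elif male_extreme:
        hue = "blue"
    elif female_extreme:
        hue = "red"
    else:
        return None  # OA not flagged, so not highlighted
    shade = "dark" if spd_higher else "light"
    return f"{shade} {hue}"


print(map_colour(True, False, False))  # light blue
print(map_colour(True, True, True))    # dark purple
```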
What factors can help explain large
differences at OA level?
The largest differences between SPD and official estimates usually occur in areas with communal establishments (CEs) such as armed forces bases, prisons and boarding schools.
• People who live in these CEs tend to move around more than the rest
of the population, and these movements might not be recorded in
admin data.
• Also, when people move, this might be recorded after the event, with people registering and de-registering from admin data later. This will lead to larger differences between the SPD and official estimates in areas with high population churn.
• Boundary issues can also affect OAs – a population can be spread across two OAs (e.g. a CE that crosses a boundary from one OA to another), but the whole population may be counted in only one of them. This can lead to unexpected population numbers in adjacent OAs.
Home Armed Forces (HAF) personnel
• Armed forces personnel may register with providers of medical care other than
the NHS, such as on-base doctors.
• In these cases, individuals may not be matched on two administrative data sources and so are absent from the SPD: they are present on the CIS, but not on the PR, and are rarely present on HESA.
• This is likely to be more exaggerated in the male portion of the population, due
to the demographic profile of the armed forces (mainly males under age 45). It
could lead to a lower SPD population in armed forces areas when compared
with the census.
• However, to make up for this gap, military populations are added in an aggregate adjustment after the population has been calculated using admin data. This may lead to double counting of the military population if they are already on administrative data sources.
• Alternatively, in areas where bases encourage personnel to reregister with a
local GP and there is high population turnover, admin data may be out of date
(lagged). This can inflate SPD numbers as personnel who have left the area
are retained if they fail to reregister at their new address.
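The double-counting risk described above is simple arithmetic. The sketch assumes, purely for illustration, that the aggregate adjustment adds the full count of personnel whether or not some are already in the admin-based count; the function names and numbers are hypothetical and this is not the published adjustment method.

```python
def adjusted_count(civilians_in_admin: int, military_in_admin: int, military_total: int) -> int:
    """Admin-based count plus an aggregate military adjustment (assumed form).

    civilians_in_admin: non-military residents captured by admin data
    military_in_admin:  personnel already captured by admin data
    military_total:     personnel added by the aggregate adjustment
    """
    return civilians_in_admin + military_in_admin + military_total


def double_count(civilians_in_admin: int, military_in_admin: int, military_total: int) -> int:
    """Excess over (civilians + all personnel), i.e. personnel counted twice."""
    adjusted = adjusted_count(civilians_in_admin, military_in_admin, military_total)
    return adjusted - (civilians_in_admin + military_total)


# Hypothetical OA: 200 civilians, 100 personnel of whom 30 are also on admin data.
print(adjusted_count(200, 30, 100), double_count(200, 30, 100))  # 330 30
```

Under this assumption the overcount equals the number of personnel already on admin sources, which is why the effect is largest where bases encourage registration with a local GP.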
HAF example: Sandhurst
• SPD V2.0 2011 male
population higher than census
in an OA within the boundaries
of the Royal Military Academy
(red hashed area).
Difference could be due to:
1. Undercount of armed forces personnel in
census
2. Failure of personnel to reregister after
moving out of the area, leading to inflated
numbers in administrative data
3. Double counting of personnel who are
present in the administrative data and
added in the military adjustment
Our SPD estimate is based on
those people included in two
admin sources.
Contains National Statistics data © Crown copyright and
database right 2017
Contains OS data © Crown copyright and database right 2017
Map data copyrighted OpenStreetMap contributors and
available from https://www.openstreetmap.org
HAF example: Tidworth Garrison
• The light red OAs indicate a lower female SPD population than census. These areas seem to be housing for the families of military personnel. Some bases allow dependents to use on-base medical facilities. In these cases, dependents may not be present on two administrative data sources and so may be missing from SPD V2.0.
HAF example: Tidworth Garrison
• SPD male population higher than
census in three OAs within the
boundaries of Tidworth Garrison.
• Difference could be due to:
1. Underestimate of personnel in census
2. Failure of personnel to reregister after
moving out of the area, leading to
inflated numbers in administrative data
3. Double counting as personnel may be
present in administrative data and
added in the armed forces adjustment
• The light blue area has a lower male
SPD population than census. This area
contains Tedworth House, the recovery
centre run by Help for Heroes.
Treatment may not be provided by the
NHS, so those personnel present may
not be on the PR.
Foreign Armed Forces (FAF) personnel
• Several military bases in the UK are populated by American
personnel and their dependents.
• These bases are often self-contained, with their own medical provision.
• Military personnel and their dependents may be missing from
one or more administrative sources.
• This may lead to differences between SPD V2.0 and official estimates in these areas.
FAF example: RAF Lakenheath and
Mildenhall
• Both RAF Lakenheath and RAF Mildenhall have
US Foreign Armed Forces present (red hashed
areas).
• These are effectively self-contained bases, with
housing, medical care and schooling provided on
site.
• SPD V2.0 2011 population estimates for these
OAs have lower numbers of females compared
with the census estimates.
• The SPD military adjustment adds in the
military population, who are unlikely to be
present in administrative data. Dependents are
not included in the adjustment.
• Most FAF dependents will not be in administrative data, as they use on-site services. They are also unlikely to be present on the CIS, and are therefore missing from SPD V2.0.
Private boarding schools
• The SPD V2.0 2011 uses data from the School Census to
improve accuracy for school-age children.
• This acts as a second data source with which to match PR or
CIS data.
• Pupils at some private and independent schools are not
included in the School Census. In these cases, a child would
need to be present in the PR and CIS to be counted in SPD
V2.0. Areas with such schools are likely to have a lower SPD
population than the official estimate.
Private boarding schools example:
Winchester
• SPD V2.0 2011 has fewer males than census in the OA containing Winchester College – an all-boys private boarding school.
• Differences could be due to: children in these OAs being registered in administrative data at their parents’ home address.
• SPD V2.0 has fewer females than census in the OA containing St Swithun’s School – an all-girls private boarding school.
• Differences could be due to: children in these OAs being registered in administrative data at their parents’ home address.
Prisons
• Institutions such as prisons, which house special populations,
often make use of health services independent of the NHS for
healthcare provision.
• Prisoners may not have an up-to-date address in the PR, leading to their incorrect placement at their last registered address.
• This will manifest in SPD V2.0 as lower populations compared
with official estimates in prison OAs, as official estimates include
aggregate counts of people in prison. In future, making
adjustments for people in prison could help improve SPDs.
Prison example: Falfield
• SPD V2.0 2011 male
population lower than census
in the OA containing HMP
Leyhill
- a Category D men’s prison
• SPD V2.0 2011 female
population lower than census
in the OA containing HMP
Eastwood Park
- a closed women’s prison
(roughly equivalent to a
Category A/B/C men’s
prison)
Prison example: Falfield
• Differences in this area are
likely a result of prisoners
being counted at their
previous address if PR and
CIS data are not updated to
reflect their time spent in
prison
Student/graduate areas
• Student populations should be captured in the SPD through a
combination of the PR, CIS and Higher Education Statistics
Agency (HESA) data.
• Students are often encouraged to register with a GP as part of
their induction to university – but often fail to update address
information if they subsequently move.
• After completion of studies, graduates (being young and
healthy) often fail to interact with health services. This means
that their movement out of university areas can be missed –
causing SPD inflation in university towns. In future, linkage
between admin data sources or across time should help better
understand movements of students.
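The lag described above can be illustrated with a toy model of a student area. Each year a fixed intake registers, a share of residents moves out, and only some of those movers update their records. All parameters are hypothetical, and the model ignores deaths and every other flow; it is a sketch of the mechanism, not of SPD processing.

```python
from typing import Tuple


def simulate_register(years: int, intake: float, leave_rate: float,
                      update_rate: float) -> Tuple[float, float]:
    """Return (actual residents, people still registered in the area) after `years`.

    Each year a share `leave_rate` of current residents moves out, then
    `intake` new residents arrive and register. Only the share `update_rate`
    of movers is removed from the register; the rest stay counted in the area.
    """
    resident = 0.0
    registered = 0.0
    for _ in range(years):
        leavers = resident * leave_rate
        resident = resident - leavers + intake
        registered = registered - leavers * update_rate + intake
    return resident, registered


print(simulate_register(2, 100, 0.5, 0.0))  # (150.0, 200.0): movers never deregister
print(simulate_register(2, 100, 0.5, 1.0))  # (150.0, 150.0): every mover updates
```

The gap between the two numbers grows with turnover and shrinks as `update_rate` rises, matching the inflation expected in university towns.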
Student/graduate example: Oxford
• SPD V2.0 2011 for Oxford is higher than official estimates for several OAs.
• This is probably due to the many Oxford colleges present in the city centre.
• Students will register at these addresses in their first year – on HESA and PR.
• However, they are unlikely to update their address after subsequent moves.
Student/graduate example: Oxford
• This will lead to a “lag” where the
SPD population will represent
those students living in the OAs at
present, but also some of those
who have lived in the areas
previously, but have since moved.
• This is usually more extreme for
males, but in Oxford, both males
and females have been flagged
as extreme outliers for this type
of difference (hence the purple
colouring).
Student/graduate example: Norwich
• SPD V2.0 2011 estimates for the OA
containing the University of East Anglia
are lower than official estimates.
• The majority of UEA student halls are
present on campus.
• Lower SPD estimates may reflect a greater propensity of students at UEA not to register with a doctor. Whilst they are likely to be present on a HESA dataset, they may not have a matching second data source.
• This is the case for both the male and the female population of the OA (light purple colouring).
Summary of difference in OA
populations, 2011
• Amongst the top 10 OAs with fewer males in SPD V2.0 than census, the likely causes are limitations of the admin data for prisons and student accommodation. Although areas with military personnel may have fewer males in SPD V2.0, only one area with army barracks appears in these top 10 OAs.
• Amongst the top 10 OAs with more males in SPD V2.0 than census, the difference relates to student accommodation and military bases.
• Amongst the top 10 OAs with fewer females in SPD V2.0 than census, student accommodation is the likely explanation. This is also true of the top 10 areas with more females than census.
• In previous SPD analysis seasonal workers were identified as an
issue in LSOA level data, as they may be present in administrative
data but are no longer resident (causing inflated SPD estimates).
• In the OA level analysis, none of the most extreme outliers were
due to the seasonal worker effect.
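The "top 10" lists above come from ranking OAs by the size of their difference in each direction. A minimal sketch with a hypothetical helper; it works with either count or percentage differences, and the slides do not say which measure was used for the ranking.

```python
from typing import Dict, List


def top_n(diffs: Dict[str, float], n: int = 10, spd_higher: bool = True) -> List[str]:
    """OA codes with the n largest differences in one direction.

    diffs maps OA code -> signed difference (SPD minus census).
    spd_higher=True ranks OAs where SPD exceeds census; False ranks OAs
    where SPD is below census. Ranking is by absolute size of difference.
    """
    if spd_higher:
        candidates = [oa for oa, d in diffs.items() if d > 0]
    else:
        candidates = [oa for oa, d in diffs.items() if d < 0]
    return sorted(candidates, key=lambda oa: abs(diffs[oa]), reverse=True)[:n]


# Hypothetical male differences for four OAs.
male_diffs = {"OA_A": 12.0, "OA_B": -30.0, "OA_C": 45.0, "OA_D": -5.0}
print(top_n(male_diffs, 2, spd_higher=False))  # ['OA_B', 'OA_D']
```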
Distribution of SPD V2.0 2016 estimates vs
Small Area Population Estimates for 2016
For approximately 88% of OAs the difference between the SAPE and SPD estimates was within +/-10%.
This pattern is very similar to the SPD for 2011 compared with census estimates.
[Chart legend: SPD estimate lower than official SAPE estimate; SPD estimate higher than official SAPE estimate]
Extreme OAs were of the most interest, as they are the most different from official estimates – the distribution of extremes for 2016 differs from that seen in the 2011 analysis.
SPD differences may be higher or lower than the official estimate.
Changes in OA extreme outliers from
2011 to 2016
• 754 OAs were identified as extreme outliers when comparing
the SPD V2.0 2016 with the 2016 SAPE.
• Of the 634 extreme outlier OAs detected in the 2011 comparison, 294 were also present in the outliers dataset for the 2016 comparison.
• This means that 340 of the extreme OAs from 2011 became less extreme by 2016.
• Furthermore, 460 OAs in the 2016 extreme outlier list were not extreme outliers in 2011.
• Interpretation of these changes is challenging due to the wide
range of possible causes.
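The counts above are set operations on the two years' outlier lists. A minimal sketch with a hypothetical helper; the OA codes are synthetic, not the real outlier lists, and are arranged only so that the output reproduces the published counts.

```python
from typing import Dict, Set


def compare_outliers(outliers_2011: Set[str], outliers_2016: Set[str]) -> Dict[str, int]:
    """Summarise overlap between two years' sets of extreme-outlier OA codes."""
    return {
        "outliers_2011": len(outliers_2011),
        "outliers_2016": len(outliers_2016),
        "in_both": len(outliers_2011 & outliers_2016),
        "dropped_out": len(outliers_2011 - outliers_2016),
        "entered": len(outliers_2016 - outliers_2011),
    }


# Synthetic OA codes arranged to reproduce the published counts.
o2011 = {f"OA{i}" for i in range(634)}
o2016 = {f"OA{i}" for i in range(340, 1094)}
print(compare_outliers(o2011, o2016))
# {'outliers_2011': 634, 'outliers_2016': 754, 'in_both': 294, 'dropped_out': 340, 'entered': 460}
```

The consistency checks follow directly: 634 − 294 = 340 and 754 − 294 = 460.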
Changes in extreme outliers
[Diagram: SPD V2.0 2011 outliers compared with SPD V2.0 2016 outliers]
• 634 extreme outliers in 2011 and 754 extreme outliers in 2016
• 294 matching extreme outliers in both years, of which 17 shifted to overcounts from undercounts and 31 shifted to undercounts from overcounts
• 340 dropped out of the most extreme between 2011 and 2016
• 460 OAs entered the most extreme in 2016
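Among OAs that are extreme in both years, a shift between undercount and overcount can be detected from the sign of the difference. A minimal sketch with a hypothetical helper, assuming each year's input holds only that year's extreme outliers with signed differences (SPD minus official, so negative means undercount); the example data are invented.

```python
from typing import Dict


def direction_shifts(diff_2011: Dict[str, float], diff_2016: Dict[str, float]) -> Dict[str, int]:
    """Classify OAs that are extreme outliers in both years by direction of difference.

    Each input maps an OA code to its signed difference for that year's extreme
    outliers only; negative = undercount, positive = overcount.
    """
    both = diff_2011.keys() & diff_2016.keys()
    under_to_over = sum(1 for oa in both if diff_2011[oa] < 0 < diff_2016[oa])
    over_to_under = sum(1 for oa in both if diff_2011[oa] > 0 > diff_2016[oa])
    return {
        "in_both_years": len(both),
        "under_to_over": under_to_over,
        "over_to_under": over_to_under,
        "same_direction": len(both) - under_to_over - over_to_under,
    }


# Hypothetical signed differences for extreme OAs in each year.
d2011 = {"A": -20.0, "B": 30.0, "C": 25.0}
d2016 = {"A": 15.0, "B": -40.0, "C": 22.0, "D": 50.0}
print(direction_shifts(d2011, d2016))
# {'in_both_years': 3, 'under_to_over': 1, 'over_to_under': 1, 'same_direction': 1}
```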
Interpretation of changes in extreme outliers
It should be noted that reasons for changes between the 2011 and 2016 SPD estimates are complex. Interpretation should be undertaken with caution, as differences may be due to:
• intercensal drift in SAPE – accumulation of error in the mid-year
estimation process
• changes in SAPE reflecting shifts in the population on the ground that
aren’t reflected in administrative data
• differences in the quality of administrative data over time - not reflecting
the population on the ground (e.g. list inflation)
• changes in administrative data reflecting changes in population on the ground that aren’t reflected in SAPE
• any combination of the above
• having extreme values in both years may reflect different types of extreme behaviour – e.g. an overestimate in 2011 and an underestimate in 2016
SPD V2.0 2011 to SPD V2.0 2016
change: interpretation
• Some of the most extreme changes in OA population from SPD V2.0 2011 to SPD V2.0 2016 are in areas with large building developments, such as the Olympic Village in Newham.
• This is due to people moving to these areas and updating their admin records – leading to them being correctly placed in their new OA.
• Population increases in university areas where no new development has occurred are likely due to some student moves being missed after students complete their studies.
• While we can detect some of these changes, local intelligence is essential – if you have comments, please email us: