www.statcan.gc.ca Automated geocoding using postal codes An overview of the PCCF-SLI and PCCF+ Michael Tjepkema Saeeda Khan Health Analysis Division, Statistics Canada April 15, 2015

www.statcan.gc.ca

Automated geocoding using postal codes

An overview of the PCCF-SLI and PCCF+

Michael Tjepkema

Saeeda Khan Health Analysis Division, Statistics Canada

April 15, 2015

Overview

1. Introduction to the PCCF and PCCF+

• Uses of small-area data

2. Components of a postal code

• SLI geocoding versus population-weighting

3. Typical case strategy

• Pitfalls of automated geocoding

4. Why PCCF+?

4/15/2015 Statistics Canada • Statistique Canada 2

Key points of presentation

• Consider using PCCF+ rather than PCCF-SLI if any of the following apply

• You want to use variables that are present on the PCCF+ but not on the regular PCCF

• Your file is less than perfect with respect to postal codes

• You want help to evaluate the quality of the postal code on your data file

• The “vintage” of the postal codes on your file spans more than one census

• You want to do better coding in rural areas

• Your file includes postal codes used by residents of “incompletely enumerated Indian Reserves”

1. Introduction to the PCCF and PCCF+

Introductory remarks

• Postal codes are part of most administrative data sets

• PCCF, PCCF+, and related tools are now the standard

• They allow address and postal code attributes to be converted to standard geographical codes

• Used in data collection, processing, and analysis

• The resulting small-area geographies have a variety of uses

• Familiarity with the methods, strengths, and limitations will help researchers exploit the potential

What is the PCCF?

• A flat file produced by Statistics Canada that links postal codes to geographic areas

• Allows for:

• Association of postal codes to standard geographic areas

• Selection of statistical units by geographic areas

• Provides linkages (including an SLI) to block-face, dissemination block, and dissemination area

• However, some postal codes are only linked to post office locations, many serve multiple DAs, and some are non-residential (government offices, etc)

Statistics Canada. Postal Codes Conversion File (PCCF), Reference Guide. Catalogue no. 92-153-G, no 02. Ottawa, ON: Statistics Canada, 2011.

What is PCCF+?

• A SAS control program, reference files derived from the PCCF, and a postal code population-weight file

• Assigns geographic identifiers based on postal codes

• Postal codes for rural areas are assigned to DA & DB using population-weighted random allocation

• Able to assign geographic codes from the first 5, 4, or 3 characters of the postal code, as well as from all 6

• Full diagnostic output permits resolution of results for potentially troublesome postal codes

• Provides residential and institutional coding separately

Wilkins R, Peters PA. PCCF+ Version 5K User’s Guide: Automated geocoding based on the Statistics Canada Postal Code Conversion File. Catalogue no. 82F0086-XDB. Ottawa, ON: Statistics Canada, 2011.

Uses of the PCCF and PCCF+

• A 2011 literature review identified 622 publications that used the PCCF or PCCF+

• Health Sciences 463 (74%)

• Social Sciences & Economics 93 (15%)

• Education, data, & statistics 34 (6%)

• Natural & applied sciences 12 (2%)

• Other 20 (3%)

• Articles appeared in 233 different journals, with CMAJ (23) and CJPH (19) the top two journals

Peller P. An analysis of the Postal Code Conversion File’s use in research. DLI research paper series, 2011. Calgary, AB: University of Calgary.

Uses of small area data

• Add policy relevance by aggregating to admin areas

• Health Regions, School Districts, etc…

• Deal with changes over time (boundary shifts)

• Assign neighbourhood SES and other confounders

• Determine point-distance, road distance, travel time

• Allow for studies of migration over time (longitudinal)

• Help in the imputation of missing data

• Obtain additional identifiers for record linkage

2. Components of a Postal code

Components of a postal code

• The postal code is a six-character code defined and maintained by Canada Post Corporation for the purpose of sorting and delivering mail

• Postal codes are not geographic attributes

• Only spatial in that mail is delivered by geographic area

• Six-character code in the form ‘ANA NAN’ (A = letter, N = digit)

• First 3 – Forward Sortation Area (FSA)

• Last 3 – Local Delivery Unit (LDU)
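
The ‘ANA NAN’ layout above can be checked mechanically before any geocoding is attempted. The following is a minimal sketch in Python, not part of the PCCF or PCCF+ software; the function name `split_postal_code` is hypothetical. It tests only the letter/digit layout and says nothing about whether a code was ever issued by Canada Post.

```python
import re

# Layout check for 'ANA NAN': letter-digit-letter, optional space, digit-letter-digit.
# This is only a format test; it cannot tell whether the code actually exists.
POSTAL_CODE_RE = re.compile(r"([A-Z]\d[A-Z]) ?(\d[A-Z]\d)")


def split_postal_code(raw):
    """Return (fsa, ldu) for a well-formed postal code, or None otherwise."""
    match = POSTAL_CODE_RE.fullmatch(raw.strip().upper())
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    print(split_postal_code("k1a 0t6"))  # well formed
    print(split_postal_code("K1A0T"))    # last character missing
```

Records that fail this check are candidates for review rather than automatic rejection, since truncated codes can still carry usable geographic information.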

Statistics Canada. Postal Codes Conversion File (PCCF), Reference Guide. Catalogue no. 92-153-G, no 02. Ottawa, ON: Statistics Canada, 2011.

What is a postal code?

ANA NAN
  ANA (first three characters): Forward Sortation Area
  NAN (last three characters): Local Delivery Unit

Province / Territory / Region          First Character
Newfoundland and Labrador              A
Nova Scotia                            B
Prince Edward Island                   C
New Brunswick                          E
Eastern Québec                         G
Metropolitan Montréal                  H
Western Québec                         J
Eastern Ontario                        K
Central Ontario                        L
Metropolitan Toronto                   M
Southwestern Ontario                   N
Northern Ontario                       P
Manitoba                               R
Saskatchewan                           S
Alberta                                T
British Columbia                       V
Northwest Territories and Nunavut      X
Yukon                                  Y

Second character: if 0 then rural; if 1-9 then urban
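
The table above translates directly into a lookup. The sketch below is illustrative only: the name `describe_fsa` is hypothetical, and it assumes that the rural/urban rule on the slide applies to the second character of the FSA (the digit).

```python
# First character of the FSA -> province, territory, or region (from the table above).
FIRST_CHAR_REGION = {
    "A": "Newfoundland and Labrador",
    "B": "Nova Scotia",
    "C": "Prince Edward Island",
    "E": "New Brunswick",
    "G": "Eastern Québec",
    "H": "Metropolitan Montréal",
    "J": "Western Québec",
    "K": "Eastern Ontario",
    "L": "Central Ontario",
    "M": "Metropolitan Toronto",
    "N": "Southwestern Ontario",
    "P": "Northern Ontario",
    "R": "Manitoba",
    "S": "Saskatchewan",
    "T": "Alberta",
    "V": "British Columbia",
    "X": "Northwest Territories and Nunavut",
    "Y": "Yukon",
}


def describe_fsa(fsa):
    """Return (region, 'rural' or 'urban') for a three-character FSA."""
    fsa = fsa.strip().upper()
    region = FIRST_CHAR_REGION.get(fsa[0], "unknown")
    # Assumption: the 0 / 1-9 rule refers to the second character of the FSA.
    delivery = "rural" if fsa[1] == "0" else "urban"
    return region, delivery


if __name__ == "__main__":
    print(describe_fsa("K1A"))
    print(describe_fsa("S0G"))
```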

Components of a postal code

• Local Delivery Unit (LDU)

• Letter carrier delivery to ordinary urban address

• Community mailbox

• Apartment building

• Business building

• Large firm or organisation (CBC: M5W 1E6)

• Federal department or agency (Statistics Canada: K1A 0T6)

• Mail delivery route (suburban, rural, or mobile)

• General delivery and post office boxes (large or small)

Statistics Canada. Postal Codes Conversion File (PCCF), Reference Guide. Catalogue no. 92-153-G, no 02. Ottawa, ON: Statistics Canada, 2011.

Importance of Identifying Non-residential PCs

• In the following cases, we may not know much about the true place of residence, which could be any place in the CMA (or even further out)

• Government Offices, e.g., Statistics Canada

• Coroners Offices

• Children’s Aid Societies

• Hospitals in a Birth File

• Commercial mailbox services (UPS Store, Mail Boxes Etc.)

Components of a postal code

Haydu G. The Postal Code – Geographic classification code conversion file, a tool for social science research. Paper presented at the 1979 annual meeting of the Canadian Association of Geographers, Victoria, BC, Canada.

Single-link (PCCF-SLI) vs. PCCF+

• PCCF-SLI forces each postal code to be assigned to a single DA & DB, regardless of how large the actual service area may be

• For most research purposes, the distribution of the population across the entire service area is needed

• PCCF+ uses a population-weighted method of geocoding wherever multiple matches are possible

• As such, the distribution of respondents more accurately reflects the underlying population

• “Numerator-denominator consistency”
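
The contrast between the two approaches can be illustrated with a small simulation. This is a sketch under stated assumptions, not the PCCF+ SAS program: the DA labels and population shares are invented, and choosing the largest-weight DA merely stands in for whatever rule designates the single link.

```python
import random
from collections import Counter

# Hypothetical postal code serving three DAs; the weights are made-up population shares.
DA_WEIGHTS = {"DA-1": 0.6, "DA-2": 0.3, "DA-3": 0.1}


def assign_single_link(n_records):
    """Single-link style: every record with this postal code lands in one DA."""
    best = max(DA_WEIGHTS, key=DA_WEIGHTS.get)
    return Counter({best: n_records})


def assign_weighted(n_records, seed=2015):
    """Population-weighted style: each record is drawn at random in proportion
    to the weights. A fixed seed makes the allocation reproducible."""
    rng = random.Random(seed)
    das = list(DA_WEIGHTS)
    picks = rng.choices(das, weights=[DA_WEIGHTS[d] for d in das], k=n_records)
    return Counter(picks)


if __name__ == "__main__":
    print("single link:", dict(assign_single_link(1000)))
    print("weighted:   ", dict(assign_weighted(1000)))
```

Under the single-link rule two of the three DAs never receive a record, whereas the weighted draw spreads records roughly in line with the population shares, which is the sense in which numerators and denominators stay consistent.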

Population assignment using PCCF-SLI

[Map: Saskatchewan, Manitoba, and Alberta]

Population assignment using PCCF+

[Map: Saskatchewan, Manitoba, and Alberta]

Population assignment via PCCF-SLI & PCCF+

Percent of total 2006 census population in areas with no respondent assignment

                   PCCF-SLI                            PCCF+
Geographic Unit    # of Units   Percent of Population  # of Units   Percent of Population
DA                 8,476        2.9                    187          0
CT                 73           0.1                    7            0
CMA                ..           ..                     ..           ..
CSD                1,438        0.6                    109          0
CD                 ..           ..                     ..           ..

Population assignment using PCCF-SLI

[Map: Gatineau and Ottawa]

Population assignment using PCCF+

[Map: Gatineau and Ottawa]

Population misassignment using PCCF-SLI & PCCF+

Comparison of population coding errors using PCCF-SLI versus PCCF+ (5J)*

Geographic Unit    PCCF-SLI                 PCCF+
                   % of total population    % of total population
DA                 37.4                     7.6
CT                 6.6                      1.4
CMA                4.3                      0.1
CSD                11.4                     2.7
CD                 1.1                      0.3

* Population coding errors are defined as the sum over all areas at this geographic level of the absolute value of the population coded less the population known from the census sample, expressed as a percentage of the total population in all areas at this level.
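
The footnote's definition can be written out as a short calculation. The sketch below follows that wording; the function name and the three-area example are hypothetical and are not drawn from the census or from the table above.

```python
def population_coding_error(coded, census):
    """Sum over areas of |population coded - census population|,
    expressed as a percentage of the total census population."""
    areas = set(coded) | set(census)
    total = sum(census.values())
    abs_diff = sum(abs(coded.get(a, 0) - census.get(a, 0)) for a in areas)
    return 100.0 * abs_diff / total


if __name__ == "__main__":
    # Invented example: 200 people who live in area C are all coded to area A.
    census = {"A": 500, "B": 300, "C": 200}
    coded = {"A": 700, "B": 300}
    print(population_coding_error(coded, census))  # 40.0
```

Note that each misplaced person is counted twice under this definition, once in the area that gains them and once in the area that loses them.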

3. Typical case strategy

Typical case scenario

• Researcher has access to a data file containing records of individuals (students, clients, respondents)

• Data file contains postal code of place of residence

• Data file is missing some aspect required for analysis (socio-economic, environmental, geographic codes)

• Desire is to exploit some or all uses of small area geography as described above

• Postal codes may be appropriate for this purpose

Additional case scenarios

• Insufficient documentation

• Vintage of coding standard not included (don’t assume)

• Method of assigning multiple links not specified (SLI)

• Diagnostic codes not included

• Problem codes not identified (business, PO Box, etc…)

• Available geographic coding not suitable

• Not available at the level needed

• Not of correct vintage

• Too imprecise or inaccurate for intended use

Potential case strategies

• Several geocoding scenarios are possible

1. Only postal codes available

• Use PCCF-SLI or PCCF+ to assign geographic codes, etc…

2. Full street address available

• Use address geocoding software (GIS)

• Use PCCF-SLI or PCCF+ on postal code portion of address

3. Telephone numbers available

• Reverse lookup to get postal code or address

• Use 911 system maps to get location from address

4. Why PCCF+?

Why PCCF+ and not regular PCCF (with SLI=1)?

1. Supplemental coding

2. Postal codes less than perfect

3. Documentation and diagnostics

4. Vintage of postal codes

5. Population weighted approach

6. Postal codes used by residents of “incompletely enumerated Indian Reserves”

Why PCCF+? – 1: supplemental coding

• ID, PCODE

• PR, CD, CSD, CCSD

• CMA, CT, MIZ, ER, FED

• DA, BLK

• BLKURB*, DPL*

• LAT, LONG

• HR, SUB, AHR, ASUB

• QAIPPE, IMMTER

• CSIZE, NSREL, AIRLIFT, AR

• EA81uid, EA86uid, EA91uid, EA96uid, DA01uid, DA06uid, DA11uid

* Poorly coded and not recommended for analytic use

Also available from PCCF single-link

Why PCCF+? – 2: postal codes less than perfect

• Most files will include some postal codes that never existed (reporting or data capture errors)

• Sensitive files may omit the last character of the postal code

• Some files may only contain the first 3 characters of the postal code (the FSA)

• PCCF+ can be used to geocode the above information
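
One way to picture geocoding from a truncated postal code is a longest-prefix lookup: try all 6 characters, then the first 5, 4, and 3. The sketch below is an assumption about the general idea, not a description of the actual PCCF+ logic; the lookup table and its area labels are invented for illustration.

```python
def geocode_with_fallback(postal_code, lookup):
    """Return (area, characters_matched), trying 6, 5, 4, then 3 characters.

    `lookup` is a hypothetical reference table keyed by full or partial codes.
    """
    code = postal_code.replace(" ", "").upper()
    for length in (6, 5, 4, 3):
        prefix = code[:length]
        # Skip lengths longer than the code that was actually supplied.
        if len(prefix) == length and prefix in lookup:
            return lookup[prefix], length
    return None, 0


if __name__ == "__main__":
    # Invented reference entries; real files would hold standard geographic codes.
    lookup = {"K1A0T6": "area-1", "K1A0T": "area-2", "K1A": "area-3"}
    for pc in ("K1A 0T6", "K1A 0T", "K1A", "X0X 0X0"):
        print(pc, "->", geocode_with_fallback(pc, lookup))
```

Returning the number of characters matched alongside the area keeps a record of how precise each assignment is, which is useful when reviewing results afterwards.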

Why PCCF+? – 3: documentation & diagnostics

• Output is documented with user manual and version

• Method has been validated in many publications

• SAS code can be tweaked so results are exactly reproducible

• Define a specific kernel for probabilistic assignment

• Diagnostic codes for problem codes are provided

• Two outputs: Full file & Problem File

DMT, DMTDIFF        RPF, SERV, PREC
LINK (PROB)         BLG NAME + ADR
SOURCE              CSDNAME + TYPE
NCSD, NCD           CPCCODE
RESFLG, INSTFLG

Why PCCF+? – 4: “Vintage” of postal codes

• Postal codes on your file span more than one census

• PCCF+ assigns DA or EA from each census from 1981 through 2011

• Useful for time-varying variables

Why PCCF+? – 5: population weighting

• Almost all rural and several urban categories of postal code provide service to multiple DAs, CSDs, etc…

• Using SLI=1 in the PCCF forces every occurrence of a postal code onto a single set of geocodes

• Using single-link approach introduces systematic bias

• PCCF+ probabilistically assigns each postal code record using census derived population weights

Why PCCF+? – 6: Indian Reserves

• Your file includes postal codes used by residents of “incompletely enumerated Indian Reserves”

• These postal codes will not be coded properly by PCCF-SLI

• PCCF+ includes census population weights adjusted to account for estimates of the population living on the incompletely enumerated reserves

Limitations with PCCF-SLI & PCCF+

• In rural areas and at urban fringe, probabilistic assignment leads to random misclassification of DA and neighbourhood income quintiles

• Reduced ability to detect effects in rural areas

• Lower RRs and RDs for epidemiologic studies

• This is effect modification, not confounding, so stratifying the analysis by urban and rural areas is recommended

• Take care in interpreting lower effect estimates in rural versus urban areas
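
The attenuation described above can be shown with simple arithmetic. The sketch below is a simplified illustration, not a result from the presentation: it assumes two equal-sized groups (for example, low- and high-income areas) and that the same share of each group is randomly assigned to the wrong one. The risks used are invented.

```python
def observed_risk_ratio(risk_exposed, risk_unexposed, misclassified_share):
    """Risk ratio seen after a share of each group is swapped into the other.

    Assumes two groups of equal size and non-differential misclassification.
    """
    m = misclassified_share
    obs_exposed = (1 - m) * risk_exposed + m * risk_unexposed
    obs_unexposed = (1 - m) * risk_unexposed + m * risk_exposed
    return obs_exposed / obs_unexposed


if __name__ == "__main__":
    # Invented risks: 2% versus 1%, so the true risk ratio is 2.
    for share in (0.0, 0.2, 0.5):
        print(share, round(observed_risk_ratio(0.02, 0.01, share), 3))
```

With 20% of each group misassigned, the observed ratio falls from 2 to about 1.5, and with 50% misassigned the groups become indistinguishable, which is why weaker rural estimates need careful interpretation.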

Limitations with PCCF and PCCF+

• Postal codes may change over time

1. Many technical changes to address ranges

• Usually no change at block-face or block level

• Very little change at higher levels

2. Some reuse of retired postal codes within same FSA

3. Two FSAs in British Columbia moved in the mid-1990s

• Moral – Code as received and interpret the output

Concluding remarks

• Small-area geography & spatial coordinates are part of most data sets and useful in most studies

• Familiarity with methods, limitations, and interpretation of data helps researchers exploit the data's potential more meaningfully

• It is not enough to use the data mechanically; users need to think about what they are doing and why

More information…

• Contact:

[email protected]

• Acknowledgments

• Russell Wilkins (retired) & Paul A Peters (University of New Brunswick)

Extra slides

FSAs do not respect CSD boundaries

Limitation of SLI (e.g., 2001 Census Geography)

• Over a third of the total population of rural and small town Canada can never get the correct DA code when using the PCCF SLI since nearly 11,000 DAs are never linked to postal codes when only the SLI is selected.

• Also at the CSD level, over a quarter of all CSDs never get coded using SLI. In rural and small town Canada, nearly 30% of CSDs never get coded using the SLI.
