
26/11/2015

1

www.statcan.gc.ca

Postal Code Conversion for Data Analysis

An overview of the PCCF and PCCF+

Saeeda Khan, Michael Tjepkema

Health Analysis Division, Statistics Canada

December 1, 2015

Outline

1. Postal codes

• Components of a postal code

• Uses of small-area data

2. Introduction to the Postal Code Conversion File (PCCF) and the Postal Code Conversion File Plus (PCCF+)

3. Single link indicator geocoding versus population-weighting

4. Why PCCF+?

5. Limitations of PCCF & PCCF+

11/26/2015Statistics Canada • Statistique Canada2


1. Postal Codes


What are postal codes?

• An identifier managed by Canada Post Corporation for the efficient sorting and delivery of mail.

• They are not created as units for the analysis or mapping of population, business or dwelling characteristics.

• However, postal codes are part of most administrative data sets and are usually the only variable available for geographic identification

• Thus, they are important identifiers for geocoding


Components of a postal code

• The postal code is a six-character alphanumeric code

• Postal codes are not geographic attributes

• Only spatial in that mail is delivered by geographic area

• Six-character code ‘ANA NAN’ (A = letter, N = digit)

• First 3 – Forward Sortation Area (FSA)

• Last 3 – Local Delivery Unit (LDU)


Statistics Canada. Postal Codes Conversion File (PCCF), Reference Guide. Catalogue no. 92-153-G, no 02. Ottawa, ON: Statistics Canada, 2011.

What is a postal code?


ANA NAN

Province / Territory / Region First Character

Newfoundland and Labrador A

Nova Scotia B

Prince Edward Island C

New Brunswick E

Eastern Québec G

Metropolitan Montréal H

Western Québec J

Eastern Ontario K

Central Ontario L

Metropolitan Toronto M

Southwestern Ontario N

Northern Ontario P

Manitoba R

Saskatchewan S

Alberta T

British Columbia V

Northwest Territories and Nunavut X

Yukon Y

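Forward Sortation Area (FSA): first three characters (ANA)

Local Delivery Unit (LDU): last three characters (NAN)

Second character of the FSA (the digit): if 0 then rural; if 1-9 then urban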
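The structure described above can be illustrated with a small parser. This is only a sketch: the function name `parse_postal_code` and the returned field names are hypothetical, and the region lookup is transcribed from the first-character table on this slide. It checks the ANA NAN pattern only, so a code that passes may still never have existed.

```python
import re

# First-character lookup transcribed from the table above.
REGION_BY_FIRST_CHAR = {
    "A": "Newfoundland and Labrador", "B": "Nova Scotia",
    "C": "Prince Edward Island", "E": "New Brunswick",
    "G": "Eastern Québec", "H": "Metropolitan Montréal",
    "J": "Western Québec", "K": "Eastern Ontario",
    "L": "Central Ontario", "M": "Metropolitan Toronto",
    "N": "Southwestern Ontario", "P": "Northern Ontario",
    "R": "Manitoba", "S": "Saskatchewan", "T": "Alberta",
    "V": "British Columbia",
    "X": "Northwest Territories and Nunavut", "Y": "Yukon",
}


def parse_postal_code(raw):
    """Split a postal code into FSA and LDU (hypothetical helper).

    Whitespace is removed and letters are upper-cased before the
    ANA NAN pattern is checked.
    """
    code = "".join(raw.split()).upper()
    if not re.fullmatch(r"[A-Z][0-9][A-Z][0-9][A-Z][0-9]", code):
        raise ValueError(f"not in ANA NAN format: {raw!r}")
    region = REGION_BY_FIRST_CHAR.get(code[0])
    if region is None:
        raise ValueError(f"first character not in the table: {raw!r}")
    return {
        "fsa": code[:3],           # Forward Sortation Area
        "ldu": code[3:],           # Local Delivery Unit
        "region": region,
        "rural": code[1] == "0",   # second character 0 means rural
    }


print(parse_postal_code("k1a 0t6"))
```

Pattern validation of this kind is a pre-check only; it does not replace the diagnostics that PCCF+ produces for problem postal codes.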


Components of a postal code

• Local Delivery Unit (LDU)

• Letter carrier delivery to ordinary urban address

• Community mailbox

• Apartment building

• Business building

• Large firm or organisation (Foothills Medical Centre: T2N 2T9; CBC: M5W 1E6)

• Federal department or agency (Statistics Canada: K1A 0T6)

• Mail delivery route (suburban, rural, or mobile)

• General delivery and post office boxes (large or small)


Statistics Canada. Postal Codes Conversion File (PCCF), Reference Guide. Catalogue no. 92-153-G, no 02. Ottawa, ON: Statistics Canada, 2011.


Components of a postal code

Haydu G. The Postal Code – Geographic classification code conversion file, a tool for social science research. Paper presented at the 1979 annual meeting of the Canadian Association of Geographers, Victoria, BC, Canada.


How can postal codes be used for analysis?

• Postal codes are part of most administrative data sets

• PCCF, PCCF+, and related tools are now the standard

• Allows for the conversion of address and postal code attributes to standard geographical codes

• Used in data collection, processing, and analysis, e.g., dissemination area (DA), census tract (CT), health region (HR)

• The resulting small-area geographies have a variety of uses

• Familiarity with the methods, strengths, and limitations will help researchers exploit the potential


Uses of small area data

• Add policy relevance by aggregating to admin areas

• Health Regions, School Districts, etc…

• Deal with changes over time (boundary shifts)

• Assign neighbourhood socio-economic status (SES) and other confounders

• Determine point-distance, road distance, travel time

• Allow for studies of migration over time (longitudinal)

• Help in the imputation of missing data

• Obtain additional identifiers for record linkage


2. Introduction to the PCCF and PCCF+


What is the PCCF?

• A flat file that links postal codes (active and retired) to standard geographic areas

• Allows for:

• Association of postal codes to standard geographic areas

• Selection of statistical units by geographic areas

• Provides linkages (including a single link indicator (SLI)) to block face (BF), dissemination block (DB), and dissemination area (DA)

• However, some postal codes are linked only to post office locations, many serve multiple DAs, and some are non-residential (government offices, etc.)


Statistics Canada. Postal Codes Conversion File (PCCF), Reference Guide. Catalogue no. 92-153-G, no 02. Ottawa, ON: Statistics Canada, 2011.

What is the PCCF+?

• The PCCF+ consists of:

1. SAS control program,

2. reference files primarily derived from the PCCF

3. postal code population-weight file derived from the Census of Population

• Assigns geographic identifiers based on postal codes

• Full diagnostic output (troublesome postal codes, precision of geocoding, etc.)

• Provides residential & institutional coding separately


Wilkins R, Peters PA. PCCF+ Version 5K User’s Guide: Automated geocoding based on the Statistics Canada Postal Code Conversion File. Catalogue no. 82F0086-XDB. Ottawa, ON: Statistics Canada, 2011.


Importance of Identifying Non-residential PCs

• PCCF+ is able to identify non-residential postal codes

• Government offices, e.g., Statistics Canada

• Coroners’ offices

• Children’s Aid Societies

• Hospitals in a birth file

• Tax preparers’ offices in a tax file

• UPS Store, Mail Boxes Etc.


How does the PCCF+ geocode postal codes?

• Assigns geographic identifiers based on postal codes in a staged approach:

1. assigns 6-character postal codes in rural areas to dissemination areas (DA) and dissemination blocks (DB) using population-weighted random allocation

2. assigns 6-character postal codes with an exact match to a PCCF unique record

3. randomly assigns 6-character postal codes with an exact match to a PCCF duplicate record

4. imputes full geography from the first 5, first 4 and first 3 characters of the postal code using census population weights

5. imputes partial geography from the first 2 characters of the postal code
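The fallback from full to partial postal codes in the stages above can be sketched as a longest-prefix lookup. This is a simplified illustration, not the PCCF+ SAS program: it ignores the separate rural stage and the unique/duplicate distinction, and the function `longest_prefix_match`, the `LOOKUP` table, and the area identifiers are all hypothetical.

```python
# Hypothetical lookup: postal code prefix -> list of (area id, weight).
# Real reference files are far larger and are derived from the PCCF.
LOOKUP = {
    "K1A0T6": [("DA-101", 1.0)],
    "K1A0T": [("DA-101", 0.7), ("DA-102", 0.3)],
    "K1A": [("DA-101", 0.5), ("DA-102", 0.3), ("DA-103", 0.2)],
    "K1": [("CD-06", 1.0)],  # only partial geography at 2 characters
}


def longest_prefix_match(code, lookup):
    """Return (matched length, candidates) for the longest known prefix.

    Tries the full 6 characters first, then 5, 4, 3 and 2, mirroring
    the order in which less precise geography is imputed. Returns
    (0, []) when nothing matches.
    """
    for length in (6, 5, 4, 3, 2):
        prefix = code[:length]
        if len(prefix) == length and prefix in lookup:
            return length, lookup[prefix]
    return 0, []


for code in ("K1A0T6", "K1A0T9", "K1A9Z9", "K1B2C3", "K1A0T"):
    print(code, longest_prefix_match(code, LOOKUP))
```

The matched length gives a rough indication of how precise the resulting geography can be, which is the kind of information a diagnostic output would need to carry.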


Wilkins R, Peters PA. PCCF+ Version 5K User’s Guide: Automated geocoding based on the Statistics Canada Postal Code Conversion File. Catalogue no. 82F0086-XDB. Ottawa, ON: Statistics Canada, 2011.


Uses of the PCCF and the PCCF+

• A 2011 literature review of publications using the PCCF and PCCF+ identified 622 publications

• Health Sciences 463 (74%)

• Social Sciences & Economics 93 (15%)

• Education, data, & statistics 34 (6%)

• Natural & applied sciences 12 (2%)

• Other 20 (3%)

• Articles appeared in 233 different journals, top two:

• Canadian Medical Association Journal (23)

• Canadian Journal of Public Health (19)


Peller P. An analysis of the Postal Code Conversion File’s use in research. DLI research paper series, 2011. Calgary, AB: University of Calgary.

3. PCCF-SLI vs. PCCF+


Single-link (PCCF-SLI) vs. PCCF+

• PCCF-SLI forces each postal code to be assigned to a single dissemination area (DA) & dissemination block (DB), regardless of how large the actual service area may be

• For most research purposes, the distribution of the population across the entire service area is needed

• PCCF+ uses a population-weighted method of geocoding where multiple-matches are possible

• As such, the distribution of respondents more accurately reflects the underlying population

• “Numerator-denominator consistency”


Example: postal code A1A 1A1 serves three dissemination areas, with population weights of 60% (DA 1), 30% (DA 2) and 10% (DA 3)

PCCF (SLI): Of 10 records reporting this postal code, all 10 will be assigned to DA 1 using the PCCF single link indicator (SLI) (DA 1: 10, DA 2: 0, DA 3: 0)

PCCF+: Of 10 records reporting this postal code, 6 will be assigned to DA 1, 3 to DA 2 and 1 to DA 3 using the PCCF+ (DA 1: 6, DA 2: 3, DA 3: 1)
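The contrast between the two approaches can be sketched in a few lines. This is an illustration under stated assumptions, not the PCCF+ implementation: the function names are hypothetical, the single link is assumed to point at the area with the largest weight, and the weights are the 60/30/10 split from the example. Because the weighted assignment is random, a given run of 10 records will not always split exactly 6/3/1; the shares converge to the weights only on average.

```python
import random

# Population weights for the example postal code A1A 1A1.
DA_WEIGHTS = {"DA 1": 0.6, "DA 2": 0.3, "DA 3": 0.1}


def assign_single_link(n_records, weights):
    """Send every record to one area.

    Assumption for this sketch: the single link points at the area
    with the largest weight.
    """
    best = max(weights, key=weights.get)
    return {area: (n_records if area == best else 0) for area in weights}


def assign_population_weighted(n_records, weights, seed):
    """Draw an area for each record with probability equal to its weight.

    A fixed seed makes the draw reproducible between runs.
    """
    rng = random.Random(seed)
    areas = list(weights)
    picks = rng.choices(areas, weights=[weights[a] for a in areas],
                        k=n_records)
    return {area: picks.count(area) for area in areas}


print(assign_single_link(10, DA_WEIGHTS))
print(assign_population_weighted(10, DA_WEIGHTS, seed=2015))
print(assign_population_weighted(100_000, DA_WEIGHTS, seed=2015))
```

With a large number of records, the weighted counts approach the 60/30/10 distribution, whereas the single-link counts leave DA 2 and DA 3 empty regardless of sample size.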

Population assignment using PCCF-SLI

[Map: Saskatchewan, Manitoba and Alberta]

Population assignment using PCCF+

[Map: Saskatchewan, Manitoba and Alberta]

Population non-assignment via PCCF-SLI & PCCF+

Percent of total 2006 census population in areas with no respondent assignment

Geographic Unit   PCCF-SLI                        PCCF+
                  # of Units   % of Population    # of Units   % of Population
DA                8,476        2.9                187          0
CT                73           0.1                7            0
CMA               ..           ..                 ..           ..
CSD               1,438        0.6                109          0
CD                ..           ..                 ..           ..

Population assignment using PCCF-SLI

[Map: Gatineau and Ottawa]

Population assignment using PCCF+

[Map: Gatineau and Ottawa]

Population misassignment using PCCF-SLI & PCCF+

Comparison of population coding errors using PCCF-SLI versus PCCF+ (5J)*

Geographic Unit   PCCF-SLI (% of total population)   PCCF+ (% of total population)
DA                37.4                               7.6
CT                6.6                                1.4
CMA               4.3                                0.1
CSD               11.4                               2.7
CD                1.1                                0.3

* Population coding errors are defined as the sum over all areas at this geographic level of the absolute value of the population coded less the population known from the census sample, expressed as a percentage of the total population in all areas at this level.
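The error measure defined in the footnote can be written out directly. The sketch below is only an illustration: the function name `population_coding_error` is hypothetical and the populations are made-up round numbers, not values from the census or from the table above.

```python
def population_coding_error(coded, census):
    """Sum of |coded - census| over all areas, as a % of total census population.

    `coded` and `census` map area ids to population counts; an area
    missing from either mapping is treated as having a count of 0.
    """
    areas = set(coded) | set(census)
    total = sum(census.values())
    abs_diff = sum(abs(coded.get(a, 0) - census.get(a, 0)) for a in areas)
    return 100.0 * abs_diff / total


# Made-up example: 1,000 people across three dissemination areas.
census = {"DA 1": 600, "DA 2": 300, "DA 3": 100}
single_link = {"DA 1": 1000}                          # everyone coded to DA 1
weighted = {"DA 1": 610, "DA 2": 290, "DA 3": 100}    # close to the census split

print(population_coding_error(single_link, census))   # 80.0
print(population_coding_error(weighted, census))      # 2.0
```

Note that every misplaced person is counted twice by this definition, once in the area that gains them and once in the area that loses them.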


Limitation of SLI (e.g., 2001 Census Geography)

• Over a third of the total population of rural and small town Canada can never get the correct dissemination area (DA) code when using the PCCF SLI since nearly 11,000 DAs are never linked to postal codes when only the SLI is selected.

• Also at the census subdivision (CSD) level, over a quarter of all CSDs never get coded using SLI. In rural and small town Canada, nearly 30% of CSDs never get coded using the SLI.


4. Why PCCF+?


Why PCCF+ and not regular PCCF (with SLI=1)?

1. Population weighted approach

2. Supplemental coding

3. Postal codes less than perfect

4. Documentation and diagnostics

5. Modifiable SAS code

6. Vintage of postal codes

7. Postal codes used by residents for “incompletely enumerated Indian Reserves”


Why PCCF+? – 1: population weighting

• Almost all rural and several urban categories of postal code provide service to multiple dissemination areas (DAs), census subdivisions (CSDs), etc…

• Use of the single link indicator (SLI) equal to 1 in PCCF forces every occurrence of a postal code onto a single set of geocodes

• Using the single-link approach introduces systematic bias

• PCCF+ probabilistically assigns each postal code record using census-derived population weights


Why PCCF+? – 2: supplemental coding

Variables available from both PCCF-SLI and PCCF+:

• ID, PCODE

• PR, CD, CSD, CCSD

• CMA, CT, MIZ, ER, FED

• DA, BLK

• BLKURB*, DPL*

• LAT, LONG

Variables available from PCCF+ only:

• HR, AHR

• QAIPPE, IMMTER

• CSIZE, NSREL, AIRLIFT, AR

• EA81uid, EA86uid, EA91uid, EA96uid, DA01uid, DA06uid, DA11uid

* Poorly coded and not recommended for analytic use


Why PCCF+? – 3: postal codes less than perfect

• Most files will include some postal codes that never existed (reporting or data capture errors)

• Sensitive files may omit the last digit of the postal code

• Some files may contain only the first 3 characters of the postal code

• PCCF+ can still be used to geocode records with such postal codes


Why PCCF+? – 4: documentation & diagnostics

• Output is documented with user manual and version

• Method has been validated in many publications

• Diagnostic codes for problem codes are provided

• Two outputs: Full file & Problem File

Output variables shown on the slide:

• DMT, DMTDIFF

• RPF, SERV, PREC

• LINK (PROB)

• BLG NAME + ADR

• SOURCE

• CSDNAME + TYPE

• NCSD, NCD

• CPCCODE

• RESFLG, INSTFLG

Callout attached to one of the variables above: This variable provides a measure of the quality of the geographic coordinates assigned to the representative point


Why PCCF+? – 5: Modifiable SAS code


• Length of ID variable can be changed

• SAS code can be easily tweaked so results are exactly reproducible

• Define a specific kernel for probabilistic assignment

/********************************************************************************************/

/* Random Seed Value */

/* If the seed value is 0 (default) then computer time is used */

/* Change this value as desired to use the same seed between PCCF+ trials */

%let seedVal=0;

Why PCCF+? – 6: “Vintage” of postal codes

• PCCF+ assigns full census geography for most recent census year

• It also assigns the dissemination area (DA) or enumeration area (EA) from each previous census back to 1981

• Useful for time-varying analysis

• For higher levels of vintage geography (e.g., CMA) use the Geographic Attributes File (GAF) or the Geographic Tape File (GTF)


Why PCCF+? – 7: Indian Reserves

• Your file includes postal codes used by residents of “incompletely enumerated Indian Reserves”

• These postal codes will not be coded properly by PCCF-SLI

• PCCF+ includes census population weights adjusted to account for estimates of the population living on the incompletely enumerated reserves


Summary: PCCF+ vs PCCF-SLI

• Consider using PCCF+ rather than PCCF-SLI if any of the following apply

• You want to do better coding in rural areas

• You want to use variables present on the PCCF+ which are not present in regular PCCF

• Your file is less than perfect with respect to postal codes

• You want help to evaluate the quality of the postal code on your data file

• The “vintage” of the postal codes on your file spans more than one census

• Your file includes postal codes used by residents of “incompletely enumerated Indian Reserves”


5. Limitations of the PCCF-SLI & the PCCF+


Limitations with PCCF-SLI & PCCF+

• In rural areas and at urban fringe, probabilistic assignment leads to random misclassification of dissemination area (DA) and neighbourhood income quintiles

• Reduced ability to detect effects in rural areas

• Lower risk ratios (RRs) and risk differences (RDs) for epidemiologic studies

• This is effect modification, not confounding, so stratifying the analysis by urban and rural areas is recommended

• Take care in interpreting lower effect estimates in rural versus urban areas
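The attenuation of risk ratios under random misclassification can be shown with a short worked calculation. This is a simplified sketch with made-up numbers: it assumes two equally sized neighbourhood income groups and that the same fraction of each group is randomly coded into the other. The function name `observed_risk_ratio` is hypothetical.

```python
def observed_risk_ratio(risk_exposed, risk_unexposed, misclass):
    """Risk ratio seen after random misclassification between two groups.

    Assumes equally sized groups and that a fraction `misclass` of each
    group is coded into the other, so each observed group is a mixture
    of the two true groups.
    """
    obs_exposed = (1 - misclass) * risk_exposed + misclass * risk_unexposed
    obs_unexposed = (1 - misclass) * risk_unexposed + misclass * risk_exposed
    return obs_exposed / obs_unexposed


# Made-up true risks: 2% in the lowest income group, 1% in the highest.
for m in (0.0, 0.1, 0.2, 0.5):
    rr = observed_risk_ratio(0.02, 0.01, m)
    print(f"misclassified {m:.0%}: observed RR = {rr:.2f}")
```

With a true risk ratio of 2.0, misclassifying 20% of each group gives an observed ratio of about 1.5, and at 50% the groups become indistinguishable. Under these assumptions, a lower rural estimate may partly reflect coding precision rather than a weaker underlying association.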


Limitations with PCCF and PCCF+

• Postal codes may change over time

1. Many technical changes to address ranges

• Usually no change at block-face or block level

• Very little change at higher levels

2. Some reuse of retired postal codes within same FSA

3. Two FSAs in British Columbia moved in the mid-1990s

• Generally, these changes translate to

• no change of the block face (BF) or dissemination block (DB) latitude/longitude

• very little change at higher levels (dissemination area (DA), census tract (CT), etc.)

• Moral – code as received and interpret the output

Concluding remarks

• Small-area geography & spatial coordinates are part of most data sets and useful in most studies

• Familiarity with methods, limitations, and interpretation of data helps researchers more meaningfully exploit data potential

• It is not enough to use the data mechanically; users need to think about what they are doing and why

• Consult the PCCF+ documentation


Thank you!

• Acknowledgments

• Russell Wilkins (retired), Paul A Peters (University of New Brunswick) & Michael Tjepkema (Health Analysis Division)

• For more information please contact:

[email protected]
