Address Day what next after the Address Wars Jeni Tennison - @JeniT 5 March 2015 https://openaddressesuk.org @openaddressesuk

BCS Address Day - Open Addresses


Page 1: BCS Address Day - Open Addresses

Address Day: what next after the Address Wars

Jeni Tennison - @JeniT
5 March 2015

https://openaddressesuk.org
@openaddressesuk

Page 2: BCS Address Day - Open Addresses

In economics, a public good is a good that is both non-excludable and non-rivalrous in that individuals cannot be effectively excluded from use and where use by one individual does not reduce availability to others.

Wikipedia - Public good

Page 3: BCS Address Day - Open Addresses

"Tompkins Square Park Central Knoll" by David Shankbone - (CC BY-SA 3.0) via Wikimedia Commons

Page 4: BCS Address Day - Open Addresses

open data

public good

Page 5: BCS Address Day - Open Addresses

When should a good be public?

sum of what everyone would pay

what it costs to maintain

Page 6: BCS Address Day - Open Addresses

Address data should be open data

● National Information Infrastructure
● Not just for posting mail...
  ○ geocoding for route finding
  ○ associating people with areas
  ○ classification for targeting interventions
  ○ linking datasets together
● Denmark has taken this step
  ○ 1000% increase in use of address data
  ○ costs = €0.2M, benefits = €14M

Page 7: BCS Address Day - Open Addresses

Current real life problems

● startup wanting to build an application
  ○ prohibitive costs
  ○ prohibitive licensing complexity
● SME with a geodemographic product
  ○ prohibitive costs
  ○ limiting customer base & growth
● New build owners
  ○ 3 months to register to vote, order pizza

Page 8: BCS Address Day - Open Addresses

Funding public goods

● Government via taxation
● Collaborative bound by contract
● Cross-subsidy by selling other goods
● Voluntary effort
● Social norms

Page 9: BCS Address Day - Open Addresses

"The sale of the PAF with the Royal Mail was a mistake. Public access to public sector data must never be sold or given away again. This type of information, like census information and many other data sets, is very expensive to collect and collate into useable form, but it also has huge potential value to the economy and society as a whole if it is kept as an open, public good."

Bernard Jenkin, Chair of Public Administration Select Committee

Page 10: BCS Address Day - Open Addresses

Hypothesis 1: the maintenance of open address data can only be effectively funded through taxation

Hypothesis 2: it is possible to build and maintain a sustainable open address database using collaboration, cross-subsidy and voluntary effort

Page 11: BCS Address Day - Open Addresses
Page 12: BCS Address Day - Open Addresses

Goals

● Free, openly licensed, up-to-date bulk downloads of addresses

● Freemium services over that data
  ○ eg validation, auto-completion, geocoding

● 100% open source, collaboratively maintained

● Initial ~£400k investment from government
  ○ compared with £25M annual cost maintaining PAF

Page 13: BCS Address Day - Open Addresses

Eventual Architecture

“Definitive” UK address list
- where the address data is safe to use
- where each record has confidence and provenance

Bulk
- Download
- Upload

APIs
- Add
- Sort
- Validate
- Search

URLs
- Linked data
- Extensibility

Service Providers: aggregators, digital, telecoms, public sector, distribution, academics, manufacturers etc

Services: websites, users

Value

Revenue for sustainability

Page 14: BCS Address Day - Open Addresses

This takes time

Large datasets and inference to tackle the bulk of the challenge (“80/20” rule)

Ongoing, collaborative maintenance

Targeted work. Low-volume records to fill existing gaps in available datasets

NB: dates are “just for fun”

Page 15: BCS Address Day - Open Addresses

Approaches

1. Load open datasets containing addresses
2. Build out crowdsourcing mechanisms
3. Use inference to fill gaps

and throughout:
● keep track of provenance
● keep track of confidence
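
One way to picture "keep track of provenance" and "keep track of confidence" is as two fields carried by every address record. The sketch below is a minimal illustration only: the class names, field names and the 0 to 1 confidence scale are assumptions made for the example, not the Open Addresses data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Provenance:
    """Where a record came from (hypothetical fields)."""
    source: str   # e.g. "Companies House" or a crowdsourced submission
    method: str   # "loaded", "crowdsourced" or "inferred"


@dataclass
class AddressRecord:
    """An address plus the two things tracked throughout (hypothetical layout)."""
    street: str
    town: str
    postcode: str
    confidence: float  # assumed scale: 0.0 (no trust) to 1.0 (full trust)
    provenance: List[Provenance] = field(default_factory=list)


def add_source(record: AddressRecord, source: str, method: str) -> AddressRecord:
    """Append one provenance entry so every sighting of the address is kept."""
    record.provenance.append(Provenance(source, method))
    return record
```

Keeping provenance as a list rather than a single value means the same address can be traced back to every dataset, contribution or inference that produced it.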

Page 16: BCS Address Day - Open Addresses

Loading datasets

Third Party IPR
Possibly infected if validated against PAF or AddressBase
⇒ most Government “open” data is infected
A few not:
● Companies House
● err...

Page 17: BCS Address Day - Open Addresses

Platform for loading bulk data

Originally developed for OpenCorporates
Sandboxed environment for running scripts

Page 18: BCS Address Day - Open Addresses

Motivating crowdsourcing

Bulk
- Download
- Upload

APIs
- Add
- Sort
- Validate
- Search

URLs
- Linked data
- Extensibility

Value

Building Blocks
- towns, postcodes, streets
- used to parse data and provide confidence in the address list
- links between towns, postcodes and streets are learned from addresses

Authoritative and definitive UK address list
- where the address data is safe to use
- where each record has confidence and provenance

Revenue for sustainability

Page 19: BCS Address Day - Open Addresses

Address parsing service

● Turn free-text addresses into building blocks

● Can be used with data containing third party IPR

● Optional “contribute” option
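
To make "turn free-text addresses into building blocks" concrete, here is a minimal sketch of the idea. It assumes a comma-separated address that ends with a postcode, and it uses a loose regular expression for the general shape of a UK postcode. The function name, the output keys and the pattern are all illustrative assumptions, not the actual parsing service.

```python
import re

# Loose pattern for the general shape of a UK postcode (outward + inward part).
# An assumption for illustration; it does not validate against any official list.
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\b", re.IGNORECASE)


def parse_address(text):
    """Split a free-text address into premises, street, town and postcode."""
    postcode = None
    match = POSTCODE_RE.search(text)
    if match:
        postcode = f"{match.group(1).upper()} {match.group(2).upper()}"
        # Remove the postcode so only the comma-separated parts remain.
        text = text[:match.start()] + text[match.end():]
    parts = [p.strip() for p in text.split(",") if p.strip()]
    # Assumption: the last part is the town, the one before it the street.
    town = parts.pop() if parts else None
    street = parts.pop() if parts else None
    premises = ", ".join(parts) or None
    return {"premises": premises, "street": street, "town": town, "postcode": postcode}


if __name__ == "__main__":
    print(parse_address("St James House, St James Square, Cheltenham, GL50 3PR"))
```

Real addresses are much messier than this positional rule allows, which is presumably why the service leans on learned building blocks rather than fixed positions.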

Page 20: BCS Address Day - Open Addresses

Inference

Page 21: BCS Address Day - Open Addresses

Fogralea, ZE1 0SE

© Open Addresses Ltd.

Page 22: BCS Address Day - Open Addresses

7 9 11 13 15 17 19 21 23 25 27 29

6 8 10 12 14 16 18 20 22 24 26 28

Fogralea, ZE1 0SE

Page 23: BCS Address Day - Open Addresses

7 9 11 13 15 17 19 21 23 25 27 29

6 8 10 12 14 16 18 20 22 24 26 28

Fogralea, ZE1 0SE

Page 24: BCS Address Day - Open Addresses

What about nos. 1 to 4?

Same postcode? We cannot know!

Fogralea, ZE1 0SE
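
The inference idea in these slides can be sketched as interpolation: house numbers that fall between known numbers on the same side of a street are assumed to exist and to share the postcode, while numbers outside the known range (such as nos. 1 to 4) are left alone. The sketch below assumes odd and even numbers sit on opposite sides of the street, which real streets do not always follow; the function name is hypothetical.

```python
def infer_between(known_numbers):
    """Return house numbers lying between known numbers of the same parity.

    Only gaps inside the known range are filled. Nothing is extrapolated
    below the lowest or above the highest known number on each side.
    """
    known = set(known_numbers)
    inferred = set()
    for parity in (0, 1):  # 0 = even side, 1 = odd side
        side = sorted(n for n in known if n % 2 == parity)
        if len(side) < 2:
            continue  # one known number gives no range to fill
        candidates = range(side[0], side[-1] + 1, 2)
        inferred.update(n for n in candidates if n not in known)
    return sorted(inferred)


if __name__ == "__main__":
    # Hypothetical known numbers on one street with one postcode.
    print(infer_between({6, 7, 28, 29}))
```

Inferred records would naturally carry lower confidence than observed ones, with their provenance marked as inference.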

Page 25: BCS Address Day - Open Addresses

Enabling collaborative maintenance

Page 26: BCS Address Day - Open Addresses

Calculating confidence

St James House, St James Square, Cheltenham, GL50 3PR
7, St James Square, Cheltenham, GL50 3PT
St James North 1, St James Square, Cheltenham, GL50 3PR
St James North 3, St James Square, Cheltenham, GL50 3PR
3, St James Square, Cheltenham, GL50 3PR
St James House, St James Square, Cheltenham Spa, GL50 3PR
St James North 1, St James Square, Cheltenham, GL50 3PR
St James Place, Jessop Avenue, Cheltenham, GL50 3PR
St James House, St James Square, Cheltenham, GL50 3PR
Apt. 3, St James Place, Jessop Avenue, Cheltenham, GL50 3PR
56, Cheltenham Road, London, SE15 3AR

Page 27: BCS Address Day - Open Addresses

Calculating confidence

St James House, St James Square, Cheltenham, GL50 3PR
7, St James Square, Cheltenham, GL50 3PT
St James North 1, St James Square, Cheltenham, GL50 3PR
St James North 3, St James Square, Cheltenham, GL50 3PR
3, St James Square, Cheltenham, GL50 3PR
St James House, St James Square, Cheltenham Spa, GL50 3PR
St James North 1, St James Square, Cheltenham, GL50 3PR
St James Place, Jessop Avenue, Cheltenham, GL50 3PR
St James House, St James Square, Cheltenham, GL50 3PR
Apt. 3, St James Place, Jessop Avenue, Cheltenham, GL50 3PR
56, Cheltenham Road, London, SE15 3AR
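
A rough sketch of how links between postcodes and towns might be learned from a list like this one: count how often each (postcode sector, town) pair appears, and how many addresses each sector has in total. The sector is taken here as the outward code plus the first digit of the inward code (for example "GL50 3"). The helper name and the simple positional parsing are assumptions for illustration.

```python
from collections import Counter

ADDRESSES = [
    "St James House, St James Square, Cheltenham, GL50 3PR",
    "7, St James Square, Cheltenham, GL50 3PT",
    "St James North 1, St James Square, Cheltenham, GL50 3PR",
    "St James North 3, St James Square, Cheltenham, GL50 3PR",
    "3, St James Square, Cheltenham, GL50 3PR",
    "St James House, St James Square, Cheltenham Spa, GL50 3PR",
    "St James North 1, St James Square, Cheltenham, GL50 3PR",
    "St James Place, Jessop Avenue, Cheltenham, GL50 3PR",
    "St James House, St James Square, Cheltenham, GL50 3PR",
    "Apt. 3, St James Place, Jessop Avenue, Cheltenham, GL50 3PR",
    "56, Cheltenham Road, London, SE15 3AR",
]


def sector_and_town(address):
    """Return (postcode sector, upper-cased town), assuming 'town, postcode' ends the line."""
    parts = [p.strip() for p in address.split(",")]
    postcode, town = parts[-1], parts[-2]
    outward, inward = postcode.split()
    return f"{outward} {inward[0]}", town.upper()


# Count per (sector, town) pair, and total addresses seen per sector.
pair_counts = Counter(sector_and_town(a) for a in ADDRESSES)
sector_totals = Counter()
for (sector, _town), count in pair_counts.items():
    sector_totals[sector] += count

if __name__ == "__main__":
    for (sector, town), count in sorted(pair_counts.items()):
        print(sector, town, count, sector_totals[sector])
```

Note that "Cheltenham" and "Cheltenham Spa" are counted as different towns, so a minority spelling shows up as a low-count association for the same sector.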

Page 28: BCS Address Day - Open Addresses

Calculating confidence

Sector   Town            Count  Total  Confidence
...
HD3 4    HUDDERSFIELD       66     66      87.71%
...
DG8 6    NEWTON STEWART     11     12      65.69%
DG8 6    STRANRAER           1     12       0.00%
DG8 7    NEWTON STEWART      1      1       0.00%
...
W3 6     LONDON            196    196      92.96%
...
CH44 4   WALLASEY           23     29      76.06%
CH44 4   WIRRAL              6     29       8.22%

● This postcode/town association is right but confidence is low because of the low count

● This postcode/town association is incorrect

● Another correct postcode/town association, but with a higher count

● This is what happens when post towns are re-organised; Wirral is now split into Birkenhead, Wallasey, Wirral and Prenton

● This is what a correct postcode/town association looks like
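
The slides do not say how the Confidence column is computed, only that a low count lowers it even when every sighting agrees. As a stand-in with the same qualitative behaviour, the sketch below uses the lower bound of the Wilson score interval on count/total. This is a swapped-in technique chosen for illustration: it does not reproduce the percentages in the table (for example, it does not give 0% for a single sighting), and the function name and the default z value are assumptions.

```python
import math


def wilson_lower_bound(count, total, z=1.96):
    """Lower bound of the Wilson score interval for the proportion count/total.

    Small samples are penalised: 1 out of 1 scores far lower than 66 out of 66,
    although both proportions are 100%.
    """
    if total == 0:
        return 0.0
    p = count / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    return max(0.0, (centre - margin) / (1 + z2 / total))


# (sector, town, count, total) rows taken from the table above.
ROWS = [
    ("HD3 4", "HUDDERSFIELD", 66, 66),
    ("DG8 6", "NEWTON STEWART", 11, 12),
    ("DG8 6", "STRANRAER", 1, 12),
    ("DG8 7", "NEWTON STEWART", 1, 1),
    ("W3 6", "LONDON", 196, 196),
    ("CH44 4", "WALLASEY", 23, 29),
    ("CH44 4", "WIRRAL", 6, 29),
]

if __name__ == "__main__":
    for sector, town, count, total in ROWS:
        score = wilson_lower_bound(count, total)
        print(f"{sector:7} {town:15} {count:4}/{total:<4} {score:.2%}")
```

Any score with this shape keeps the ordering the slide relies on: large unanimous samples rank highest, split sectors rank lower, and rare associations rank near the bottom.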

Page 29: BCS Address Day - Open Addresses

Provenance

Page 30: BCS Address Day - Open Addresses

Summary

● Built most of the supporting platform
  ○ parsing free text / messy addresses
  ○ collaborative loading of data
  ○ providing downloads, search & URL identity
  ○ recording provenance & assigning confidence
  ○ using inference to fill in gaps

● We have low numbers of addresses currently
  ○ but the right mechanisms to add more
  ○ and many potential partners

Page 31: BCS Address Day - Open Addresses

What next?

● Building the platform
● Building the community of collaborators
● Building services to aid cross-subsidy
● Increasing quantity & quality of addresses
● Can anyone else reuse the technology?
● Can anyone else reuse the approach?

Page 32: BCS Address Day - Open Addresses

Any Questions?
@JeniT

https://openaddressesuk.org

@openaddressesuk

Page 33: BCS Address Day - Open Addresses

Open Addresses Ltd. is a new company being set up to create and maintain an address database for the UK that will be made available to the public as Open Data. It will facilitate the collaborative maintenance of the address database with various stakeholders from the UK Government, industry and non-profit.

Offices

Where?