An Open-source Place-finder for Genealogy

An Open-source Place-finder for Genealogy, presented by Dallan Quass and Ryan Knight at RootsTech 2012. Translates place texts into fully qualified, standardized place names, including historical ones.

Dallan Quass, dallan@werelate.org

Ryan Knight, ryan@grandcloud.com

What's the problem?

Philadelphia PA

After leaving Marion he moved to Cambridge, MA.

Church Row Goudhurst Kent

Los Angles, California

Kanesville, (Council Bluffs), Pottawattamie, IA

Not stated, Ohio, Kentucky

Lathom, Yorkshire, England

Tranbylier, Bskrd. Norway

La Junta, CO

Of Cranbury, Middlesex, NJ, Germany

Greatford (near Marton), NZ

Kingston Surrey

Prestingol, St Agnes, Cornwall, England

Deirdorf, Rhineland, Prss, (Germany)

Farmer. & Butcher. Labourer

Returned to Boston Mass. with parents after a visit to Nova Scotia.

Genealogists write places in many different ways

Some are misspelled

Los angles, California

Others are abbreviated

Tranbylier, Bskrd. Norway

Deirdorf, Rhineland, Prss, (Germany)

Some leave out commas

Philadelphia PA

Tranbylier, Bskrd. Norway

Church Row Goudhurst Kent

Kingston Surrey

Others have extra words

Not stated, Ohio, Kentucky

Of Cranbury, Middlesex, NJ, Germany

After leaving Marion he moved to Cambridge, MA.

Returned to Boston Mass. with parents after a visit to Nova Scotia.

Some no longer exist, or exist under different names or jurisdictions

Kanesville, (Council Bluffs), Pottawattamie, IA

Deirdorf, Rhineland, Prss, (Germany)

Others have an incorrect intermediate level

Lathom, Yorkshire, England

Some can't be found anywhere

Prestingol, St Agnes, Cornwall, England

Others can be found in multiple places

La Junta, CO

Philadelphia PA

Kingston Surrey

And finally, some aren't places at all

Farmer. & Butcher. Labourer
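Some of this variation can be smoothed over before any lookup happens. The Python sketch below is not taken from the Places project; it assumes a hypothetical normalization pass that lowercases each comma-separated level, turns parentheses and periods into spaces, drops filler such as "Not stated" and a leading "Of", and expands abbreviations from a tiny, made-up table. Misspellings, missing commas, and texts that are not places are left for the matching stage.

```python
from __future__ import annotations

import re

# Hypothetical, deliberately tiny abbreviation table; not from the Places project.
ABBREVIATIONS = {
    "pa": "pennsylvania",
    "ma": "massachusetts",
    "mass": "massachusetts",
    "ia": "iowa",
    "nj": "new jersey",
    "co": "colorado",
    "prss": "prussia",
    "bskrd": "buskerud",
}

# Hypothetical filler: whole levels to drop and leading words to strip.
NOISE_LEVELS = {"not stated", "unknown"}
LEADING_NOISE = ("of ",)


def normalize_levels(text: str) -> list[str]:
    """Split a raw place text on commas and clean up each level."""
    levels = []
    for raw in text.split(","):
        level = re.sub(r"[().]", " ", raw.lower())  # parentheses and periods become spaces
        level = " ".join(level.split())             # collapse runs of whitespace
        for prefix in LEADING_NOISE:
            if level.startswith(prefix):
                level = level[len(prefix):]
        if not level or level in NOISE_LEVELS:
            continue                                # skip empty or filler levels
        # Expand abbreviations word by word.
        levels.append(" ".join(ABBREVIATIONS.get(w, w) for w in level.split()))
    return levels


print(normalize_levels("Of Cranbury, Middlesex, NJ, Germany"))
# ['cranbury', 'middlesex', 'new jersey', 'germany']
```

Note that "Tranbylier, Bskrd. Norway" comes out as the two levels "tranbylier" and "buskerud norway"; recovering the missing comma between the county and the country is left to the matcher.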

Why does it matter?

Search

Match

Maps

How does it work?

Steps

1. Work right-to-left, finding matching places
– split on commas
– back off if no matches

2. Keep only subordinate jurisdictions
– if none are subordinate, try skipping a level
– if still no matches, ignore this level

3. If there are multiple matches (ambiguous)
– filter on type
– filter out subordinate places
– rank remaining matches

Example: Ramsey, Hennepin, MN United States

Matches:

Ramsey, Minnesota, United States

Ramsey, Anoka, Minnesota, United States

Ramsey, Mower, Minnesota, United States
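Steps 1 and 2 can be sketched against a toy gazetteer. This is a minimal Python illustration under our own assumptions, not the algorithm in the Places repository: "back off" is interpreted as retrying shorter word suffixes of a level (which recovers a missing comma, as in "MN United States"), "subordinate" means located anywhere inside a previous match, and "skipping a level" means treating the previous level as a wrong intermediate jurisdiction and searching under its parent instead. The place ids and the functions `by_name`, `within`, `subordinate`, and `match` are hypothetical. Step 3 (ranking) is not shown; `match` returns every surviving candidate.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Place:
    name: str
    parent: Optional[str]            # id of the enclosing jurisdiction
    alt_names: tuple[str, ...] = ()  # lowercased abbreviations or other names


# Hypothetical toy gazetteer, keyed by made-up ids.
PLACES = {
    "us": Place("United States", None),
    "mn": Place("Minnesota", "us", ("mn",)),
    "hennepin": Place("Hennepin", "mn"),
    "anoka": Place("Anoka", "mn"),
    "mower": Place("Mower", "mn"),
    "ramsey-county": Place("Ramsey", "mn"),
    "ramsey-anoka": Place("Ramsey", "anoka"),
    "ramsey-mower": Place("Ramsey", "mower"),
}


def by_name(name: str) -> list[str]:
    return [pid for pid, p in PLACES.items() if name == p.name.lower() or name in p.alt_names]


def within(pid: str, ancestor: str) -> bool:
    """True if place `pid` lies inside `ancestor` at any depth."""
    parent = PLACES[pid].parent
    while parent is not None:
        if parent == ancestor:
            return True
        parent = PLACES[parent].parent
    return False


def subordinate(found: list[str], context: Optional[set]) -> list[str]:
    if context is None:  # first level: nothing to be subordinate to yet
        return found
    return [pid for pid in found if any(within(pid, c) for c in context)]


def match(text: str) -> list[str]:
    pending = [lvl.strip().lower() for lvl in text.split(",") if lvl.strip()]
    context: Optional[set] = None
    while pending:
        words = pending.pop().split()  # step 1: work right-to-left
        found: list[str] = []
        # Back off: try the whole level, then shorter word suffixes, pushing
        # the leftover words back as their own level (a missing comma).
        for i in range(len(words)):
            found = by_name(" ".join(words[i:]))
            if found:
                if i:
                    pending.append(" ".join(words[:i]))
                break
        if not found:
            continue  # nothing matches: ignore this level
        kept = subordinate(found, context)  # step 2
        if not kept and context:
            # Skip a level: assume the previous level was a wrong intermediate
            # jurisdiction and look under its parent instead.
            parents = {PLACES[c].parent for c in context} - {None}
            kept = subordinate(found, parents)
        if kept:
            context = set(kept)
        # If still empty, ignore this level and keep the current context.
    return sorted(context or ())


print(match("Ramsey, Hennepin, MN United States"))
# ['ramsey-anoka', 'ramsey-county', 'ramsey-mower']
```

None of the three Ramsey entries lies inside Hennepin, but all three lie inside Minnesota, so the skip leaves exactly the ambiguous set that step 3 has to rank. The same skip is what would let a text like "Lathom, Yorkshire, England" fall back to places under England, given a gazetteer that contained them.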

Database

WeRelate has a database of 435,000 places

• Includes inhabited places and record-keeping jurisdictions

• Excludes geographic entities like rivers, mountains, etc.

• Not complete, but we've researched and added additional places that appear frequently in GEDCOMs
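One way to hold such records for lookup is a name index that covers both current and former names and skips excluded feature types. This is a hypothetical Python sketch: the field names, type labels, and ids are ours and do not describe WeRelate's actual schema. The Kanesville record reflects the fact that Kanesville is the former name of Council Bluffs, Iowa.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaceRecord:
    pid: str
    name: str
    type: str
    located_in: tuple[str, ...] = ()     # ids of enclosing jurisdictions
    also_known_as: tuple[str, ...] = ()  # former or alternate names


# Hypothetical: feature types the database leaves out.
EXCLUDED_TYPES = {"river", "mountain", "lake"}


def build_name_index(records: list[PlaceRecord]) -> dict[str, list[str]]:
    """Map every lowercased name and alternate name to the ids that carry it."""
    index: dict[str, list[str]] = defaultdict(list)
    for rec in records:
        if rec.type in EXCLUDED_TYPES:
            continue  # geographic features are not indexed
        for name in (rec.name, *rec.also_known_as):
            index[name.lower()].append(rec.pid)
    return dict(index)


records = [
    PlaceRecord("pottawattamie", "Pottawattamie", "county", ("iowa",)),
    PlaceRecord("council-bluffs", "Council Bluffs", "city", ("pottawattamie",), ("Kanesville",)),
    PlaceRecord("missouri-river", "Missouri River", "river"),
]
index = build_name_index(records)
print(index["kanesville"])  # ['council-bluffs']
```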

Wiki as a Database

How it began

Wikipedia

Getty Thesaurus of Geographic Names

Family History Catalog

All of us are smarter than any of us

Community input

Community oversight

Result

Proof is in the pudding

Compare to FamilySearch

Standardized 3,736 place texts, chosen at random from GEDCOMs, using both algorithms

• 1,911 were standardized the same way

• 1,825 were standardized differently
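As a quick check of those figures, the two outcome counts add up to the sample size, and the two systems agreed on just over half of the texts:

```python
same, different = 1911, 1825
total = same + different
print(total)                                # 3736
print(f"{same / total:.1%} agreed")         # 51.2% agreed
print(f"{different / total:.1%} differed")  # 48.8% differed
```

A different result does not by itself mean either system was wrong; some differences are the same place named differently.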

Let's look at the K's

GEDCOM place text: kaiapoi, nz
This project: Kaiapoi, Canterbury, New Zealand
FamilySearch: Kaiapoi, Canterbury, Canterbury, New Zealand
Best guess: Kaiapoi, Waimakariri (district), Canterbury (region), New Zealand

GEDCOM place text: kanesville, (council bluffs), pottawattamie, ia
This project: Council Bluffs, Pottawattamie, Iowa, United States
FamilySearch: Kanesville, Pottawattamie, Iowa, United States
Best guess: Council Bluffs (formerly Kanesville), Pottawattamie, Iowa, United States

GEDCOM place text: kansas city, missouri
This project: Kansas City, Cass, Missouri, United States
FamilySearch: Kansas City, Jackson, Missouri, United States
Best guess: Kansas City is located in Jackson, Clay, Cass, and Platte counties

GEDCOM place text: kelvin grove cemetary, palmerston north, (section s block 3 plot 38)
This project: Palmerston North, Manawatu-Wanganui, New Zealand
FamilySearch: Kelvin Grove, Barkly East, Cape of Good Hope, South Africa
Best guess: Kelvin Grove Cemetery, Palmerston North, Manawatu-Wanganui (region), New Zealand

GEDCOM place text: kenny ?? cots altandhu lochbroom
This project: Lochbroom, Ross and Cromarty, Scotland
FamilySearch: Loch Broom, Pictou, Nova Scotia, Canada
Best guess: Altandhu, Lochbroom, Ross and Cromarty, Scotland

GEDCOM place text: kincardine ross & cromarty
This project: Cromarty, Ross and Cromarty, Scotland
FamilySearch: Ross and Cromarty, Scotland
Best guess: Kincardine, Ross and Cromarty (county), Scotland

GEDCOM place text: , king queen, virginia, usa
This project: King, Wetzel, West Virginia, United States
FamilySearch: King, Clay, Virginia, United States
Best guess: King and Queen (county), Virginia, United States

GEDCOM place text: kingston surrey
This project: Kingston, Surrey, Jamaica
FamilySearch: Kingston, Surrey, England
Best guess: both places exist, but England is more likely
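The Kingston example is the kind of case step 3 (filter on type, filter out subordinate places, rank) is meant for. The Python sketch below shows one hypothetical way to run that sequence. The `prior` scores are invented for illustration (they might, for instance, reflect how often each place appears in GEDCOMs), the type labels are made up, and "filter out subordinate places" is interpreted as dropping any candidate that lies inside another candidate.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    pid: str
    full_name: str
    type: str
    ancestors: frozenset[str]  # ids of every place that encloses this one
    prior: float               # hypothetical likelihood score


def rank_candidates(cands: list[Candidate], wanted_type: Optional[str] = None) -> list[Candidate]:
    # Filter on type, but only if at least one candidate has that type.
    if wanted_type:
        cands = [c for c in cands if c.type == wanted_type] or cands
    # Filter out candidates that lie inside another remaining candidate.
    ids = {c.pid for c in cands}
    cands = [c for c in cands if not c.ancestors & ids]
    # Rank what is left, most likely first.
    return sorted(cands, key=lambda c: c.prior, reverse=True)


england = Candidate("kingston-eng", "Kingston, Surrey, England", "town",
                    frozenset({"surrey-eng", "england"}), prior=0.8)
jamaica = Candidate("kingston-jam", "Kingston, Surrey, Jamaica", "city",
                    frozenset({"surrey-jam", "jamaica"}), prior=0.2)
print([c.full_name for c in rank_candidates([jamaica, england])])
# ['Kingston, Surrey, England', 'Kingston, Surrey, Jamaica']
```

Keeping every candidate in the ranked list, rather than only the top one, leaves room for a reviewer or a later scoring model to overrule the prior.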

Bottom line

Of the 38 place texts compared

• 3 texts were either not a place or were truly ambiguous

• 8 texts weren't matched correctly by either system

• 10 texts were matched to the same place (just named differently) by both systems

• 11 texts were matched better by this project

• 9 texts were matched better by FamilySearch's project

Interestingly, these results are similar to the Nature study comparing Wikipedia with Encyclopedia Britannica – both had about the same number of mistakes.

Roadmap

• 2005-2011 Place wiki pages under development at WeRelate

• Jan 2011 Open-source project created

• Feb 2011 Announce at RootsTech

• Mar 2011 Incorporate new algorithm at WeRelate

Continued improvements

Future work

Analyze differences with FamilySearch

Review frequent missing places

Use machine learning for better scoring of ambiguous places

Demonstration of Places Server

• Demonstrates Matching Places

• Built with Play 1.2.4, a Java web framework

– Allows rapid development of web applications with a fully integrated stack

• Deployed to Heroku, a cloud application platform

– Heroku allows one-step deployment with git

Demonstration of Labeler

• Community feedback on places we couldn’t match

• Provides the best guess from the Places Standardizers

Conclusion

Matching places is hard

• people record places in lots of different ways

But it’s important

• useful in search, match, and mapping

Open source algorithm and database are now freely available

• http://github.com/DallanQ/Places

Not perfect, but ongoing improvement

Hopefully others will benefit from this effort

Images appearing on these slides are copyrighted by the contributors to http://commons.wikimedia.org and are used under license
