The Realities of Open Geo Data


Commentary on the brief history, current status, and future of open geo data


Tyler Bell: @twbell

1989 - GPS

http://en.wikipedia.org/wiki/File:GPS_Satellite_NASA_art-iif.jpg

May 2000 – “Selective Availability” of GPS Ends

August 2004 – More People Use Online Maps Than Use Email

August 2004 – OpenStreetMap Conceived

June 2005 – Open Source And Maps Start To Come Together

June 2006 – OpenStreetMap Maps The Isle Of Wight

December 2006 – Neogeography

July 2007 – The First OpenStreetMap Conference

July 2010 – MapQuest Open Uses OpenStreetMap Data

Frictions

Price Friction

2008 – Google Drives Down Costs

October 2009 – Farewell TeleAtlas (In The US)

License Friction

"Substantial" = "less than (sic) 100 features"

Format & Accessibility Friction

4x first-time users (woot)

JSON FTW

1. Include as much detail and resolution as possible. County-level data is better than state-level data which is better than national-level data. Bonus points if you break it down further by age, sex, etc. I understand that this can't always be done for privacy reasons, but it is immensely useful when it is possible.

2. Use a flexible file format. My preference is .csv, because it can be read by almost any program. I'll tolerate .xls, but I'm not pleased with .xlsx (not everyone uses Excel!). And please, please, please do not use PDF.

3. If you do use a spreadsheet format, do not use multiple sheets, nested headers, merged cells, strategic cell borders, etc. Make it as plain as possible. Don't worry that you'll end up with too many files if you don't use sheets. Release them in a zipped folder instead.

4. Use short variable names with no whitespace. Underscores are usually a safe bet, so instead of "Number of new tuberculosis cases" use "incident_tb". If you have a corresponding column, e.g. confidence intervals in the screenshot shown above, make the variable name relevant. Use "UCI_incident_tb" instead of relying on the column's proximity to "incident_tb" to indicate a pairing. Include a README that explains the variable names if you're worried they aren't descriptive enough.

5. Actually, include a README no matter what. It can include variable names, units of measurement, notes on data collection/reporting/suppression, or anything else that is relevant.

6. Tell me whom to cite! I'm so pleased to be able to use your data, and I'd love to give you the credit you deserve. Put your citation on your website, in your README, and everywhere else I might look for it so that I can use it appropriately. Or post it to figshare, where it will automatically be assigned a DOI. It's an easy way to make your data citeable, shareable, and version controlled.

Bookmark these guidelines. Next time you reach for the 'export to PDF' button, or begin to use the change-cell-border feature on Excel, pull this out and remind yourself, 'this is not machine-readable. Nobody will use my data if I release it like this.' Then rejoice that you are awesome for sharing your data, and for doing so in a way that is actually useful. And for that, I thank you.

Caitlin Rivers:

"Send me your data; PDF is fine" - no one, ever
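The format and naming guidelines above (a flat .csv, short variable names with no whitespace, a README) can be checked mechanically before a release goes out. The following is a minimal Python sketch, not part of the original guidelines; the function names `is_machine_friendly` and `write_plain_csv` and the 20-character name limit are illustrative assumptions.

```python
from __future__ import annotations

import csv
import io
import re

# Assumed rule for "short variable names with no whitespace": a letter first,
# then letters, digits or underscores, at most 20 characters in total.
# The 20-character limit is a hypothetical choice, not from the guidelines.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,19}")


def is_machine_friendly(name: str) -> bool:
    """Return True if a column name has no whitespace and is reasonably short."""
    return NAME_PATTERN.fullmatch(name) is not None


def write_plain_csv(columns: dict[str, str], rows: list[dict]) -> tuple[str, str]:
    """Render rows as one flat CSV plus a README describing each variable.

    `columns` maps each short variable name to a human-readable description.
    """
    bad = [name for name in columns if not is_machine_friendly(name)]
    if bad:
        raise ValueError(f"rename these columns first: {bad}")

    buf = io.StringIO()
    # A single header row: no sheets, nested headers or merged cells.
    writer = csv.DictWriter(buf, fieldnames=list(columns), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)

    # The README explains each short name, as guidelines 4 and 5 suggest.
    readme_lines = ["Variables:"]
    readme_lines += [f"  {name}: {desc}" for name, desc in columns.items()]
    return buf.getvalue(), "\n".join(readme_lines) + "\n"
```

For example, a `columns` mapping of `{"incident_tb": "Number of new tuberculosis cases", "UCI_incident_tb": "Upper confidence interval for incident_tb"}` yields a one-header CSV and a README listing both names, while a column literally named "Number of new tuberculosis cases" is rejected rather than silently exported.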

// Function: guidGenerator
// Description: returns a pseudo-random GUID
// This is appended to a URL for 2 reasons:
// 1. to make the URL unique, so that the browser always gets it and doesn't use a cached version
// 2. to make a URL look like it's got a unique key, in a naive attempt to fool a not-so-wily hacker
// into thinking they can't download a datapack directly if they know the URL pattern, because they
// need a unique key.
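The first reason in that comment, appending a random GUID so the browser never reuses a cached copy, is ordinary cache-busting. Below is a minimal Python sketch of the idea, since the original function body isn't shown; the name `cache_busted`, the query parameter name `nocache` and the example URL are hypothetical. As the comment itself concedes, a random key like this is not access control: anyone who knows the URL pattern can generate their own GUID.

```python
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busted(url: str, param: str = "nocache") -> str:
    """Append a random (version 4) UUID as a query parameter.

    Each call yields a distinct URL, so a browser or proxy cannot serve a
    cached response for it. Existing query parameters are preserved.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, str(uuid.uuid4())))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Calling `cache_busted("http://example.com/datapack.zip?v=1")` keeps `v=1` and adds a fresh `nocache=<uuid>` each time. Where the server is under your control, sending appropriate HTTP caching headers is usually the cleaner way to get the same effect.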

Prosser: But the plans were on display.

Arthur Dent: On display? I eventually had to go down to the cellar.

Prosser: That's the display department.

Arthur Dent: With a torch.

Prosser: The lights had probably gone.

Arthur Dent: So had the stairs.

Prosser: But you did see the notice, didn't you?

Arthur Dent: Oh, yes. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign outside the door saying "Beware of the Leopard."

Palo Alto gets it

This Just In!

Let's Talk About the Future

"It’s just dumb that a 100mil+ people carry GPS device in their pockets and we have to buy expensive proprietary data to find out about the shape of where we live."

"It is equal parts sad-making and hate-making that we're all still stuck suffering the lack of a comprehensive and open dataset for places."

The former Flickr guys get it

This is a Pivotal Moment for Geo

Krissy Venosdale / Flickr: http://www.flickr.com/photos/venosdale/4538665373/

Sebastien Bertrand / Flickr: http://www.flickr.com/photos/tiseb/3148814484/

We Make Data

Artifactual Data

Matthew Fontaine Maury (January 14, 1806 – February 1, 1873)

American astronomer, historian, oceanographer, meteorologist, cartographer, author, geologist, and educator.

"Every ship that navigates the high seas may henceforth be regarded as a floating observatory, a temple of science"

22,000+ Personal Weather Stations

"Seeds [...] are one of the original information storage devices, it’s almost hard to understand why libraries haven’t always included seeds"

but open is winning

"When you and I interact, our ability to be together on Earth is predicated by all the stuff that people did for thousands of years. You and I didn't invent language. You and I didn't invent clothes, roads, agriculture. It's up to us to be not just the receivers of what was given to us, but the givers of whatever's going to come next."

- John Bunker

Fin

Tyler Bell, tyler@factual.com, @twbell
