Commentary on the brief history, current status, and future of open geo data
The Realities of Open Geo Data
Tyler Bell: @twbell
1989 - GPS
http://en.wikipedia.org/wiki/File:GPS_Satellite_NASA_art-iif.jpg
May 2000 – “Selective Availability” of GPS Ends
August 2004 – More People Use Online Maps Than Use
August 2004 – OpenStreetMap Conceived
June 2005 – Open Source And Maps Start To Come Together
June 2006 – OpenStreetMap Maps The Isle Of Wight
December 2006 – Neogeography
July 2007 – The First OpenStreetMap Conference
July 2010 – MapQuest Open Uses OpenStreetMap Data
Frictions
Price Friction
2008 – Google Drives Down Costs
October 2009 – Farewell TeleAtlas (In The US)
License Friction
"Substantial" = "less than (sic) 100 features"
Format & Accessibility Friction
4x first-time users (woot)
JSON FTW
1. Include as much detail and resolution as possible. County-level data is better than state-level data which is better than national-level data. Bonus points if you break it down further by age, sex, etc. I understand that this can't always be done for privacy reasons, but it is immensely useful when it is possible.
2. Use a flexible file format. My preference is .csv, because it can be read by almost any program. I'll tolerate .xls, but I'm not pleased with .xlsx (not everyone uses Excel!). And please, please, please do not use pdf.
3. If you do use a spreadsheet format, do not use multiple sheets, nested headers, merged cells, strategic cell borders, etc. Make it as plain as possible. Don't worry that you'll end up with too many files if you don't use sheets. Release them in a zipped folder instead.
4. Use short variable names with no whitespace. Underscores are usually a safe bet, so instead of "Number of new tuberculosis cases" use "incident_tb". If you have a corresponding column, e.g. confidence intervals in the screenshot shown above, make the variable name relevant. Use "UCI_incident_tb" instead of relying on the column's proximity to "incident_tb" to indicate a pairing. Include a README that explains the variable names if you're worried they aren't descriptive enough.
5. Actually, include a README no matter what. It can include variable names, units of measurement, notes on data collection/reporting/suppression, or anything else that is relevant.
6. Tell me whom to cite! I'm so pleased to be able to use your data, and I'd love to give you the credit you deserve. Put your citation on your website, in your README, and everywhere else I might look for it so that I can use it appropriately. Or post it to figshare where it will automatically be assigned a DOI. It's an easy way to make your data citeable, shareable, and version controlled.
Bookmark these guidelines. Next time you reach for the 'export to PDF' button, or begin to use the change-cell-border feature on Excel, pull this out and remind yourself, 'this is not machine-readable. Nobody will use my data if I release it like this.' Then rejoice that you are awesome for sharing your data, and for doing so in a way that is actually useful. And for that, I thank you.
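The guidelines on file format and variable names can be sketched in a few lines of Python. This is only an illustration, assuming a hypothetical table with long, human-readable column headers: the helpers `short_name` and `to_csv`, the rename map, and the sample figures are all invented here and do not come from any real dataset.

```python
import csv
import io
import re


def short_name(label: str) -> str:
    """Turn a human-readable header into a lowercase name with no whitespace."""
    # Replace every run of non-alphanumeric characters with a single underscore.
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", label.strip())
    return cleaned.strip("_").lower()


def to_csv(headers, rows, renames=None):
    """Serialize rows as plain CSV with one flat header row."""
    renames = renames or {}
    # Prefer an explicit short name (to be documented in the README)
    # over the automatically derived one.
    names = [renames.get(h, short_name(h)) for h in headers]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(names)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data, invented for illustration only.
headers = [
    "County",
    "Number of new tuberculosis cases",
    "Upper CI (new tuberculosis cases)",
]
renames = {
    "Number of new tuberculosis cases": "incident_tb",
    "Upper CI (new tuberculosis cases)": "UCI_incident_tb",
}
rows = [["Example County", 12, 15.5]]

csv_text = to_csv(headers, rows, renames)
print(csv_text)
```

The output has a single flat header row and no sheets, merged cells, or borders, so almost any program can read it. A README shipped alongside the file would explain what `incident_tb` and `UCI_incident_tb` mean and whom to cite.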
Caitlin Rivers:
"Send me your data; PDF is fine" - no one, ever
// Function: guidGenerator
// Description: returns a pseudo-random GUID
// This is appended to a URL for 2 reasons:
// 1. to make the URL unique, so that the browser
//    always gets it and doesn't use a cached version
// 2. to make a URL look like it's got a unique key, in a
//    naive attempt to fool a not-so-wily hacker
//    into thinking they can't download a datapack
//    directly if they know the URL pattern, because they
//    need a unique key.
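For readers unfamiliar with the trick the comment describes, here is a rough Python sketch of appending a random GUID to a URL. It is not the original function, whose body is not shown. The names `guid_generator` and `with_cache_buster`, the `guid` parameter name, and the example URL are all assumptions made for illustration.

```python
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def guid_generator() -> str:
    """Return a random GUID as a string (an identifier, not a secret key)."""
    return str(uuid.uuid4())


def with_cache_buster(url: str, param: str = "guid") -> str:
    """Append a fresh GUID query parameter so each request URL is unique."""
    parts = urlsplit(url)
    # Keep any query parameters the URL already carries.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, guid_generator()))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical URL, used only for illustration.
first = with_cache_buster("https://example.org/datapack.zip?version=2")
second = with_cache_buster("https://example.org/datapack.zip?version=2")
print(first)
print(second)
```

Each call yields a different URL, which is enough to sidestep a browser cache. It does nothing for access control: if the server ignores the extra parameter, anyone who knows the base path can still fetch the file, which is the weakness the comment itself concedes.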
Prosser: But the plans were on display.
Arthur Dent: On display? I eventually had to go down to the cellar.
Prosser: That's the display department.
Arthur Dent: With a torch.
Prosser: The lights had probably gone.
Arthur Dent: So had the stairs.
Prosser: But you did see the notice, didn't you?
Arthur Dent: Oh, yes. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign outside the door saying "Beware of the Leopard."
Palo Alto gets it
This Just In!
Let's Talk About the Future
"It’s just dumb that a 100mil+ people carry GPS device in their pockets and we have to buy expensive proprietary data to find out about the shape of where we live."
"It is equal parts sad-making and hate-making that we're all still stuck suffering the lack of a comprehensive and open dataset for places."
The former Flickr guys get it
This is a Pivotal Moment for Geo
Krissy Venosdale on Flickr: http://www.flickr.com/photos/venosdale/4538665373/
Sebastien Bertrand / Flickr: http://www.flickr.com/photos/tiseb/3148814484/
We Make Data
Artifactual Data
Matthew Fontaine Maury (January 14, 1806 – February 1, 1873)
American astronomer, historian, oceanographer, meteorologist, cartographer, author, geologist, and educator.
"Every ship that navigates the high seas may henceforth be regarded as a floating observatory, a temple of science"
22,000+ Personal Weather Stations
Read This
"Seeds [...] are one of the original information storage devices, it’s almost hard to understand why libraries haven’t always included seeds"
but open is winning
"When you and I interact, our ability to be together on Earth is predicated by all the stuff that people did for thousands of years. You and I didn't invent language. You and I didn't invent clothes, roads, agriculture. It's up to us to be not just the receivers of what was given to us, but the givers of whatever's going to come next."
- John Bunker
Fin
Tyler Bell @twbell