Exploring the Use of Linked Data to Bridge State and Federal Archives
Jon Voss, LookBackMaps
MARA Guest Lecture
San Jose State University
June 15, 2010
Overview
1. Quick intro, logistics
2. Evolution and context of the Civil War Data 150 Project: A Very Exciting Time
3. Overview of Civil War Data 150 Project
Halftime Q&A
– Some Technical Details on the Methodology, Tools
– Placing CWD150 in the Big Picture
– Dig Deeper Links
Final Q&A
http://www.loc.gov/pictures/item/cwp2003000505/PP
Give me feedback...
email:[email protected]
www.twitter.com/LookBackMaps
Comments welcome, just use @LookBackMaps on Twitter or email me.
• www.lookbackmaps.net
• From perspective of presenting data, not organizing it, coming from the Web community
• Started in 2008 as a Google MyMaps mashup
• Based on the simple idea of creating community around local history.
• Created to solve the problem of disparate archives with no geotags through community and crowdsourcing
• Finding ways to access, display, and improve upon data
screenshots from LookBackMaps iPhone app, overlay photos from The Bancroft Library
screenshots from LookBackMaps iPhone app, overlay photos from California Historical Society
2008 Marks a major shift for public archives
• The Library of Congress and Flickr collaboration spurs the Flickr Commons, and blows open the Web 2.0 door at archives and institutions worldwide.
• Multiple open source collections management and web publishing platforms begin to take hold and lower the barrier to entry for Web 2.0 presentation, collaboration, plugins and extensions
Some stats from the LOC summary report a year after launch speak to the success. As of 10/23/08:
• 10.4 million views of LOC photos on Flickr
• 79% of the 4,615 photos have been made a "favorite"
• 67,176 tags were added by 2,518 unique Flickr accounts
• Fewer than 25 instances of user-generated comments were removed as inappropriate.
• More than 500 records have been enhanced with new information provided by the Flickr community.
LOC/Flickr Commons
Public Archives in the Web 2.0 Environment
While the majority of archives and libraries remain in a Web 1.0 environment, users have Web 2.0 expectations. Institutions and users are meeting in the middle to build community around holdings.
• Search/Share: Archives want to get their holdings out to a wide-reaching public; users want to search across institutions to discover based on interest, locality, etc.
• Comment/Community: the ability to discuss and engage, create community
• Contribute/Improve: Tag, geotag, crowdsource
• Compare: Then and now. Community identity is often tied to history
Stage is set for collaboration and innovation
Mashups, collaborations, shared datasets, open source, open data, and open tools
Bing Maps Streetside Photos (tech preview) http://www.bing.com/maps/explore/#/9gk357c6yqx3jost
The more shared data we have, the more we can do with it!
By the end of 2009, a group of archivists and technologists started exploring collaborative efforts utilizing Linked Data to connect isolated archives and datasets in order to:
• join data in a robust, scalable, community-maintained database
• increase discovery of and traffic to the archives while adding value to the data through crowdsourcing
• make the data searchable and available to other web applications via API and semantic web queries
Archives Metadata Mapping Project
Two important outcomes:
1. The potential of using Linked Data now by using Freebase as a Linked Data publishing platform.
2. The importance of use cases.
There, I said it. Linked Data.
Providing ways to start linking to DATA, no longer just DOCUMENTS. It entails using tools and standards to make information (like metadata, MARC records, etc.) searchable and machine readable.
image: Harry Halpin. http://www.ibiblio.org/hhalpin/homepage/presentations/socialnet/
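To make the documents-versus-data distinction concrete, here is a minimal sketch in Python. It assumes a hypothetical catalog record and made-up `ex:` identifiers; none of these names come from Freebase or any real archive. The sketch breaks the record into subject-predicate-object triples, the basic shape of Linked Data, and then answers a simple pattern query against them.

```python
# Hypothetical catalog record; every field value and identifier is illustrative only.
record = {
    "id": "ex:photo/1",
    "title": "Camp scene",
    "creator": "ex:person/unknown-photographer",
    "depicts": "ex:regiment/example-infantry",
    "held_by": "ex:archive/example-state-archive",
}


def record_to_triples(rec):
    """Turn a flat record into (subject, predicate, object) triples."""
    subject = rec["id"]
    return [(subject, "ex:" + key, value) for key, value in rec.items() if key != "id"]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


triples = record_to_triples(record)

# Which items depict the example regiment?
hits = match(triples, p="ex:depicts", o="ex:regiment/example-infantry")
print([subject for subject, _, _ in hits])
```

Because each statement stands on its own and points at identifiers rather than free text, triples from different institutions can be pooled and queried together, which is the point of linking data rather than documents.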
The Civil War Data 150 Project
Born out of conversations with an AMMP participant, the Archives of Michigan.
Key ingredients for a strong use case:
• Specific subject matter
• Diverse data in a wide array of institutions
• A passionate user group
• A significant anniversary
http://www.flickr.com/photos/usnationalarchives/4166330219/
Three Primary Goals of CWD150:
1. Identify sources and map metadata into Freebase.
2. Create web apps to enable users to add to or modify shared metadata with strong identifiers.
3. Engage the public in the process of interacting with and adding value to the data.
http://www.flickr.com/photos/usnationalarchives/3996142724/
Pause for Q&A
Some Technical Details on the Methodology, Tools
You can follow along and contribute to the project on the Freebase Wiki: http://wiki.freebase.com/wiki/CWD150
1. Identifying primary data sets and ways of getting at the data
• Link to Google Spreadsheet on sources.
• Web crawling, screen scraping, XML dumps, CSV files, etc.
2. Creating Web Apps
• Once we have metadata mapped in Freebase, we can create RABJ queues. See a simple example: Genderizer.
• Then apply this to data that needs work, like regiments, or a photo queue.
• Work with Civil War historians and others to add to specific schema.
3. Engaging the Public, User Interface Development
• Messaging and powerful images
• An easy interface with game elements and rewards
• A plea for assistance and opportunity to genuinely make records more useful.
• Holy Grail: Civil War Soldier Survival App based on city of enlistment
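Step 1 above, getting data out of CSV files and XML dumps and mapping it to a common shape, might look something like the following Python sketch. The sample exports, the field names, and the `FIELD_MAP` mapping are all hypothetical; a real source would need its own mapping, and loading the result into Freebase is not shown.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up sample exports from two imaginary institutions.
CSV_EXPORT = "Name,Regiment,Enlisted At\nJohn Doe,Example Infantry,Detroit\n"
XML_EXPORT = """<records>
  <soldier><fullName>Jane Roe</fullName><unit>Example Cavalry</unit><place>Lansing</place></soldier>
</records>"""

# Hypothetical mapping from each source's field names to one shared schema.
FIELD_MAP = {
    "csv": {"Name": "name", "Regiment": "regiment", "Enlisted At": "enlistment_city"},
    "xml": {"fullName": "name", "unit": "regiment", "place": "enlistment_city"},
}


def from_csv(text):
    """Yield shared-schema records from a CSV export."""
    mapping = FIELD_MAP["csv"]
    for row in csv.DictReader(io.StringIO(text)):
        yield {mapping[k]: v.strip() for k, v in row.items() if k in mapping}


def from_xml(text):
    """Yield shared-schema records from an XML dump."""
    mapping = FIELD_MAP["xml"]
    for node in ET.fromstring(text).iter("soldier"):
        yield {mapping[c.tag]: (c.text or "").strip() for c in node if c.tag in mapping}


records = list(from_csv(CSV_EXPORT)) + list(from_xml(XML_EXPORT))
for rec in records:
    print(rec)
```

Once records from different sources share one schema, they can be loaded in bulk and fed into the kind of review queues described in step 2.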
The Big Picture
http://www.flickr.com/photos/37377809@N00/4701512132/
• CWD150 is a strong use case and an example of what can become possible in the wider web and developer community if libraries, archives and museums publish their metadata utilizing Linked Data standards and open licenses.
• Our experience is showing us that the technological barriers are not as significant as the institutional barriers around adoption and openness. But the Flickr Commons Shift has changed that.
• With CWD150, we are side-stepping the Big Next Step of enabling institutions to publish their own metadata as Linked Data, and make meaningful connections. This is on the near horizon.
Libraries, Archives and Museums will be critical to the adoption of Linked Data
• The vast information stored in disparate, isolated databases held by the world's public institutions.
• The expertise held by these institutions in the organization of systems and vocabularies to make sense of this information.
• You can be on the front lines of this movement.
http://www.loc.gov/pictures/item/cwp2003000216/PP
Dig Deeper!
Libraries
• OCLC Research Linked Data parts 1 and 2 webinar
• EMTACL10 April 2010. Gillian Byrne & Lisa Goddard: video | slides
• JISC Linked Data Horizon Scan
• Ed Summers is doing Linked Data work with LOC: Twitter | Blog
Archives
• Mark Matienzo: Linking as Repurposing Metadata
• Tim Wragge's Flickr Machine tag Challenge
Tools
• Build your own NYT Linked Data Application
• Build apps on Freebase
• Clean vast amounts of data with Gridworks
Tim Berners-Lee
• TED Feb 2009
• TED Feb 2010
• Gov 2.0 Expo May 2010
http://www.flickr.com/photos/library_of_congress/3252917783/
What will you do with that data?
Q&A
Give me feedback...
email:[email protected]
www.twitter.com/LookBackMaps
Comments welcome, just use @ or #LookBackMaps on Twitter or email me.