Merrilee Proffitt and Max Klein OCLC Research August 24 2012

Wikipedia and Libraries: Island Hopping the Data Archipelago


45 years old

Almost 30K libraries contributing from 170 countries

More than 271 M items

1200 employees

21 offices worldwide

Since 1978

46 people

3 locations (Dublin, San Mateo, Leiden)

Pure research not product R&D

not market research

Wikipedians still complain about the vector skin

Although content creation is fast, internal policy progress is glacial and conservative

Consensus model over asynchronous and near-anonymous discussion

“The free bureaucracy, that anyone can legislate.” ~ San Francisco Wiknic 2012

Community originated. 27,456 instances

2009: “Linkspam” accusations against OCLC. Cause: links to Amazon and B&N on the WorldCat page.

Original accuser was banned for being argumentative.

Crux: Should Wikipedia promote any organization? Open question in the community

Disambiguation

Collation

Authority file matching

During creation used Wikipedia data

2013. Wikipedia will be promoted to “source” rather than reference.

English Wikipedia 4,000 instances

German Wikipedia 220,000 instances

Wikimedia Commons 45,000 instances

Added by hand

Rules vary by language

Load VIAF data

Check Deutsche Wikipedia

Edit English Wikipedia

English only, for now

Targets 260,000 pages

1/16th of English Wikipedia

Still won’t be fully synched with Deutsche Wikipedia
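
The load, check, edit workflow can be sketched roughly as below. This is a minimal illustration and not VIAFbot's actual code: the function name `plan_viaf_edit`, the `{{Authority control|VIAF=...}}` wikitext, the placeholder ID, and the rule of skipping a page when Deutsche Wikipedia records a different ID are all assumptions, and no wiki is contacted.

```python
from typing import Optional


def plan_viaf_edit(viaf_id: str, dewiki_viaf: Optional[str], enwiki_text: str) -> Optional[str]:
    """Return new English wikitext, or None when no edit should be made.

    Hypothetical helper: the inputs stand in for data a bot would have
    loaded from VIAF, Deutsche Wikipedia, and English Wikipedia.
    """
    # Check step: if Deutsche Wikipedia already records a different ID,
    # treat it as a conflict and leave the page for human review (assumed rule).
    if dewiki_viaf is not None and dewiki_viaf != viaf_id:
        return None
    # Never overwrite an authority control template that was added by hand.
    if "{{authority control" in enwiki_text.lower():
        return None
    # Edit step: append the template at the end of the page (assumed placement).
    return enwiki_text.rstrip("\n") + "\n{{Authority control|VIAF=" + viaf_id + "}}\n"


if __name__ == "__main__":
    # "12345" is a placeholder, not a real VIAF identifier.
    print(plan_viaf_edit("12345", "12345", "Example article text."))
```

Keeping the decision separate from the actual page save makes the rule easy to review, which matters when the bot itself has to pass community code review.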

https://github.com/notconfusing/VIAFbot

Uses Pywikipediabot

In community code review: running within the next month

Transclusion & Sugarcoated HTML

Transclusion

You can draw in text from other pages (typically templates)

Can send parameters

Templates can perform:

Simple logic operations

Simple text manipulation

Still Wikitext, not fully query-able
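
To make transclusion with parameters concrete, here is a toy sketch in Python. It only imitates the idea: the `{{Name|key=value}}` call syntax and the `{{{name|default}}}` placeholder syntax follow wikitext, but the functions `expand` and `transclude` and the `Lifespan` template are hypothetical, and nesting, positional parameters, and parser functions are ignored.

```python
import re


def expand(template_body: str, params: dict) -> str:
    """Fill {{{name}}} or {{{name|default}}} placeholders in a template body."""
    def repl(match):
        name, default = match.group(1), match.group(2)
        if name in params:
            return params[name]
        # Simple logic: fall back to the default, else leave the placeholder as-is.
        return default if default is not None else match.group(0)
    return re.sub(r"\{\{\{([^{}|]+)(?:\|([^{}]*))?\}\}\}", repl, template_body)


def transclude(page_text: str, templates: dict) -> str:
    """Replace each {{Name|key=value|...}} call with the expanded template body."""
    def repl(match):
        parts = match.group(1).split("|")
        name = parts[0].strip()
        if name not in templates:
            return match.group(0)  # unknown template: leave the call untouched
        params = {}
        for part in parts[1:]:
            key, sep, value = part.partition("=")
            if sep:
                params[key.strip()] = value.strip()
        return expand(templates[name], params)
    return re.sub(r"\{\{([^{}]+)\}\}", repl, page_text)


if __name__ == "__main__":
    # Hypothetical template, stored as plain text like any other page.
    templates = {"Lifespan": "{{{born}}} to {{{died|living}}}"}
    print(transclude("Dates: {{Lifespan|born=1900}}", templates))
```

The result is still a string of text, which is why data held this way cannot be queried directly.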

“The way you always thought Wikipedia worked.” ~ Merrilee Proffitt

Phase 1: Revamping interlanguage links

Phase 2: Data, Templates and Infoboxes

Phase 3: Semantic querying

Now: Added by hand or bot

Soon: Wikidata concept page

Soon: Properties for a concept

Soon: This won’t be a monumental effort.

The end of the assumption that Wiki pages store Wikitext.

On Wikidata they store JSON.

All the work VIAFbot is doing will be accessible across 270 Wikis.

Plus language specific lookup…
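
As a rough illustration of why structured storage helps, the sketch below reads one property and one language-specific label from a JSON record. The record layout (`labels`, `properties`), the helper names, and the values are invented for this example and are not Wikidata's actual schema.

```python
import json

# Hypothetical, simplified concept record; all values are placeholders.
CONCEPT_JSON = """
{
  "labels": {"en": "Example Author", "de": "Beispielautor"},
  "properties": {"VIAF": "12345"}
}
"""


def get_property(raw: str, name: str):
    """Return a property value from the record, or None if it is absent."""
    concept = json.loads(raw)
    return concept.get("properties", {}).get(name)


def get_label(raw: str, lang: str, fallback: str = "en"):
    """Return the label in the requested language, falling back to another one."""
    labels = json.loads(raw).get("labels", {})
    return labels.get(lang, labels.get(fallback))


if __name__ == "__main__":
    print(get_property(CONCEPT_JSON, "VIAF"))
    print(get_label(CONCEPT_JSON, "de"))
```

Because the identifier lives in one shared record, every language wiki could read the same value instead of each page carrying its own hand-edited copy.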

RDF Data

Backers: Google, Paul Allen Institute for Artificial Intelligence, Gordon and Betty Moore Foundation.

Release Date: January 2013

Caveat: Requires adoption by each individual language wiki, by consensus.

Wikipedias having found consensus so far: …

Hungarian Wikipedia

Bibliographic data is both:

An element of citation

An article in its own right

• 411,274 citations of books

• 244,236 citations of journals

• 57,868 citations of encyclopedias

• 342,470 citations of newspapers

• 1,055,845 total print citations

• 1,169,495 citations of web

http://en.wikipedia.org/wiki/User:Maximilianklein/Citations

Wikipedia features bidirectional linking. We follow links forward all the time, so why not backwards?

Could add “what cites this”
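
A "what cites this" view is, at its simplest, the forward citation map turned around. The sketch below shows that inversion; the function name `build_backlinks`, the article titles, and the record identifiers are made up for illustration.

```python
from collections import defaultdict


def build_backlinks(citations: dict) -> dict:
    """Invert {article: [cited works]} into {work: [citing articles]}."""
    backlinks = defaultdict(set)
    for article, works in citations.items():
        for work in works:
            backlinks[work].add(article)
    # Sort for stable, readable output.
    return {work: sorted(articles) for work, articles in backlinks.items()}


if __name__ == "__main__":
    # Hypothetical forward links from articles to catalog records.
    citations = {
        "Article A": ["record-1", "record-2"],
        "Article B": ["record-2"],
    }
    print(build_backlinks(citations)["record-2"])
```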

What cites this

A Wikipedia article could be a good way of declaring the aboutness of a record.

~Asaf Bartov (User:Ijon)

links to

Could add “what’s about this”

What’s about this


Dream: Take your browser history

Would still have to create bidirectional links between WorldCat and Wikipedia

There is the practical solution.

VIAFbot is the prototype of the link reciprocation solution

Have to gain Wikipedia approval to reciprocate links with a bot

Subject to community approval

Requires maintenance

Can become unsynchronized
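
Because reciprocated links are maintained on two separate sites, a periodic check for drift is one way to handle the maintenance problem. The following sketch compares the two directions and reports links that are not mirrored; the function name `find_unreciprocated` and the sample data are hypothetical.

```python
def find_unreciprocated(wiki_to_catalog: dict, catalog_to_wiki: dict):
    """Return links that exist in one direction but are not mirrored in the other.

    wiki_to_catalog maps a wiki page to a catalog record; catalog_to_wiki
    maps a catalog record back to a wiki page.
    """
    missing_in_catalog = sorted(
        (page, record)
        for page, record in wiki_to_catalog.items()
        if catalog_to_wiki.get(record) != page
    )
    missing_in_wiki = sorted(
        (record, page)
        for record, page in catalog_to_wiki.items()
        if wiki_to_catalog.get(page) != record
    )
    return missing_in_catalog, missing_in_wiki


if __name__ == "__main__":
    # Hypothetical data: "Page B" links out, but nothing links back to it.
    wiki = {"Page A": "rec-1", "Page B": "rec-2"}
    catalog = {"rec-1": "Page A"}
    print(find_unreciprocated(wiki, catalog))
```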

Seaplanes: Imitated bidirectional

Islands: Wikipedia, VIAF, WorldCat

Data Archipelago

Max Klein and Merrilee Proffitt

@notconfusing and @merrileeiam