Can Bilateral Digitization Tear Down the Wall Between
Institutions and the Public?
Ben Brumfield
Digital Frontiers 2012
“You know Ben that it really stinks that I can't get access to the original. My grandfather Jeremiah wrote the diary so that I could read about his daily life happenings. My grandfather Edward used to own it and if he had known that I would be so interested in it I'm sure he would have kept it and given it to me instead of the university.”
– Alan Williams, 2009 email
Walls
• Professionally conserved
• Publicly accessible
• Catalogued
• 1000 miles away
• Reading room restrictions
• “Permission-to-publish” agreements
• Costly scanning fees
Penetrating the Walls
• Digitization
• Collaboration
Shallow Digitization (Institutional Version)
• “Scan-and-dump” facsimiles
– Limited metadata
– No transcripts
– Not crawlable
Shallow Digitization (Amateur Version)
• Full transcripts
– No facsimiles
– No provenance
– No metadata on sources
– Invisible editorial decisions
• Cut-and-paste replication
– No attribution
Deep Digitization
• Institutional Challenges
– Funding
– Manpower
• Non-institutional Challenges
– Standards
– Access to sources
Crowdsourcing
• Who are the volunteers?
• What can they do?
• OldWeather.org
• Zenas Matthews
• Harry Ransom Center Fragments
Accuracy
• Individual transcriptions are about 97% accurate
• Of 1000 transcribed logbook entries:
– 3 will be lost because of transcription errors
– 10 will be illegible
– At least 3 will be errors in the logs
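One way the 97% figure and the "3 in 1000" figure could fit together is sketched below. It assumes, and the slide does not say this, that each entry is keyed independently by three volunteers and is lost whenever at least two of them get it wrong. The function name `loss_rate` and the three-transcriber default are illustrative choices, not a description of OldWeather's actual pipeline.

```python
from math import comb


def loss_rate(accuracy: float, transcribers: int = 3) -> float:
    """Probability that a majority of independent transcribers err on one entry.

    Assumption (not from the slides): errors are independent and an entry
    is lost whenever more than half of its transcriptions are wrong.
    """
    error = 1.0 - accuracy
    majority = transcribers // 2 + 1
    return sum(
        comb(transcribers, k) * error**k * accuracy ** (transcribers - k)
        for k in range(majority, transcribers + 1)
    )


# 97% individual accuracy, three hypothetical independent transcribers
lost_per_1000 = 1000 * loss_rate(0.97)
print(f"{lost_per_1000:.1f} entries lost per 1000")
```

Under these assumptions the result comes out between 2 and 3 entries per 1000, the same order as the slide's figure. A different redundancy level or correlated errors would change the number.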
OldWeather Participation
• More than 1.6 million weather observations.
• 16,000 volunteers.
• 1 million log pages transcribed.
• Mean contribution of 100 transcriptions per user – but this statistic is worthless!
Power-law Distribution
• Most contributions are made by a core of well-informed enthusiasts.
• True regardless of project size.
• What are the implications?
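A small sketch of why a mean says little under a power-law distribution. The volunteer count and observation total come from the OldWeather slide. The 1/rank (Zipf-like) shape is purely a hypothetical stand-in, not the project's real per-user data.

```python
from statistics import mean, median

VOLUNTEERS = 16_000          # from the slide
TOTAL_OBSERVATIONS = 1_600_000  # from the slide ("more than 1.6 million")

# Hypothetical shape: the volunteer at rank r contributes in proportion to 1/r.
weights = [1 / rank for rank in range(1, VOLUNTEERS + 1)]
scale = TOTAL_OBSERVATIONS / sum(weights)
contributions = [w * scale for w in weights]  # sorted, largest first

mean_contribution = mean(contributions)
median_contribution = median(contributions)

# Share of all work done by the top 1% of volunteers
top_count = VOLUNTEERS // 100
top_share = sum(contributions[:top_count]) / TOTAL_OBSERVATIONS

print(f"mean:   {mean_contribution:.0f}")
print(f"median: {median_contribution:.0f}")
print(f"top 1% share: {top_share:.0%}")
```

With this assumed shape the mean is still 100 by construction, yet the typical (median) volunteer contributes far less, and a small core accounts for over half of the work. The exact split depends entirely on the assumed distribution.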
One “Well-Informed Enthusiast”
• In 14 days:
– Entire diary transcribed
– 250 revisions to 43 pages
– Two dozen footnotes
Crowdsourcing’s Virtuous Circle
• Volunteers
• Deep digitization
• Findability
• More Volunteers!
One Volunteer’s Story
• Nat Wooding
– Retired data analyst
– 100 pages of Julia Brumfield’s diaries transcribed and indexed in six months
– No relation to diarist
– Great-uncle was diarist’s letter carrier, also named Nat Wooding
Non-institutional Digitization
The Invisible Archive
• Private collections
• Family archivists (filing cabinets)
– or their heirs (boxes in the attic)
• Non-notable subjects
• Flickr
The Standards Problem
• “We can't overemphasize the potential futility of citing websites, any websites, but especially non-institutional websites.”
– Diggitt McLaughlin (H-SHEAR, 2011-04-27)
The Standards Problem
• “Needless to say, amateurs will continue to put out poorly edited versions of documents in print which we, as professionals, will continue to eschew using.”
– Christopher L. Miller (H-OIEAHC list, 1996-05-07)
Solutions
• Collaboration
• Participation by professionals in amateur projects
• FreeREG/FreeCEN
Solutions
• Community
• Flickr
• RootsTech
Solutions
• Software Platforms
• Suggested rigor
• Graceful degradation
Thanks!
Ben Brumfield
[email protected]
http://fromthepage.com/
Slides and transcript to be posted at http://manuscripttranscription.blogspot.com/