Managing Crowd sourced Cultural Heritage Datasets National Library of Wales Glen Robson – Head of Systems twitter: @glenrobson

OR2016 - Managing Crowd sourced Cultural Heritage Datasets

Managing Crowd sourced Cultural Heritage Datasets

National Library of Wales

Glen Robson – Head of Systems

twitter: @glenrobson

Plan

• Background to the National Library of Wales

• Crowd Sourcing projects

– Cymru – 1900 – Wales

– Cynefin

– Shipping Records

– WW1 Book of Remembrance

• Providing storage and access

Content

Welsh Newspapers

http://www.cymru1900wales.org/

Cynefin

cynefin.archiveswales.org.uk

Data

• Fields:
– Owner
– Tenant
– Use – arable/forest etc.
– Size (acres, roods, perches)
– Tithe Value (pounds, shillings, pence)
– Geo-coordinates

• Storing in Fedora
– ALTO
– Open Annotations
• JSON-LD
• RDF/XML
– Indexing in SOLR
– Website in the summer
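The size and tithe value fields are recorded in pre-decimal units, so comparing parcels usually means reducing each to a single unit first. A minimal sketch, assuming the standard conversions (1 acre = 4 roods, 1 rood = 40 perches; 1 pound = 20 shillings, 1 shilling = 12 pence). The function names are illustrative and not part of the Cynefin platform.

```python
from fractions import Fraction

# Standard imperial and pre-decimal currency conversions.
PERCHES_PER_ROOD = 40
ROODS_PER_ACRE = 4
PENCE_PER_SHILLING = 12
SHILLINGS_PER_POUND = 20


def area_in_acres(acres: int, roods: int, perches: int) -> Fraction:
    """Reduce an (acres, roods, perches) triple to an exact number of acres."""
    total_perches = (acres * ROODS_PER_ACRE + roods) * PERCHES_PER_ROOD + perches
    return Fraction(total_perches, ROODS_PER_ACRE * PERCHES_PER_ROOD)


def value_in_pence(pounds: int, shillings: int, pence: int) -> int:
    """Reduce a (pounds, shillings, pence) triple to whole pence."""
    return (pounds * SHILLINGS_PER_POUND + shillings) * PENCE_PER_SHILLING + pence


if __name__ == "__main__":
    # Hypothetical parcel: 2 acres, 1 rood, 20 perches, valued at 1 pound 2s 6d.
    print(float(area_in_acres(2, 1, 20)))
    print(value_in_pence(1, 2, 6))
```

Using `Fraction` keeps the area exact, which avoids rounding differences when totals are summed across many parcels.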

Shipping Registers

• 544 merchant vessels registered at the port of Aberystwyth

• 1856-1914

• Crew lists – name, position, birth date, reason for leaving, location

• Transcribed by volunteers

• https://www.llgc.org.uk/blog/?p=5716

Data Preservation

• Where do we store this data?

– Catalogue – MARC

– Fedora 3 Repository

• Excel files / RDF

• Data being enhanced

– Currently:

• Triple store (sesame) – preservation?

• https://github.com/LlGC-NLW/shippingrecords

– Fedora 4?

Enhancements

• Linking out
– Places: birth and ship arrival
• Volunteer using OpenRefine to group places
• Will try to match with GeoNames
– Ships:
• Added to Wikidata by NLW Wikipedian in Residence:
– https://tools.wmflabs.org/reasonator/?&q=23927955
– https://tools.wmflabs.org/reasonator/?&q=24027483
– Adding images, size, weight, creation, destruction, link to newspapers
– Dutch Shipping to Newspaper linking: http://bit.ly/1Talish/
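Grouping variant spellings of a place name, as the volunteer does in OpenRefine, can be approximated with a key-collision approach: reduce each value to a normalised "fingerprint" and group values that share one. The sketch below is a simplified stand-in for that idea, not OpenRefine's own code, and the place names in the example are illustrative rather than taken from the shipping records.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Normalise a string so that trivially different spellings collide."""
    # Fold accented characters to their closest ASCII form.
    folded = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Trim, lowercase and drop ASCII punctuation.
    cleaned = folded.strip().lower().translate(str.maketrans("", "", string.punctuation))
    # Sort and de-duplicate tokens so word order does not matter.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Return groups of two or more values that share a fingerprint."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    places = ["Aberystwyth", "aberystwyth ", "Aberystwyth.", "Cardiff", "New Quay", "Quay, New"]
    for group in cluster(places):
        print(group)
```

This only catches differences in case, punctuation, accents and word order. Genuine misspellings would need a fuzzier method, and each proposed group still needs a human decision before matching against GeoNames.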

Research Potential

• Publishing these datasets as Linked Open Data allows research that wasn’t possible when these items were physical or even when they were standalone digital objects.

• Physical:
– Travel to Aberystwyth - x hours/days
– Transcribe data in the reading room – x months/years
– Process back home

• Standalone Digital Object
– Transcribe data at home – x months/years
– Process at home

• Linked Open Data Annotations
– Process at home, results in minutes

• Have to take transcriptions on trust
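To illustrate why linked data shortens the "process at home" step, here is a minimal sketch of triple-pattern matching in plain Python. It is a toy stand-in for a SPARQL query against a triple store; the `ex:` predicates, identifiers and records are invented for illustration and are not the vocabulary used in the shipping records dataset.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]


def match(triples: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for triple in triples:
        if ((s is None or triple[0] == s)
                and (p is None or triple[1] == p)
                and (o is None or triple[2] == o)):
            yield triple


def ships_with_crew_born_in(triples, place: str):
    """Join two patterns: crew born in `place`, then the ships they served on."""
    crew = {subj for subj, _, _ in match(triples, p="ex:bornIn", o=place)}
    return sorted({obj for subj, _, obj in match(triples, p="ex:servedOn") if subj in crew})


if __name__ == "__main__":
    # Invented sample data, held in a list so it can be scanned twice.
    data = [
        ("crew:1", "ex:bornIn", "place:Aberystwyth"),
        ("crew:1", "ex:servedOn", "ship:A"),
        ("crew:2", "ex:bornIn", "place:Cardiff"),
        ("crew:2", "ex:servedOn", "ship:A"),
        ("crew:3", "ex:bornIn", "place:Aberystwyth"),
        ("crew:3", "ex:servedOn", "ship:B"),
    ]
    print(ships_with_crew_born_in(data, "place:Aberystwyth"))
```

The answer is only as reliable as the transcriptions behind the triples, which is the trust caveat noted above.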

Mirador

http://projectmirador.org/

Simple Annotation Server

• https://github.com/glenrobson/SimpleAnnotationServer

• Stores IIIF Annotations as Linked Open Data
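As a rough picture of what a IIIF transcription annotation looks like as JSON-LD, the sketch below builds one in the Open Annotation style used by the IIIF Presentation API 2.x. The URIs are placeholders on example.org and the helper name is hypothetical; the exact shape stored by Simple Annotation Server or produced by Mirador may differ.

```python
import json


def transcription_annotation(anno_id: str, canvas_uri: str, xywh, text: str) -> dict:
    """Build an annotation that attaches `text` to a rectangular region of a canvas."""
    x, y, w, h = xywh
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": anno_id,
        "@type": "oa:Annotation",
        "motivation": "sc:painting",
        "resource": {
            "@type": "cnt:ContentAsText",
            "format": "text/plain",
            "chars": text,
        },
        # The region is expressed as a media fragment on the canvas URI.
        "on": f"{canvas_uri}#xywh={x},{y},{w},{h}",
    }


if __name__ == "__main__":
    anno = transcription_annotation(
        "https://example.org/annotation/1",      # placeholder annotation URI
        "https://example.org/iiif/book1/canvas/p1",  # placeholder canvas URI
        (100, 150, 500, 25),
        "John Williams",
    )
    print(json.dumps(anno, indent=2))
```

Because the target is a URI with a region fragment, the same transcription can be displayed by any IIIF-aware viewer and queried as linked data.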

Annotation (Transcription)

http://walesatwar.org

Newspapers

Future Projects

Providing Access

• Volunteers want to see results

• Cynefin – funded project

• Shipping records – independent website

• Cymru1900Wales – dataset (CSV + Linked Data)

• Mirador and IIIF options:

– IIIF Search API

– IIIF Ranges – table of contents

– Datasets for download
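For the IIIF Search API option listed above, a search is an HTTP GET against a search service URI with the terms in a `q` parameter and, optionally, a `motivation` filter. A minimal sketch of building such a request URL follows; the service URI is a placeholder, not a real NLW endpoint.

```python
from typing import Optional
from urllib.parse import urlencode


def search_url(service_uri: str, query: str, motivation: Optional[str] = None) -> str:
    """Build a IIIF Content Search request URL for the given terms."""
    params = {"q": query}
    if motivation:
        # Restrict results to annotations with this motivation, e.g. "painting".
        params["motivation"] = motivation
    return f"{service_uri}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder service URI for illustration only.
    print(search_url("https://example.org/iiif/search/book1", "John Williams"))
    print(search_url("https://example.org/iiif/search/book1", "John Williams", "painting"))
```

A viewer such as Mirador or Universal Viewer would issue a request like this and highlight the returned annotations on the page images.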

Universal Viewer

Dataset Intersection

• Example of dataset intersection

• John Williams

• Born 1891
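The kind of intersection in this example, finding the same person across datasets, can be sketched as a join on a normalised name plus birth year. The record layout, dataset names and identifiers below are hypothetical; only the name and birth year come from the example above.

```python
from collections import defaultdict


def person_key(record: dict):
    """Normalise case and spacing in the name and pair it with the birth year."""
    return (" ".join(record["name"].lower().split()), record["born"])


def intersect(**datasets):
    """Return keys found in more than one dataset, with the matching records."""
    index = defaultdict(dict)
    for dataset_name, records in datasets.items():
        for record in records:
            index[person_key(record)].setdefault(dataset_name, []).append(record)
    return {key: hits for key, hits in index.items() if len(hits) > 1}


if __name__ == "__main__":
    # Invented records for illustration.
    shipping = [
        {"id": "ship-1", "name": "John Williams", "born": 1891},
        {"id": "ship-2", "name": "Evan Jones", "born": 1880},
    ]
    remembrance = [
        {"id": "ww1-7", "name": "john  williams", "born": 1891},
    ]
    for key, hits in intersect(shipping=shipping, remembrance=remembrance).items():
        print(key, sorted(hits))
```

A shared name and birth year is only a candidate match. With a name as common as this one, further evidence such as birthplace would be needed before asserting that two records describe the same person.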

Can we do this at scale?

• Cynefin Maps: 1838 to 1947

• Newspapers: 1804 to 1919

• Cymru 1914: 1914 to 1918

• General Digitisation

• Shipping Records: 1856 to 1914

• Crime and Punishment Database: 1730 to 1830

• Welsh Bibliography: 0 to 1970
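One way to judge where intersections are possible at scale is to check which datasets cover a given year and where two date ranges overlap. The sketch below uses the ranges as read from this slide; General Digitisation is left out because no range is given for it.

```python
# Date ranges as listed on the slide (start year, end year).
DATASETS = {
    "Cynefin Maps": (1838, 1947),
    "Newspapers": (1804, 1919),
    "Cymru 1914": (1914, 1918),
    "Shipping Records": (1856, 1914),
    "Crime and Punishment Database": (1730, 1830),
    "Welsh Bibliography": (0, 1970),
}


def covering(year: int):
    """Names of datasets whose range includes the given year."""
    return [name for name, (start, end) in DATASETS.items() if start <= year <= end]


def overlap(a: str, b: str):
    """Shared (start, end) range of two datasets, or None if they do not overlap."""
    start = max(DATASETS[a][0], DATASETS[b][0])
    end = min(DATASETS[a][1], DATASETS[b][1])
    return (start, end) if start <= end else None


if __name__ == "__main__":
    print(covering(1891))
    print(overlap("Shipping Records", "Newspapers"))
    print(overlap("Crime and Punishment Database", "Shipping Records"))
```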

Summary

• Different methods of crowd sourcing:
– Excel
– Outsourcing: Cynefin and Cymru1900Wales
– IIIF: Mirador & Simple Annotation Server
– Wikidata

• Ideally, the crowd sourcing platform is directly connected to the access solution (there will be corrections)

• Transcribing to linked data gives:
– Connection to external data sources (GeoNames, Wikipedia)
– Connection to other resources (newspapers)
– Allows researchers to query the data

• IIIF gives:
– Transcription platform that is easy to set up
– Work with other people's content