Digitisation at the Wellcome Library: Lessons learned & shared. Historical Newspapers in the Digital Age, Bolzano, October 2014. Dave Thompson, Digital Curator, Wellcome Library

Europeana Newspapers LFT Infoday Thompson

Page 1: Europeana Newspapers LFT Infoday Thompson

Digitisation at the Wellcome Library: Lessons learned & shared.

Historical Newspapers in the Digital Age, Bolzano, October 2014

Dave Thompson, Digital Curator, Wellcome Library

Page 2: Europeana Newspapers LFT Infoday Thompson

The Wellcome Library

• Part of Wellcome Collection, an astonishing public venue in London developed by the Wellcome Trust, where people can learn more about medicine through the ages & across cultures.

• More than 10,000 readers visit us each year, including historians, academics, students, health professionals & consumers, journalists, artists & members of the general public.

Page 3: Europeana Newspapers LFT Infoday Thompson

Digitisation in the Wellcome Library

• Strategic approach, conscious planned decisions.

• Library transformation strategy, physical to digital.

• From ‘project’ to ‘production’.

• Digitisation as a sustainable end-to-end process.

Page 4: Europeana Newspapers LFT Infoday Thompson

Overview – four IT systems…

1. Workflow management system – ‘Goobi’ = PRODUCTION.

2. Digital object repository – ‘Preservica’ = STORAGE.

3. Front end – ‘the player’ = ACCESS.

4. Temporary & permanent storage for content = 70 TB.

Page 5: Europeana Newspapers LFT Infoday Thompson

Digitisation: Metadata import

MARC records are imported from Sierra into Goobi as MARC XML.
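
As an illustration of what a MARC XML record looks like to a receiving system, here is a minimal sketch in Python. It is not how Sierra or Goobi exchange records; the sample record, its control number and the function name `read_record` are all invented for the example. Only the MARC XML namespace and the field tags (001 for the control number, 245 $a for the title) come from the MARC standard.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Library of Congress MARC XML schema.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical record, invented for illustration only.
SAMPLE_MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">b00000001</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title /</subfield>
    <subfield code="c">by an example author.</subfield>
  </datafield>
</record>"""


def read_record(xml_text):
    """Return the control number (001) and title (245 $a) of one MARC XML record."""
    record = ET.fromstring(xml_text)
    control = record.find("marc:controlfield[@tag='001']", MARC_NS)
    title = record.find(
        "marc:datafield[@tag='245']/marc:subfield[@code='a']", MARC_NS
    )
    return {
        "id": control.text if control is not None else None,
        "title": title.text if title is not None else None,
    }


summary = read_record(SAMPLE_MARCXML)
print(summary)
```

A workflow system would typically map fields like these onto its own descriptive metadata; which fields are mapped is a local decision and is not stated in these slides.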

Page 6: Europeana Newspapers LFT Infoday Thompson

Digitisation: Image upload

Digitised images (internally or externally digitised) are imported into Goobi & normalised to JPEG2000.
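
One way to decide which incoming files need normalising is to look at the first bytes of each file rather than trusting its extension. The sketch below is an assumption about how such a check could be written, not a description of what Goobi does; the function names `sniff_format` and `needs_normalising` are invented. The byte signatures themselves are the standard TIFF header and the JP2 signature box.

```python
# Standard file signatures.
TIFF_MAGICS = (b"II*\x00", b"MM\x00*")             # little- and big-endian TIFF
JP2_SIGNATURE = b"\x00\x00\x00\x0cjP  \r\n\x87\n"  # 12-byte JP2 signature box


def sniff_format(header: bytes) -> str:
    """Classify a file from its first bytes as 'jp2', 'tiff' or 'unknown'."""
    if header.startswith(JP2_SIGNATURE):
        return "jp2"
    if header.startswith(TIFF_MAGICS):
        return "tiff"
    return "unknown"


def needs_normalising(header: bytes) -> bool:
    """TIFF is converted to JP2; JP2 is kept as delivered (hypothetical rule)."""
    return sniff_format(header) == "tiff"


# Usage with in-memory headers; a real check would read the first 12 bytes of a file.
print(sniff_format(b"II*\x00\x08\x00\x00\x00"))      # tiff
print(sniff_format(JP2_SIGNATURE + b"\x00\x00"))     # jp2
print(sniff_format(b"%PDF-1.4"))                     # unknown
```

A signature check only identifies the container; it says nothing about whether a JP2 conforms to a given profile, which is a separate validation step.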

Page 7: Europeana Newspapers LFT Infoday Thompson

Digitisation: Upload, ftp, harvesting

Content delivered by ftp can be automatically imported into Goobi & processed, or Internet Archive (IA) content can be automatically harvested.

Page 8: Europeana Newspapers LFT Infoday Thompson

Digitisation: METS/ALTO for access

Content is OCR’d & METS/ALTO files are created in Goobi. Manual/automatic.
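
ALTO stores OCR output as words with page coordinates, which is what makes the text usable for access. The following sketch reads the words back out of an ALTO file line by line. The sample document is invented and uses the ALTO v2 namespace as an assumption; the code ignores the namespace so that it does not depend on a particular ALTO version. The function names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical ALTO fragment, invented for illustration only.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="B1">
          <TextLine>
            <String CONTENT="Lessons" HPOS="100" VPOS="200" WIDTH="300" HEIGHT="50"/>
            <SP/>
            <String CONTENT="learned" HPOS="420" VPOS="200" WIDTH="310" HEIGHT="50"/>
          </TextLine>
          <TextLine>
            <String CONTENT="&amp;" HPOS="100" VPOS="260" WIDTH="40" HEIGHT="50"/>
            <SP/>
            <String CONTENT="shared" HPOS="160" VPOS="260" WIDTH="260" HEIGHT="50"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text: str) -> list:
    """Return the OCR text of each TextLine, words joined by single spaces."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) == "TextLine":
            words = [
                child.get("CONTENT", "")
                for child in element
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return lines


print(alto_lines(SAMPLE_ALTO))
```

The HPOS, VPOS, WIDTH and HEIGHT attributes are left unused here; a viewer would use them to highlight search hits on the page image.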

Page 9: Europeana Newspapers LFT Infoday Thompson

Digitisation: Repository ingest

Goobi initiates automated ingest of images & metadata into Preservica.

Page 10: Europeana Newspapers LFT Infoday Thompson

Digitisation: Access

The player pulls images from Preservica using metadata in the METS/JSON file.

Page 11: Europeana Newspapers LFT Infoday Thompson

Or from a different perspective…

[Diagram. Labels shown: sources of content are In-house, Institutions, Contractors & Harvesting. Content arrives as TIFF or JP2, by HD & ftp; grey literature arrives as PDF. Goobi (METS/OCR) normalises TIFF to JP2, manually or automatically; Jpylyzer validates JP2; JP2 & DMD are harvested automatically. Content then goes to Preservica. Roles shown: Project Managers / Ingest Officer; Project Managers; Ingest Officer / Digital Curator. Snagging is marked at two points.]

Page 12: Europeana Newspapers LFT Infoday Thompson

Lesson 1 - Digitisation as a social activity

1. Digitisation is not a technical problem; it’s a social activity between creator & user.

2. Internally: Digitisation engages with all parts of the organisation, & draws on many different skills.

3. Externally: Engaging with (Between…?) creators & users, moving data into public realms, providing access.

http://www.emmanueladegbola.com/networking-leads/

Page 13: Europeana Newspapers LFT Infoday Thompson

Projects & workflows

1. Standardised processes to deal with differences in content & themes.

2. Use ‘projects’ & workflows to define activities & automated steps to handle material from transfer/acquisition to dissemination.

3. Projects & workflows allow us to manage our processes & to report activity.

http://www.amross.sd/
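
The idea of a workflow as a small, fixed list of steps that can be run and reported on can be sketched in a few lines of Python. This is an illustration only and not Goobi's data model: the class `Item`, the functions `run` and `report`, and the item identifiers are invented, and each step is a placeholder. The step names are taken from the stages described in the earlier slides.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Item:
    """One object moving through the workflow (hypothetical structure)."""
    identifier: str
    completed: List[str] = field(default_factory=list)


def placeholder_step(item: Item) -> None:
    """Stand-in for real work; a production step would call another system."""


Step = Tuple[str, Callable[[Item], None]]

# The same pre-defined steps are reused for every project.
WORKFLOW: List[Step] = [
    ("metadata import", placeholder_step),
    ("image upload", placeholder_step),
    ("OCR & METS/ALTO", placeholder_step),
    ("repository ingest", placeholder_step),
    ("access", placeholder_step),
]


def run(item: Item, workflow: List[Step]) -> Item:
    """Run each step in order and record its name once it has finished."""
    for name, action in workflow:
        action(item)
        item.completed.append(name)
    return item


def report(items: List[Item]) -> Dict[str, int]:
    """Report how many steps each item has completed."""
    return {item.identifier: len(item.completed) for item in items}


items = [run(Item("item-001"), WORKFLOW), run(Item("item-002"), WORKFLOW[:2])]
print(report(items))
```

Because every item passes through the same named steps, reporting reduces to counting what each item has completed, which is the point the slide makes about managing processes & reporting activity.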

Page 14: Europeana Newspapers LFT Infoday Thompson

Standardised formats

1. Digitisation process built around a small number of formats.

2. Only accept, or create, TIFF or JPEG2000 image formats for digitisation. MPEG-2 for video.

3. Share our JPEG2000 profile with creators & validate images at point of processing.

4. Standardised metadata format(s) for discovery (MARC) & retrieval (ALTO/JSON).

http://blog.absolutvision.com/en/jpeg2000-format/

Page 15: Europeana Newspapers LFT Infoday Thompson

Lesson 2 – It’s a strategic issue

1. Given the scale & complexity, clear strategic direction is essential.

2. Digitisation has to support an institution’s users & their information needs.

3. Digitisation has to be a strategic decision supporting an institution’s purpose.

4. Digitisation doesn’t change the mission of an organisation.

Page 16: Europeana Newspapers LFT Infoday Thompson

Industrialisation of processes

1. Digitisation built around a small number of formats. Workflows built around a small number of pre-defined steps.

2. Common workflow activities mean less system development; we can build our own processes.

3. Easier for humans to learn, less training, more certainty/reliability.

4. Industrialisation supports processes that are sustainable.

http://www.howtobeadad.com/2013/14723/unicorn-poop-how-i-fell-in-love-with-the-daughter-i-never-had

Page 17: Europeana Newspapers LFT Infoday Thompson

Lesson 3 – sustainability or bust

1. Digitisation has to be a sustainable process.

2. Processes have to be scalable to ambition.

3. Design, re-design & review processes constantly & integrate with existing services.

4. Digitisation as evolution: learn from what has been done, apply & move forward.

http://planetivy.com/gaming/25273/natural-selection-2-gaming-evolution-in-action/

Page 18: Europeana Newspapers LFT Infoday Thompson

Automation is key

1. Automation is essential to scalability & efficiency.

2. Within digitisation some activities are very susceptible to automation. Automate them.

3. Automation standardises processes. Good for life cycle management of data.

4. Automated processes maximise investment in digitisation & support scalability.

http://www.technibble.com/automating-computer-business-for-profit/

Page 19: Europeana Newspapers LFT Infoday Thompson

Automated harvesting of IA content

Content processed automatically, including creation of METS & ALTO.

Goobi has a ‘repository’ of IA identifiers for searching/harvesting.

Goobi harvests data from Internet Archive website.

Content available in the player.

Content stored in Preservica.

DDS creates JSON for the player & pre-caches some content.
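
Harvesting from a list of Internet Archive identifiers can be sketched as turning each identifier into the URLs to fetch. The sketch below makes no network calls and does not describe Goobi's harvester. The two URL patterns (`/metadata/` and `/download/`) are the Internet Archive's public ones; the identifiers, the file name and the function names are invented for the example.

```python
from urllib.parse import quote


def metadata_url(identifier: str) -> str:
    """URL of the Internet Archive metadata record (JSON) for one item."""
    return f"https://archive.org/metadata/{quote(identifier)}"


def download_url(identifier: str, filename: str) -> str:
    """URL of one file belonging to an Internet Archive item."""
    return f"https://archive.org/download/{quote(identifier)}/{quote(filename)}"


def plan_harvest(identifiers):
    """Return one metadata URL per identifier, skipping duplicates, in order."""
    seen = set()
    plan = []
    for identifier in identifiers:
        if identifier not in seen:
            seen.add(identifier)
            plan.append(metadata_url(identifier))
    return plan


# Hypothetical identifiers standing in for the stored list of IA identifiers.
queue = ["example-item-0001", "example-item-0002", "example-item-0001"]
plan = plan_harvest(queue)
print(plan)

# The file name is an assumption; a harvester would take real file names
# from the metadata record rather than constructing them.
print(download_url("example-item-0001", "example-item-0001_jp2.zip"))
```

Keeping the identifier list separate from the fetching logic means the same list can drive both searching and harvesting, as the slide describes.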

Page 20: Europeana Newspapers LFT Infoday Thompson
Page 21: Europeana Newspapers LFT Infoday Thompson

Lesson 4: Nothing without imagination

1. The power of digitisation can only be revealed if we can imagine the uses the data can be put to.

2. Digitisation is not an exercise in technology for its own sake.

3. There is nothing that cannot be achieved, but it takes more than kit, tools, computers, software.

4. Digitisation is about engaging with creators & consumers, with the data & with the future.

Page 22: Europeana Newspapers LFT Infoday Thompson

Digitisation is not a separate activity

• Starts with alignment with the institutional mission.

• Builds on strategic vision.

• Digitisation as a strategic activity, planned & supported.

• Integrate all institutional systems, bibliographic, IT & human.

http://ocdindia.com/

Page 23: Europeana Newspapers LFT Infoday Thompson

Lesson 5 – The complete package

1. Digitisation is much more than sticking stuff under a camera or on a scanner.

2. Digitisation has to be developed as a whole & complete end-to-end process.

http://veritusgroup.com/how-to-create-a-dynamic-strategy-for-every-single-donor-a-step-by-step-process/

Page 24: Europeana Newspapers LFT Infoday Thompson

So, lessons learned

• Digitisation is a social activity.

• Digitisation as a planned strategic activity.

• Digitisation has to be a sustainable & scalable activity.

• Automation is key.

• Nothing without imagination.

• Digitisation has to be a complete package.

Page 25: Europeana Newspapers LFT Infoday Thompson

In the end we built something beautiful

Page 26: Europeana Newspapers LFT Infoday Thompson

Questions now, questions later…?

Dave Thompson

Digital Curator

Wellcome Library

@D_N_T