Surveying Newspaper Digitisation in European Libraries, Then Aggregating Them! Europeana Newspapers. Alastair Dunning, Programme Manager, The European Library (@alastairdunning, alastair.dunning AT kb.nl). LIBER Conference, June 2013, Munich. This presentation is at http://www.slideshare.net/alastairdunning

Europeana Newspapers: Surveying Newspaper Digitisation in European Libraries, then Aggregating Them


Page 1: Europeana Newspapers: Surveying Newspaper Digitisation in European Libraries, then Aggregating Them

Surveying Newspaper Digitisation in European Libraries, Then Aggregating Them!

Europeana Newspapers

Alastair Dunning

Programme Manager, The European Library

@alastairdunning, alastair.dunning AT kb.nl

LIBER Conference, June 2013, Munich

This presentation is at http://www.slideshare.net/alastairdunning

Page 2

On November 3, 1948, the early edition of the Chicago Tribune proclaimed Thomas Dewey as winner of the US presidential campaign

http://www.chicagotribune.com/news/politics/chi-histdewey_defeats_an20080104104816,0,547284.photo

Page 3

In actual fact, the campaign was won by Harry Truman, who became the 33rd President of the United States

http://en.wikipedia.org/wiki/File:Deweytruman12.jpg

Page 4

Later editions of the Chicago Tribune corrected this mistake with the headline "DEMOCRATS MAKE SWEEP OF STATE OFFICES"

However, I cannot find these online!

http://en.wikipedia.org/wiki/File:Deweytruman12.jpg

Page 5

As we shall see, presenting comprehensive digital archives, where everything is digitised, is difficult... yet this is what users often demand!

Page 6

"This lack of collocation and collection presents efficiency challenges and deepens scholars’ concerns about comprehensiveness. The anxiety over “missing something” was quite common across interviews."

Ithaka S+R, Supporting the Changing Research Practices of Historians

http://www.sr.ithaka.org/research-publications/supporting-changing-research-practices-historians

Page 7

"When lined up against the non-digital object upon which it is based, the digital object can only ever appear impoverished."

Jim Mussell, Historian at University of Birmingham

http://jimmussell.com/2013/05/23/the-proximal-past-digital-archives-and-the-here-and-now/

Page 8

Genealogists - those studying family history

"Genealogists represent the majority of users in many archives. And yet, the traditional archival information system does not meet their needs."

Wendy M. Duff, Catherine A. Johnson, Where Is the List with All the Names? Information-Seeking Behavior of Genealogists, American Archivist, Volume 66(1), 2003, http://archivists.metapress.com/content/L375UJ047224737N

Page 9

Despite this, European libraries have made great strides in digitising their newspapers

(These results are taken from the first Europeana Newspapers survey, 2012. 47 libraries responded.)

http://www.europeana-newspapers.eu/wp-content/uploads/2012/04/D4.1-Europeana-newspapers-survey-report.pdf

Page 10

129,041,663 pages

from

23,987 titles

Page 11
Page 12

11 libraries have digitised more than 3m pages

1. National Library of Czech Republic

2. Koninklijke Bibliotheek van België

3. National Library of Spain

4. National Library of Norway

5. National and University Library of Iceland

6. BCU Lausanne

7. Hamburg State and University Library

8. Bibliothèque nationale de France

9. British Library

10. Koninklijke Bibliotheek

11. Austrian National Library

Page 13

But only 12 (26%) of the libraries had digitised more than 10% of their collection (either in terms of titles or page numbers)

Page 14

National Library of Luxembourg

620.000

pages digitised

4.000.000

pages in collection

Page 15

National Library of Finland

620.000

pages digitised

2.010.246 pages in collection

Page 16

Hamburg State and University Library

c. 2.000.000 pages digitised

c. 12.000.000 pages

in collection
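The gap between what is digitised and what is held can be made concrete with a little arithmetic. The sketch below simply divides the figures quoted on the three slides above; the dictionary and the function name `digitised_share` are illustrative only, and Hamburg's numbers are approximate ("c.") in the source.

```python
# Figures copied from the three slides above: (pages digitised, pages in collection).
# Hamburg's values are approximate ("c.") in the source.
collections = {
    "National Library of Luxembourg": (620_000, 4_000_000),
    "National Library of Finland": (620_000, 2_010_246),
    "Hamburg State and University Library": (2_000_000, 12_000_000),
}


def digitised_share(digitised: int, total: int) -> float:
    """Return the digitised part of a collection as a percentage."""
    return 100 * digitised / total


# Percentage of each collection that has been digitised.
shares = {name: digitised_share(d, t) for name, (d, t) in collections.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1f}% of pages digitised")
```

On these figures, roughly one page in six is digitised in Luxembourg and Hamburg, and a little under one in three in Finland.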

Page 17

What else did the survey discover?

Page 18

Access to digitised newspapers is nearly always free of charge. At least 40 (85%) offered free access to their digitised newspapers.

One library had pay per view, whilst another three offered subscription services for users (i.e. paid access per day or per month).

Only four libraries licensed their newspaper contents to other groups (e.g. schools, universities).

Page 19

Access to twentieth-century content remains problematic.

27 out of 47 libraries (57%) have a cut-off date beyond which they will not publish digitised newspapers on the web. Most frequently, this is based on a 70-year sliding scale.

23% (11 out of 47) had an agreement with a rights organisation so that in-copyright digitised newspapers could be published, but this was often restricted to individual titles.
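The percentages on this slide follow directly from the counts and the 47 responding libraries. A minimal sketch of that calculation, with `RESPONDENTS` and `share_of_respondents` as illustrative names that do not come from the survey report:

```python
RESPONDENTS = 47  # libraries that answered the first survey


def share_of_respondents(count: int, total: int = RESPONDENTS) -> int:
    """Percentage of responding libraries, rounded to the nearest whole number."""
    return round(100 * count / total)


cut_off_date = share_of_respondents(27)      # libraries with a cut-off date
rights_agreement = share_of_respondents(11)  # libraries with a rights agreement

print(f"Cut-off date: {cut_off_date}%")
print(f"Rights agreement: {rights_agreement}%")
```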

Page 20

There is still much to be done to exploit the richness of digitised newspaper content

64% (37 from 47) of libraries made use of OCR

But only 17 of these libraries (36%) exposed the resulting full text to the viewer

36% had undertaken zoning and segmentation, and only six libraries (13%) had included features such as faceted browsing or the extraction of entities such as places or names

Page 21

--> Motivation for Europeana Newspapers

Other work packages (WPs) will explain the process of improving digitised archives, but I want to return to one earlier quote

Page 22

"... the lack of comprehensive search tools for primary sources ..."

Locating primary sources presents a crucial challenge for researchers.

--> TEL aggregator as part of Europeana Newspapers project

Page 23

Timetable: Early version with limited content added to The European Library website in September 20

More content being added in 2013 and 2014

Page 24

http://theeuropeanlibrary.org will deliver a search interface to help locate 18m pages digitised at European libraries

Users will also be able to search over titles of newspapers. Title metadata will also be forwarded to Europeana

Page 25

Some Issues:

Copyright means that some images cannot be shared at all, only metadata (e.g. names and dates of newspapers)

Page 26

Some Issues:

OCR and zoning quality will affect search results significantly. E.g. pages with higher-quality OCR will be returned more often in search results
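A toy illustration of why OCR quality matters for search, using the Chicago Tribune headline from the start of this talk. This is not how The European Library's search works: the page identifiers, the noisy OCR string, the function names and the 0.8 threshold are all assumptions made for the example. It compares a plain substring search with a simple fuzzy match from Python's standard `difflib` module.

```python
from difflib import SequenceMatcher

# Hypothetical OCR output for the same headline at two quality levels.
pages = {
    "clean_scan": "DEWEY DEFEATS TRUMAN",
    "noisy_scan": "DEVVEY DEFEAT5 TRUNAN",
}


def exact_hits(query: str, pages: dict) -> list:
    """Return ids of pages whose OCR text contains the query verbatim."""
    return [pid for pid, text in pages.items() if query.lower() in text.lower()]


def fuzzy_hits(query: str, pages: dict, threshold: float = 0.8) -> list:
    """Return ids of pages where some word is similar enough to the query."""
    hits = []
    for pid, text in pages.items():
        words = text.lower().split()
        if any(SequenceMatcher(None, query.lower(), w).ratio() >= threshold
               for w in words):
            hits.append(pid)
    return hits


print(exact_hits("truman", pages))  # only the cleanly recognised page matches
print(fuzzy_hits("truman", pages))  # the noisy page is recovered as well
```

With exact matching, the badly recognised page is invisible to a user searching for "truman"; a tolerant match can recover it, at the cost of more false positives on real collections.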

Page 27

Some Issues:

Some pages have no OCR whatsoever and are therefore more difficult to find

Page 28

Some Issues:

Different libraries are willing to share different amounts of content

Some libraries are happy for full content to be shared; for others it is just snippets of images

Page 29

Last Thoughts and What Next?

The European Library will sustain access beyond project funding; but adding more content will require membership of TEL

How can we allow for transcription?

What do non-academic users want?

How do we create full-text APIs?

Page 30

Oh, the results here were all based on the first edition of the project survey.

If your library wants to contribute to later editions, please respond via the links below by July 2013

http://www.europeana-newspapers.eu/tell-us-about-your-newspaper-digitisation-project/

http://www.surveymonkey.com/s/BQ28579