Surveying Newspaper Digitisation in European Libraries, Then Aggregating Them!
Europeana Newspapers
Alastair Dunning
Programme Manager, The European Library
@alastairdunning, alastair.dunning AT kb.nl
LIBER Conference, June 2013, Munich
This presentation is at http://www.slideshare.net/alastairdunning
On November 3, 1948, the early edition of the Chicago Tribune proclaimed Thomas Dewey the winner of the US presidential election
http://www.chicagotribune.com/news/politics/chi-histdewey_defeats_an20080104104816,0,547284.photo
In fact, the election was won by Harry Truman, the 33rd President of the United States
http://en.wikipedia.org/wiki/File:Deweytruman12.jpg
Later editions of the Chicago Tribune corrected this mistake with the headline "DEMOCRATS MAKE SWEEP OF STATE OFFICES"
However, I cannot find these online!
http://en.wikipedia.org/wiki/File:Deweytruman12.jpg
As we shall see, presenting comprehensive digital archives, where everything is digitised, is difficult, yet this is what users often demand!
"This lack of collocation and collection presents efficiency challenges and deepens scholars’ concerns about comprehensiveness. The anxiety over “missing something” was quite common across interviews."
Ithaka S+R, Supporting the Changing Research Practices of Historians,
http://www.sr.ithaka.org/research-publications/supporting-changing-research-practices-historians
"When lined up against the non-digital object upon which it is based, the digital object can only ever appear impoverished."
Jim Mussell, Historian at University of Birmingham
http://jimmussell.com/2013/05/23/the-proximal-past-digital-archives-and-the-here-and-now/
Genealogists - those studying family history
"Genealogists represent the majority of users in many archives. And yet, the traditional archival information system does not meet their needs."
Wendy M. Duff, Catherine A. Johnson, Where Is the List with All the Names? Information-Seeking Behavior of Genealogists, American Archivist, Volume 66(1), 2003, http://archivists.metapress.com/content/L375UJ047224737N
Despite this, European libraries have made great strides in digitising their newspapers
(These results are taken from the first Europeana Newspapers survey, 2012. 47 libraries responded.)
http://www.europeana-newspapers.eu/wp-content/uploads/2012/04/D4.1-Europeana-newspapers-survey-report.pdf
129,041,663 pages from 23,987 titles
11 libraries have digitised more than 3m pages
1. National Library of Czech Republic
2. Koninklijke Bibliotheek van België
3. National Library of Spain
4. National Library of Norway
5. National and University Library of Iceland
6. BCU Lausanne
7. Hamburg State and University Library
8. Bibliothèque nationale de France
9. British Library
10. Koninklijke Bibliotheek
11. Austrian National Library
But only 12 (26%) of the libraries had digitised more than 10% of their collection (either in terms of titles or page numbers)
National Library of Luxembourg
620,000 pages digitised
4,000,000 pages in collection
National Library of Finland
620,000 pages digitised
2,010,246 pages in collection
Hamburg State and University Library
c. 2,000,000 pages digitised
c. 12,000,000 pages in collection
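To put these three examples on a common scale, the digitised share of each collection can be worked out directly from the figures quoted above. The snippet below is only an illustrative sketch: the function and variable names are made up for this example, and the Hamburg figures are the approximate ("c.") values from the slide.

```python
# Figures as quoted on the slides: (pages digitised, pages in collection).
# The Hamburg numbers are approximate ("c.") in the source.
collections = {
    "National Library of Luxembourg": (620_000, 4_000_000),
    "National Library of Finland": (620_000, 2_010_246),
    "Hamburg State and University Library": (2_000_000, 12_000_000),
}


def digitised_share(digitised: int, total: int) -> float:
    """Return the digitised part of a collection as a percentage."""
    return 100 * digitised / total


# Round to one decimal place for display.
shares = {
    name: round(digitised_share(digitised, total), 1)
    for name, (digitised, total) in collections.items()
}

for name, share in shares.items():
    print(f"{name}: {share}% of pages digitised")
```

On these figures, all three libraries sit above the 10% threshold, but none has digitised even a third of its pages.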
What else did the survey discover?
Access to digitised newspapers is nearly always free of charge. At least 40 (85%) offered free access to their digitised newspapers.
One library had pay per view, whilst another three offered subscription services for users (i.e. paid access per day or per month).
Only four libraries licensed their newspaper contents to other groups (e.g. schools, universities).
Access to twentieth-century content remains problematic.
27 out of 47 libraries (57%) have a cut-off date beyond which they will not publish digitised newspapers on the web. Most frequently, this is based on a 70-year sliding scale.
23% (11 out of 47) had an agreement with a rights organisation so that in-copyright digitised newspapers could be published, but this was often restricted to individual titles.
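The 70-year sliding scale mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any library's actual policy: it assumes the window is counted from the year of publication and that the boundary year itself is publishable, and both function names are hypothetical.

```python
from datetime import date


def cutoff_year(today: date, window_years: int = 70) -> int:
    """Latest publication year that falls outside the sliding window."""
    return today.year - window_years


def can_publish_online(issue_year: int, today: date, window_years: int = 70) -> bool:
    """Assumption: issues published in or before the cut-off year may go online."""
    return issue_year <= cutoff_year(today, window_years)


# Example: the date of this presentation (June 2013).
presentation_date = date(2013, 6, 1)
print(cutoff_year(presentation_date))
print(can_publish_online(1948, presentation_date))
```

Because the window slides, the cut-off moves forward by one year every year, so content that is blocked today becomes publishable later without any change of policy.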
There is still much to be done to exploit the richness of digitised newspaper content
64% (37 of 47) of libraries made use of OCR
But only 17 of these libraries (36%) exposed the resulting full text to the viewer
36% had undertaken zoning and segmentation, and only six libraries (13%) had included features such as faceted browsing or extracting entities such as place or name
--> Motivation for Europeana Newspapers
Other WPs (work packages) will explain the process of improving digitised archives, but I want to return to one earlier quote
"... the lack of comprehensive search tools for primary sources ..."
Locating primary sources presents a crucial challenge for researchers.
--> TEL aggregator as part of Europeana Newspapers project
Timetable: Early version with limited content added to The European Library website in September 20
More content being added in 2013 and 2014
http://theeuropeanlibrary.org will deliver a search interface to help locate 18m pages digitised at European libraries
Users will also be able to search over titles of newspapers. Title metadata will also be forwarded to Europeana
Some Issues:
Copyright means that some images cannot be shared at all, only metadata (e.g. names and dates of newspapers)
OCR and zoning quality will affect search results significantly. E.g. pages with higher-quality OCR will be returned more often in search results
Some pages have no OCR whatsoever - more difficult to find
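The two OCR issues above can be illustrated with a toy full-text search. Everything here is hypothetical: the page ids, the OCR text and the `search` helper are invented for the example, and real search engines rank and match text in far more sophisticated ways. The point is only that a page whose OCR misreads a word, or which has no OCR at all, is not found by a plain keyword search.

```python
# Hypothetical OCR output for three scans of the same headline.
pages = {
    "page-clean": "DEWEY DEFEATS TRUMAN",
    "page-noisy": "DEWEY DEFEATS TRURNAN",  # invented OCR error: 'M' read as 'RN'
    "page-no-ocr": "",                      # image only, no text layer
}


def search(term: str, pages: dict[str, str]) -> list[str]:
    """Return ids of pages whose OCR text contains the term as a whole word."""
    needle = term.lower()
    return [
        page_id
        for page_id, text in pages.items()
        if needle in text.lower().split()
    ]


hits = search("Truman", pages)
print(hits)
```

Only the cleanly recognised page is returned for "Truman"; the other two can be reached only through metadata such as title and date.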
Different libraries are willing to share different amounts of content
Some libraries are happy for full content to be shared; for others it is just snippets of images
Last Thoughts and What Next?
The European Library will sustain access beyond project funding; but adding more content will require membership of TEL
How can we allow for transcription?
What do non-academic users want?
How do we create full-text APIs?
Oh, the results here were all based on the first edition of the project survey.
If your library wants to contribute to later editions, use the links below by July 2013
http://www.europeana-newspapers.eu/tell-us-about-your-newspaper-digitisation-project/
http://www.surveymonkey.com/s/BQ28579