Presentation from WLIC2013. Reports on a survey conducted by the Europeana Newspapers project of digitised newspaper collections in LIBER (European research) libraries.
The challenge of making digitised European newspaper content available online
Susan Reilly, LIBER
Twitter: @skreilly
IFLA Newspapers, Singapore, Aug 2013
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp
Overview
Europeana Newspapers: making European newspapers available online
Assessing the state of our digitised newspapers collections
Where do we go from here
Europeana Newspapers: making European newspapers available online
• Content from 20 countries! (13+7 new countries)
• Aggregation of more than 18 million newspapers into Europeana
• Make newspapers more accessible by applying refinement methods for OCR, OLR (article segmentation), named entity recognition (NER), and class recognition
• Increase visibility via dedicated content browser
• Ensure sustainability by spreading best practice
Assessing the state of Europe's digitised newspaper collections
• Who’s digitising newspapers?
• What percentage of newspaper collections are digitised?
• How many pages?
• Quality of digitisation?
• How are images made available?
Findings: % of newspaper collections digitised
• Survey of LIBER members (400 European research libraries)
• 47 responses
• Does this indicate the number of institutions digitising newspapers?
• Less than 10% of respondents’ collections digitised
• Compared to an average of 20% of total collections digitised (Enumerate)
• 130 million pages and 24,000 titles
• Not all libraries could provide exact figures because of the cursory nature of their catalogues
Findings: 20th century content an issue
• Conservative approach to copyright terms
• ½ of respondents reported a cut-off date beyond which they do not make content available
• Cut-off as early as 1863
• Latest cut-off: the last 70 years
• Special arrangements with publishers (23%)
• Collective rights agreements too complex
Findings: How accessible are the collections?
• 85% provide free access
• Sometimes only at national level
• Some subscription fees/under licence
Findings: How rich is the content?
• 36% employ no OCR
• 50% of those who did use OCR were not confident enough in the results to expose OCR’d text via a search interface
• 36% zoning and segmentation
• Only 6% named entity recognition
• Huge variance in metadata
• Dublin Core only
• Own standards
Challenges
• Newspaper digitisation lags behind digitisation of collections overall
• Copyright issues more complex
• Lack of quality evaluation technologies for OCR
• Lack of standardised metadata suited to newspapers
Solutions
• Standardised metadata mapped to EDM
• Quality evaluation technologies for OCR
• Clarity over rights issues
• Dialogue with publishers
• More funding for digitisation
• Increase visibility
Thank you for your attention!
http://www.libereurope.eu
http://www.europeana-newspapers.eu/