Capacity, standards and reuse: subject indexing and the British Library
Alan Danskin
Collection Metadata
© The British Library Board 2016
Subject Indexing in the Digital Age, Nuremberg, 11th September, 2018
Taxonomies in the Public Sector, Leeds, 13th September, 2018
Abstract
§ The British Library faces a number of subject indexing challenges. How do we maintain the quality of subject indexing in response to a substantial increase in intake since the extension of legal deposit to digital media in 2013? How do we extend subject coverage to parts of the collection that are not indexed? How do we ensure good subject indexing is applied by projects whose staff are on short-term contracts and whose expertise is in the language or type of resource, rather than in cataloguing and indexing?
§ There is no magic bullet. We have increased the capacity of our workflows by developing our automated matching process to enable reuse of existing records where possible. As a consequence, 70% of e-books receipted have been processed automatically without a reduction in standards. In the longer term we are reviewing our use of Library of Congress Subject Headings (LCSH). The lengthy training period is a significant barrier to wider application. FAST is an attractive alternative, which shares a vocabulary with LCSH, but can be applied with much less training and, because of its faceted structure, is easier to map to other subject systems. We would like to contribute to a solution in which indexing can be re-used much more effectively across cultural, linguistic and technical barriers.
Some Background… The National Library of the United Kingdom
St. Pancras, London • Collections • Curators • Conservation • BL Labs • Reading Rooms • Storage
Boston Spa • Document Supply • Collection Management
• Acquisitions • Cataloguing • Collection Metadata
• Technology • Finance • Human Resources • Reading Room • High Density Storage
British Library Act 1972 • Formed from existing institutions, 1973 • Legal Deposit Library
St. Pancras to Boston Spa: 320 km / 200 miles
Some Background… Collection Metadata
§ 1950 - Operated before the BL’s foundation as ‘The British National Bibliography’ (BNB) Ltd
§ 1972 - The BL Act records our role as “national centre for… bibliographical & other information services”
§ 1997-2004 - Given responsibility for external service provision
§ 2015 - Collection Metadata Strategy published
Collection Metadata Strategy
Legacy Challenges • Digital Challenges
ADDING CAPACITY: RE-USING METADATA FOR E-BOOKS
Legal Deposit
§ Legal Requirement § UK print publications § c.100k monographs p.a. § British Library § National Library of Scotland § National Library of Wales § Bodleian Libraries § Cambridge University Library § Trinity College, Dublin
§ Dates back to 1662 § Revision 2003 § Extension to non-print § Enabling Act
Legal Deposit Extension to non-print media
§ Non-print Regulations § 6 April 2013
§ Scope § Electronic Publications § Websites § Metadata § Sound & Video excluded
§ Deposit § British Library § Transition from print by invitation
§ Access § LDL premises § Single user
E-books Ingest
§ 200 publishers § Ingram (aggregator) § Transformations
§ ONIX 2.1/ONIX 3 § E-pub § Dublin Core
§ 306,382 e-books to date
Legal Deposit E-books The Challenge
[Chart: Ebooks by Year of Original Publication. Vertical axis: No. of items (0 to 7,000). Horizontal axis: year of original publication, 1949 to 2015.]
[Chart: Ebooks by Publication Location. Vertical axis: No. of items (0 to 12,000).]
Legal Deposit E-books Challenge
Options § Reduce input § Increase capacity § Improve productivity
Constraints § Digital is preferred format § Rate of growth; financial constraints § Metadata quality
§ Parity with print § RDA description § DDC Classification § LCSH Subject Indexing § Name Authority Control (NACO)
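The parity constraint can be read as a checklist that an automated process could test before a record bypasses manual handling. The following is a minimal sketch in Python, not the Library's actual rules. It assumes a record is a list of simplified fields; the `Field` type, the `parity_gaps` helper and the choice of tags (336/337/338 for RDA content, media and carrier types, 082 for DDC, 6xx with second indicator 0 for LCSH) are illustrative assumptions. Name authority control cannot be verified from the record alone, so it is left out.

```python
from typing import List, NamedTuple


class Field(NamedTuple):
    tag: str         # MARC tag, e.g. "082"
    indicators: str  # two characters; "." stands for a blank indicator
    value: str       # subfield data, left unparsed in this sketch


RDA_TAGS = {"336", "337", "338"}
SUBJECT_TAGS = {"600", "610", "611", "630", "650", "651"}


def parity_gaps(record: List[Field]) -> List[str]:
    """Return a list of reasons why a record falls short of print parity."""
    tags = {f.tag for f in record}
    gaps = []
    missing_rda = RDA_TAGS - tags
    if missing_rda:
        gaps.append("RDA description: missing " + ", ".join(sorted(missing_rda)))
    if "082" not in tags:
        gaps.append("DDC classification: no 082 field")
    # Second indicator 0 marks a heading drawn from LCSH.
    has_lcsh = any(
        f.tag in SUBJECT_TAGS and f.indicators[1] == "0" for f in record
    )
    if not has_lcsh:
        gaps.append("LCSH subject indexing: no 6xx field with second indicator 0")
    return gaps
```

A record with no gaps could be released automatically; anything else would be routed to a cataloguer.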
Metadata Based Solution Improve productivity by automating workflow
Target existing capacity on exception handling Maintain quality standards
Techniques
§ Record sources include: § Publisher metadata § OCLC § CIP /BDS § Library of Congress § Nielsen
§ Re-use existing metadata
§ Upgrade existing records
§ Search & match
§ Automated quality assurance
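The techniques above (search and match, re-use, upgrade, and routing of exceptions) can be sketched as a small batch loop. This is an illustration under stated assumptions, not the Library's workflow: records are reduced to a mapping from MARC tag to field values, matching is on any shared 020 value, and `find_match`, `batch_upgrade` and `UPGRADE_TAGS` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, List[str]]  # MARC tag -> field values (simplified)

# Fields copied from a matched record when the incoming record lacks them.
UPGRADE_TAGS = ("082", "600", "650", "655")


def find_match(record: Record, candidates: List[Record]) -> Optional[Record]:
    """Return the first candidate sharing an ISBN (020) with the record."""
    isbns = set(record.get("020", []))
    for candidate in candidates:
        if isbns & set(candidate.get("020", [])):
            return candidate
    return None


def batch_upgrade(
    batch: List[Record], candidates: List[Record]
) -> Tuple[List[Record], List[Record]]:
    """Upgrade matched records; return (upgraded, exceptions)."""
    upgraded, exceptions = [], []
    for record in batch:
        match = find_match(record, candidates)
        if match is None:
            exceptions.append(record)  # left for manual exception handling
            continue
        merged = {tag: list(values) for tag, values in record.items()}
        for tag in UPGRADE_TAGS:
            if tag in match and tag not in merged:
                merged[tag] = list(match[tag])
        upgraded.append(merged)
    return upgraded, exceptions
```

Only the unmatched records reach staff, which is the sense in which existing capacity is targeted on exception handling.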
New Processes eBook Records Batch Upgrade
New Processes eBook Records Batch Upgrade
ONIX to MARC Conversion
020 .. ‡a9781786750204‡qEPUB¶
020 .. ‡a1786750201‡qEPUB¶
037 .. ‡a9781786750204‡bIngram Content Group¶
040 .. ‡aUk‡beng‡cUk‡erda¶
100 1. ‡aDaly, Steven,‡eauthor.¶
245 10 ‡aJohnny Depp :‡bA Retrospective /‡cSteven Daly.¶
264 .1 ‡bPalazzo Editions LTD,‡c2016.¶
300 .. ‡a1 online resource (300 pages).¶
336 .. ‡atext‡2rdacontent¶
337 .. ‡acomputer‡2rdamedia¶
338 .. ‡aonline resource‡2rdacarrier¶
852 .. ‡aBritish Library‡bHMNTS‡cDRT‡jELD.DS.106213¶
ons .7 ‡aPER‡b004010‡2bisacsh¶
ons .7 ‡aAPFB‡2bicssc¶
ons .. ‡tamber heard,alice in woderland,pirates of the caribbean,fantastic beasts and where to find them‡2Subject Keywords¶
src .. ‡aIngrams¶
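To make the conversion step concrete, the sketch below builds a few of the fields shown above from publisher data. It is a simplification: real ONIX is XML with its own element names, whereas here the input is a plain dictionary whose keys (`isbns`, `format`, `author_inverted` and so on) are invented for illustration, as is the `onix_to_marc` function. Only the output notation follows the slide.

```python
from typing import Dict, List


def onix_to_marc(product: Dict) -> List[str]:
    """Render a simplified publisher record as MARC-style field lines."""
    fields = []
    for isbn in product["isbns"]:
        fields.append(f"020 .. ‡a{isbn}‡q{product['format']}¶")
    fields.append(f"100 1. ‡a{product['author_inverted']},‡eauthor.¶")
    fields.append(
        f"245 10 ‡a{product['title']} :‡b{product['subtitle']} /"
        f"‡c{product['author_display']}.¶"
    )
    fields.append(f"264 .1 ‡b{product['publisher']},‡c{product['year']}.¶")
    # Publisher subject codes are carried in a local field, not as LCSH.
    for scheme, code in product["subjects"]:
        fields.append(f"ons .7 ‡a{code}‡2{scheme}¶")
    return fields


example = {
    "isbns": ["9781786750204", "1786750201"],
    "format": "EPUB",
    "author_inverted": "Daly, Steven",
    "author_display": "Steven Daly",
    "title": "Johnny Depp",
    "subtitle": "A Retrospective",
    "publisher": "Palazzo Editions LTD",
    "year": "2016",
    "subjects": [("bicssc", "APFB")],
}

for line in onix_to_marc(example):
    print(line)
```

The converted record carries publisher subject codes and keywords but no DDC or LCSH, which is what the later matching step is meant to supply.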
New Processes eBook Records Batch Upgrade
OCLC MATCH
020 .. ‡a9781786750204¶
020 .. ‡a1786750201¶
035 .. ‡a(OCoLC)1000940282¶
040 .. ‡aNZHPC‡beng‡erda‡cNZHPC¶
082 04 ‡a791.43028092‡223¶
100 1. ‡aDaly, Steven,‡d1960-‡eauthor.¶
245 10 ‡aJohnny Depp :‡ba retrospective /‡cSteven Daly.¶
264 .1 ‡a[London] :‡bPalazzo,‡c2016.¶
300 .. ‡a1 online resource :‡billustrations, portraits¶
336 .. ‡atext‡btxt‡2rdacontent¶
337 .. ‡acomputer‡bc‡2rdamedia¶
338 .. ‡aonline resource‡bcr‡2rdacarrier¶
520 .. ‡aJohnny Depp is one of the most enigmatic, alluring and gifted actors of his generation, whose charismatic screen presence has brought to life some of cinema's most enduring characters, from the tragi-comic Edward Scissorhands and the fantastical Willy Wonka to the murderous Sweeney Todd and the swashbuckling Captain Jack Sparrow. This retrospective provides illuminating commentary on each of Johnny Depp's movies, from his first ever role in A Nightmare on Elm Street in 1984 through to The Lone Ranger.¶
538 .. ‡aRequires Adobe Digital editions 1.7.2 or higher.¶
600 10 ‡aDepp, Johnny‡xCriticism and interpretation.¶
650 .0 ‡aMotion picture actors and actresses‡zUnited States‡vBiography.¶
655 .7 ‡aBiographies.‡2lcgft¶
655 .7 ‡aElectronic books.‡2local¶
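Both the converted publisher record and the OCLC record carry the same pair of ISBNs in field 020, so ISBN is a natural match key. The slides do not say which keys the matching process actually uses, so the following is only a sketch of one plausible step: reducing ISBN-10 and ISBN-13 forms to a single comparable key using the standard ISBN-13 check digit calculation. The function names are hypothetical.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert a 10-character ISBN to its 978-prefixed ISBN-13 form."""
    core = "978" + isbn10[:9]  # drop the old check character
    # ISBN-13 check digit: weights alternate 1, 3 across the 12 digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


def match_key(isbn: str) -> str:
    """Normalize an ISBN (hyphens, spaces, 10 or 13 digits) to ISBN-13."""
    digits = isbn.replace("-", "").replace(" ", "").upper()
    return isbn10_to_isbn13(digits) if len(digits) == 10 else digits


# The two 020 values in the records above reduce to the same key.
print(match_key("1786750201"))
print(match_key("978-1-78675-020-4"))
```

With a normalized key, a record that supplies only the ISBN-10 can still be matched against one that supplies only the ISBN-13.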
New Processes eBook Records Batch Upgrade
Quality
          All records   E-books
2017-18   82%           80%
2016-17   80%           72%
70% processed automatically
E-book batch Upgrade outcomes
E-book batch upgrade summary
Outcomes § 70% of e-book intake processed automatically § Available at ingest § Limited discovery
§ Metadata standards identical to printed books
§ Metadata quality comparable with printed books
§ LCSH and DDC added to publisher records
Dependencies § Transformations § Tools § Expertise § Complex processes § CIP Programme § Data sources § Permissive licences
ADDING CAPACITY: CHOICE OF SUBJECT STANDARDS
Subject Indexing Gap
§ Uncatalogued backlogs
§ Legacy systems
§ Uncontrolled keyword indexing
§ Unindexed collections
§ Variant standards
Subject Indexing Gap
Benchmarking
§ 85% ANNUAL PRODUCTION*
§ 45% INTEGRATED CATALOGUE**
§ 8% FOUNDATION CATALOGUES
* Variation reflects un-upgraded e-book backlog to 2017
** 16 million+ records
A Brief History of Subject Indexing & Classification Policy
Legacy Systems § Watt’s Elastic Classification § SRIS Classification § Local § PRECIS § COMPASS § Dewey Decimal Classification § LCSH § None § UKAT (UK Archival Thesaurus)
Current Systems § Dewey Decimal Classification (Ed. 23) § LCSH (Library of Congress Subject Headings) § FAST (Faceted Application of Subject Terminology)
Why LCSH?
§ Assigned in parallel with PRECIS § Dropped in 1988 § Reinstated from 1994
§ Vocabulary § Broad, Deep, American
§ Widely used § Copy/Derived Cataloguing § LC, OCLC, CIP, LDLSCP
§ Community of practice § SACO
LCSH, BUT
Extensibility § Complex application § Intensive training § Lengthy supervision
Changing requirements § Fixed term contracts § Short term funding § Changing discovery tools
Cost benefit?
Strategic Fit
§ Living Knowledge Purposes § Everything Available Portfolio
§ Improve capability to FIND; USE; PUBLISH § Discovery programme to enhance discovery and fulfilment
§ Collection Metadata Strategy
§ Drive efficiencies in the creation, management and exploitation of collection metadata
§ Improve the Library’s return on investment in its collection metadata assets
§ Open up more of the Library’s collection metadata
FAST Faceted Application of Subject Terminology
§ OCLC Research Project (1998-) § Post co-ordinate - 8 Facets § Vocabulary from LCSH (1.7 million headings) § Thesaural structure. Simplified syntax § FAST headings generated from LCSH in WorldCat § Online tools § Schema: MARC XML; RDF; ISO MARC § Permissive (ODC-BY) License
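The difference between a pre-coordinated LCSH string and post-coordinate FAST facets can be illustrated with the subject headings from the OCLC record for the Johnny Depp title. The sketch below is a deliberate simplification and is not how OCLC generates FAST headings: the real process validates against the FAST authority file and does not always separate every subdivision. The subfield-to-facet mapping and the `split_heading` function are assumptions made for illustration, and only tags 600, 650 and 651 are handled.

```python
from typing import List, Tuple

# Facet implied by the main heading (‡a), keyed by MARC tag.
FACET_BY_TAG = {"600": "Personal name", "650": "Topical", "651": "Geographic"}

# Facet implied by each LCSH subdivision subfield.
FACET_BY_SUBFIELD = {
    "x": "Topical",
    "y": "Time period",
    "z": "Geographic",
    "v": "Form/Genre",
}


def split_heading(tag: str, heading: str) -> List[Tuple[str, str]]:
    """Split one LCSH string into (facet, term) pairs, one per subfield."""
    parts = heading.rstrip(".").split("‡")[1:]  # text before first ‡ is empty
    facets = []
    for part in parts:
        code, text = part[0], part[1:]
        facet = FACET_BY_TAG[tag] if code == "a" else FACET_BY_SUBFIELD[code]
        facets.append((facet, text))
    return facets


lcsh = "‡aMotion picture actors and actresses‡zUnited States‡vBiography."
for facet, term in split_heading("650", lcsh):
    print(f"{facet}: {term}")
```

Each resulting term can be searched, displayed or mapped independently, which is the property that makes a faceted vocabulary easier to apply and to reconcile with other subject systems.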
FAST 2016 Pilot Project
Purpose
§ To seek efficiencies in the light of increased intake/reduced funding
§ To evaluate opportunities for extending coverage to resources currently excluded
§ To improve retrieval and linking in an online world
Scope
§ Western European publications – Experienced LCSH indexers
§ Backlog/Retrocon projects – Fixed term staff, limited experience
§ Social sciences articles – Curators/Subject Specialists
FAST 2015/16 Pilot Projects
Outcomes § Quicker to learn § Less supervision required § Low barrier to application § Form/Genre implemented § Able to extend the scope of subject indexing § Facets better than strings in discovery system § Cheaper (free web tool; no subscriptions) § Continuity with LCSH
FAST Consultation 2016
Proposal 1 § The British Library proposes to adopt FAST selectively to extend the scope of subject indexing of current and legacy content.
Response
[Chart: number of responses (axis 0 to 70) by rating: Very Negative, Somewhat Negative, Neutral, Somewhat Positive, Very Positive.]
FAST Consultation 2016
Proposal 2 § The British Library proposes to implement FAST as a replacement for LCSH in all current cataloguing, subject to mitigation of the risks identified in the background paper; in particular, the question of sustainability.
Response
[Chart: number of responses (axis 0 to 70) by rating: Very Negative, Somewhat Negative, Neutral, Somewhat Positive, Very Positive.]
FAST Consultation 2016
Issues
Sustainability § Service not a project
Quality § Specificity § Context § Granularity
Actions and Outcomes
§ British Library to advocate transition to service
§ British Library to address quality concerns
§ Proposal 1 accepted
§ Proposal 2 dependent on resolution of sustainability
FAST 2018
British Library
Applying FAST selectively
§ Non print Legal Deposit § Official publications § Grey Literature
§ Backlog/Retrocon Projects § Asia/Africa Collections § WW1 French Posters § SSTC
§ Subject Analysis Training § FAST as exemplar
FAST Community
§ “FAST Five” § Brown, Columbia, Cornell, Harvard, Yale
§ OCLC transition to service § Production server (March 2019)
§ FAST Policy and Outreach Committee (Announcement September) § Oversight, promotion, policies, development
§ Meeting with OCLC, LC and SACO § Washington, D.C., 11th & 12th October
Conclusion
Challenges § Declining funding § Volume of digital content § Discovery methods § Changing employment conditions § Traditional subject standards threatened § Complexity of traditional standards
Responses § Collection Metadata Strategy § Extract maximum value from existing metadata § Automated Processes § Continuity & Quality § Review metadata standards § Be prepared to make hard choices § Consultation
Links
§ Unlocking the value: the British Library’s Collection Metadata Strategy, 2015-2018. http://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2015-2018.pdf
§ Collection Metadata Strategy Roadmap, 2015-2018 http://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-roadmap-2015-2018.pdf
§ Reimer, T. (2018). The once and future library: the role of the (national) library in supporting research. Insights, 31, 19. DOI: http://doi.org/10.1629/uksg.409
§ OCLC Research. FAST (Faceted Application of Subject Terminology)
§ British Library. Consultation on Subject Indexing and Classification standards applied by the British Library. February 2016. http://www.bl.uk/bibliographic/pdfs/british-library-consultation-fast-abridged-dewey.pdf
§ British Library. Response to the consultation on subject standards. 18/7/2016 http://www.bl.uk/bibliographic/pdfs/british-library-response-survey-subject-standards.pdf
§ Janet Ashton & Caroline Kent (2017) New Approaches to Subject Indexing at the British Library, Cataloging & Classification Quarterly, 55:7-8, 549-559, DOI: 10.1080/01639374.2017.1354345 https://doi.org/10.1080/01639374.2017.1354345