Catalog enrichment: importing Dewey Decimal Classification from external sources (slides)


Stefano Bargioni
Pontificia Università della Santa Croce

Catalogue enrichment: importing
Dewey Decimal Classification
from external sources

The project

Improving the Dewey search path with minimal effort

while adding BNCF-compliant subject headings to our catalog

Koha 3, an open-source ILS

Can be applied to other ILSs

Version 1: The Batch Mode

Add Dewey notations to the catalog automatically

from selected sources

ensure quality and uniformity

Atomic copy cataloguing

copy cataloguing usually involves the full record

we only need to copy field 082 (MARC 21) or 676 (UNIMARC)

ISBN as unique identifier

the policy issue
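Since the ISBN is the only key linking a local record to a remote one, malformed or differently formatted ISBNs are worth catching before any query is sent. The following is a minimal Python sketch (not the Perl code used in the project) that strips hyphens, spaces and a trailing qualifier such as "(pbk.)", verifies the ISBN-10 or ISBN-13 check digit, and returns a 13-digit form. Converting ISBN-10s to their 978 ISBN-13 equivalent is our own assumption about how to compare ISBNs; `normalize_isbn` is a hypothetical helper.

```python
import re
from typing import Optional


def _isbn13_check_digit(first12: str) -> str:
    # ISBN-13: weights alternate 1, 3, 1, 3, ... over the first 12 digits
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(first12))
    return str((10 - total % 10) % 10)


def normalize_isbn(raw: str) -> Optional[str]:
    """Return the ISBN as 13 digits, or None if it is malformed.

    Hyphens and spaces are ignored, as is a trailing qualifier such as "(pbk.)".
    """
    digits = re.sub(r"[\s-]", "", raw.split("(")[0]).upper()
    if re.fullmatch(r"\d{9}[\dX]", digits):
        # ISBN-10: sum of digit * weight (10 down to 1) must be divisible by 11; X = 10
        total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(digits))
        if total % 11:
            return None
        # Hypothetical choice: compare everything in ISBN-13 form (978 prefix)
        core = "978" + digits[:9]
        return core + _isbn13_check_digit(core)
    if re.fullmatch(r"\d{13}", digits):
        return digits if _isbn13_check_digit(digits[:12]) == digits[12] else None
    return None
```

For example, `normalize_isbn("0-306-40615-2")` and `normalize_isbn("978-0-306-40615-7 (pbk.)")` both give `"9780306406157"`, while an ISBN with a wrong check digit yields `None`.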

Records to be modified

without Dewey notation

with ISBN

limit: language in field 008

SELECT biblionumber, ISBN
FROM biblio
WHERE ISBN_present
  AND dewey_absent
  AND language_008 = '...'

In Koha, the WHERE clause relies on the MySQL function ExtractValue(), which evaluates XPath expressions against the MARCXML stored in the field biblio.marcxml
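To make the three conditions concrete, here is a hedged Python sketch that applies them to a single MARCXML record instead of running ExtractValue inside MySQL; it can help when checking the selection logic outside the database. It assumes MARC 21 tagging: 020 $a for the ISBN, 082 $a for the Dewey number, and positions 35-37 of field 008 for the language code. The function name `needs_dewey` is hypothetical, and this is not Koha code.

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    # Drop any "{namespace}" prefix so records with or without the MARC21 slim namespace work
    return tag.rsplit("}", 1)[-1]


def needs_dewey(marcxml: str, language: str) -> bool:
    """ISBN present, Dewey absent, and 008/35-37 equal to `language`."""
    root = ET.fromstring(marcxml)
    has_isbn = has_dewey = False
    lang_008 = ""
    for el in root.iter():
        name = _local(el.tag)
        if name == "controlfield" and el.get("tag") == "008":
            lang_008 = (el.text or "")[35:38]  # language code positions
        elif name == "datafield" and el.get("tag") in ("020", "082"):
            # Only a non-empty $a counts as "present"
            for sf in el:
                if _local(sf.tag) == "subfield" and sf.get("code") == "a" and (sf.text or "").strip():
                    if el.get("tag") == "020":
                        has_isbn = True
                    else:
                        has_dewey = True
    return has_isbn and not has_dewey and lang_008 == language
```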

Dewey Sources (I)

a choice based on copy cataloguing experience

OCLC Classify

some National Libraries

API, Z39.50 or HTML access

Dewey Sources (II): OCLC Classify

Classify is a FRBR-based prototype designed to support the assignment of classification numbers and subject headings for books, DVDs, CDs, and other types of materials.

This project applies principles of the FRBR model to aggregate bibliographic information above the manifestation level. Bibliographic records are grouped using the OCLC FRBR Work-Set algorithm to form a work-level summary of the class numbers and subject headings assigned to a work. You can retrieve a summary by ISBN, ISSN, UPC, OCLC number, author/title, or subject heading.

The Classify database is accessible through a user interface and as a machine-to-machine service. The database provides access to more than 36 million WorldCat records that contain Dewey Decimal Classification (DDC) numbers,[...].

Retrieved information is in XML format.

http://www.oclc.org/research/activities/classify.html?urlm=159746
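A Classify lookup returns XML, so extracting a Dewey number comes down to reading a few elements. The sketch below assumes the response layout of the Classify 2 service as we recall it: a root `classify` element in the `http://classify.oclc.org` namespace, a `response` element whose `code` is 0 or 2 for a single work (4 for several works, 102 for not found), and a `recommendations/ddc/mostPopular` element carrying the number in its `sfa` attribute. Treat these names as assumptions to check against OCLC's documentation; `classify_ddc` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace assumed from Classify 2 responses
NS = {"c": "http://classify.oclc.org"}


def classify_ddc(xml_text: str) -> Optional[str]:
    """Return the most popular DDC number in a Classify response, or None."""
    root = ET.fromstring(xml_text)
    response = root.find("c:response", NS)
    code = response.get("code") if response is not None else None
    if code not in ("0", "2"):
        # Assumed codes: 4 = several works share the ISBN, 102 = not found, etc.
        return None
    popular = root.find("c:recommendations/c:ddc/c:mostPopular", NS)
    if popular is None:
        return None
    return popular.get("sfa")  # assumed attribute holding the class number
```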

Dewey Sources (III): National Libraries

Source | Library | Language | Format

LC | Library of Congress | any | MARC

BNF | Bibliothèque nationale de France | fre | MARC

DNB | Deutsche Nationalbibliothek | ger | HTML

BNCF | Biblioteca Nazionale Centrale di Firenze | ita | HTML

BNCR | Biblioteca Nazionale Centrale di Roma | ita | HTML

BNB | British National Bibliography | eng | MARC
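For the sources that return MARC records, only the classification field has to be copied: 082 in MARC 21 or 676 in UNIMARC. The following Python sketch, assuming the record has already been retrieved or converted to MARCXML, collects the number ($a) together with its edition ($2 in MARC 21, $v in UNIMARC) so that a quality check can use both. `dewey_numbers` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Classification field and edition subfield per MARC flavour
FIELDS = {"marc21": ("082", "2"), "unimarc": ("676", "v")}


def _local(tag: str) -> str:
    # Ignore XML namespaces when comparing element names
    return tag.rsplit("}", 1)[-1]


def dewey_numbers(marcxml: str, flavour: str = "marc21") -> List[Tuple[str, Optional[str]]]:
    """Return (number, edition) pairs found in the Dewey field of a MARCXML record."""
    tag, edition_code = FIELDS[flavour]
    found = []
    for field in ET.fromstring(marcxml).iter():
        if _local(field.tag) != "datafield" or field.get("tag") != tag:
            continue
        number, edition = None, None
        for sf in field:
            code, text = sf.get("code"), (sf.text or "").strip()
            if code == "a" and number is None:
                number = text  # keep the first $a
            elif code == edition_code:
                edition = text
        if number:
            found.append((number, edition))
    return found
```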

The logic used in the programs

open the connection to the bibliographical database
obtain the ISBNs of records without a Dewey number
open the connection to the Dewey source (if Z39.50)
for each ISBN:
    query the data source using the current ISBN
    if a Dewey number is available in the response:
        if the Dewey number passes quality control:
            update the bibliographical record
    wait to avoid overloading the source
close the connection to the Dewey source (if Z39.50)
close the connection to the bibliographical database
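The loop above can be sketched in Python with the source query, the quality check and the record update passed in as functions, so the same loop serves every source. All three callables (`query_source`, `passes_quality`, `update_record`) and the default 3-second `delay` are hypothetical placeholders; opening and closing the database and Z39.50 connections is left to the caller.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def enrich(records: Iterable[Tuple[int, str]],
           query_source: Callable[[str], Optional[str]],
           passes_quality: Callable[[str], bool],
           update_record: Callable[[int, str], None],
           delay: float = 3.0) -> int:
    """Run one batch over (biblionumber, ISBN) pairs; return how many records were modified."""
    modified = 0
    for biblionumber, isbn in records:
        dewey = query_source(isbn)               # None when the source has no Dewey number
        if dewey is not None and passes_quality(dewey):
            update_record(biblionumber, dewey)   # e.g. write 082 $a or 676 $a
            modified += 1
        time.sleep(delay)                        # wait to avoid overloading the source
    return modified
```

In a real run `query_source` would wrap an HTTP or Z39.50 request and `update_record` would write the field through the ILS; passing them as parameters also makes the loop easy to test with stubs.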

Quality check

Catalogs contain errors

DDC has many editions

Our existing Dewey numbers go back to edition 19

Indicators (in MARC 21 field 082: full or abridged edition, source of the number)
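The slides do not spell out the exact acceptance rules, so the following Python sketch only illustrates one possible check under our own assumptions: after removing the segmentation marks (`/`) and primes (`'`) that some catalogs insert, the number must look like a DDC notation (three digits, optionally followed by a decimal part), and its edition must be known and not older than a configurable minimum. The default of 19 mirrors the oldest edition among our existing numbers and is hypothetical as a rule; `passes_quality` is a hypothetical name.

```python
import re
from typing import Optional

# Three digits, optionally followed by a decimal point and more digits (e.g. 945.09)
DDC_PATTERN = re.compile(r"\d{3}(\.\d+)?")


def passes_quality(number: str, edition: Optional[str], min_edition: int = 19) -> bool:
    """Hypothetical acceptance rule for a Dewey number copied from an external source."""
    # Drop segmentation marks ("/") and primes ("'") before checking the notation
    cleaned = number.replace("/", "").replace("'", "").strip()
    if not DDC_PATTERN.fullmatch(cleaned):
        return False
    if not edition:
        return False  # edition unknown: reject rather than guess
    # The edition may carry extra data, e.g. "23/eng/20130312" -> 23
    match = re.match(r"\d+", edition.strip())
    return match is not None and int(match.group()) >= min_edition
```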

Many Dewey numbers were discarded...

but the number of records with a Dewey number grew from 40,000
to 60,000

+50%

Delay while searching sources

Continuous searching can overwhelm remote servers

robots.txt

policies for crawlers

Continuous indexing can overload your own server

Wait a few seconds between searches or groups of searches; this slows down the harvesting process
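Python's standard `urllib.robotparser` can read a source's robots.txt and report both whether a URL may be fetched and any `Crawl-delay` it requests. This sketch assumes robots.txt has already been downloaded as a list of lines and falls back to a hypothetical default of 3 seconds when no delay is declared; `crawl_policy` and the URLs in the example are illustrative.

```python
from typing import List, Tuple
from urllib.robotparser import RobotFileParser


def crawl_policy(robots_lines: List[str], agent: str, url: str,
                 default_delay: float = 3.0) -> Tuple[bool, float]:
    """Return (may_fetch, seconds_to_wait) for one URL according to robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_lines)          # lines of an already downloaded robots.txt
    allowed = parser.can_fetch(agent, url)
    delay = parser.crawl_delay(agent)   # None if no Crawl-delay applies to this agent
    wait = float(delay) if delay is not None else default_delay
    return allowed, wait
```

With the rules `User-agent: *`, `Crawl-delay: 5` and `Disallow: /private/`, a search URL is allowed with a 5-second wait, while anything under `/private/` is refused.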

Statistics

Columns: Source | Language | Records scanned | Records modified | ISBN not found | Dewey # not found | Dewey # discarded | Several works with same ISBN | ISBN incorrect

Classify | all | 423871026753216607200598240133

LC | all | 3199912522119585621011

BNF | all | 30903225321327726855

DNB | ger | 419316338671630

BNCF | ita | 12017408836433542744

BNCR | ita | 754915153003297853

BNB | eng | 6215193544955518

Total | 19710

Browsing Dewey Index

Besides browsing by author, uniform title and subject heading, our OPAC offers a semantic search path based on the Dewey classification number
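Because DDC notation is largely hierarchical, a browse path can be derived from the number itself: main class, division, section, then one step per decimal digit. The Python sketch below does exactly that; it is a simplification (the notational hierarchy does not always match DDC's conceptual hierarchy), and `dewey_hierarchy` is a hypothetical helper, not how our OPAC is implemented.

```python
from typing import List


def dewey_hierarchy(number: str) -> List[str]:
    """Broader-to-narrower notational steps for a DDC number."""
    base, _, decimals = number.partition(".")
    if len(base) != 3 or not base.isdigit() or (decimals and not decimals.isdigit()):
        raise ValueError(f"not a DDC number: {number!r}")
    # Main class (X00), division (XY0), section (XYZ)
    steps = [base[0] + "00", base[:2] + "0", base]
    # One further step for each decimal digit
    steps += [f"{base}.{decimals[:i]}" for i in range(1, len(decimals) + 1)]
    # Remove duplicates such as 300 -> 300, 300, 300 while keeping order
    return list(dict.fromkeys(steps))
```

For instance, `dewey_hierarchy("945.09")` returns `['900', '940', '945', '945.0', '945.09']`, each of which could become a link in the browse path.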

Software

The query programs were written in Perl, using the Koha API and the following CPAN libraries:

LWP for HTTP connections

ZOOM for Z39.50 connections

DBI for connections to the MySQL database

XML::XPath for XML data processing

WWW::Scraper for HTML data processing

MARC::Record for MARC records processing

A scientific article

published on JLIS.it at
http://leo.cilea.it/index.php/jlis/article/view/8766

JLIS.it (Italian Journal of Library and Information Science) is a peer-reviewed, open-access academic journal of international scope

written with my cataloguers

doesn't deal with the dynamic component

Version 2.0 - Single Record Mode

New record: enter the ISBN

retrieve Dewey numbers from major catalogs

choose the best one and import it into the new record

Or upgrade an existing record by adding or modifying its Dewey classification
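In single record mode the cataloguer chooses among the numbers returned by several catalogs. A pre-selection could be sketched as below: prefer the number most sources agree on, and break ties with a ranked list of sources. Both the majority rule and the priority list are our own assumptions, and `choose_dewey` is a hypothetical helper; candidates would normally pass the quality check first.

```python
from collections import Counter
from typing import Dict, List, Optional


def choose_dewey(candidates: Dict[str, Optional[str]], priority: List[str]) -> Optional[str]:
    """candidates maps a source name to its DDC number (None if the source had nothing)."""
    found = {src: num for src, num in candidates.items() if num}
    if not found:
        return None
    counts = Counter(found.values())
    best_count = max(counts.values())
    # Among the most frequent numbers, prefer the one from the highest-priority source
    for src in priority:
        if src in found and counts[found[src]] == best_count:
            return found[src]
    # Fallback when no ranked source supplied a most-frequent number
    return next(num for num, c in counts.items() if c == best_count)
```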

Conclusions

Growing availability of bibliographic data on the web

Unique identifiers: ISBN, ISSN, ...

VIAF ID, ISNI, ...

Catalog enrichment: bibliographic records

authority records

Expose rich linked data with coded information like Dewey

with standard IDs like ISBN, ISNI, ...

Thank you
Gracias
Grazie