Stefano Bargioni
Pontificia Università della Santa Croce
Catalogue enrichment: importing
Dewey Decimal Classification
from external sources
The project
Improving the Dewey search path with minimal effort
while adding BNCF-compliant subject headings to our catalog
Koha 3 open source ILS
Can be applied to other ILSs
Version 1: The Batch Mode
Add Dewey notations to the catalog automatically
from selected sources
ensure quality and uniformity
An atomic copy cataloguing
copy cataloguing is usually related to the full record
we only need to copy field 082 (MARC21) or 676 (Unimarc)
ISBN unique identifier
the policy issue
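The idea of copying a single field rather than the whole record can be sketched as follows. This is a minimal illustration in Python (not the project's Perl), assuming the source returns MARCXML in the standard MARC21 slim namespace; the sample record and the function name `extract_dewey` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML namespace; a Unimarc source may use a different wrapper (assumption).
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical record, reduced to the two fields that matter here.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="020" ind1=" " ind2=" ">
    <subfield code="a">9780000000002</subfield>
  </datafield>
  <datafield tag="082" ind1="0" ind2="4">
    <subfield code="a">813.54</subfield>
    <subfield code="2">22</subfield>
  </datafield>
</record>"""

def extract_dewey(marcxml, tag="082"):
    """Return the $a values of the Dewey field (082 in MARC21, 676 in Unimarc)."""
    root = ET.fromstring(marcxml)
    xpath = f".//marc:datafield[@tag='{tag}']/marc:subfield[@code='a']"
    return [sf.text.strip() for sf in root.iterfind(xpath, NS) if sf.text]

print(extract_dewey(SAMPLE))
```

Everything else in the source record is ignored, so the local description stays untouched.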
Records to be modified
without Dewey notation
with ISBN
limit: 008 language
SELECT biblionumber, ISBN
FROM biblio
WHERE ISBN_present
AND dewey_absent
AND language_008='...'
In Koha, the WHERE clause is based on the MySQL function ExtractValue, which works on the field biblio.marcxml through XPath expressions
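A possible shape for that query, sketched in Python as a string builder. The XPath expressions and the use of 008 positions 35-37 for the language are assumptions based on MARC21; the function name is hypothetical. The table defaults to `biblio` as stated above, but is left as a parameter because some Koha 3 installations reportedly keep `marcxml` in `biblioitems`.

```python
# XPath expressions for MARC21 records stored as MARCXML (assumed layout).
ISBN_XPATH = '//datafield[@tag="020"]/subfield[@code="a"]'
DEWEY_XPATH = '//datafield[@tag="082"]/subfield[@code="a"]'
F008_XPATH = '//controlfield[@tag="008"]'

ALLOWED_TABLES = ("biblio", "biblioitems")

def build_candidates_query(language, table="biblio"):
    """Return (sql, params) selecting records with an ISBN and no Dewey number."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table: {table}")
    isbn = f"ExtractValue(marcxml, '{ISBN_XPATH}')"
    dewey = f"ExtractValue(marcxml, '{DEWEY_XPATH}')"
    f008 = f"ExtractValue(marcxml, '{F008_XPATH}')"
    sql = (
        f"SELECT biblionumber, {isbn} AS isbn "
        f"FROM {table} "
        f"WHERE {isbn} <> '' "               # ISBN present
        f"AND {dewey} = '' "                 # Dewey absent (no match gives '')
        f"AND SUBSTRING({f008}, 36, 3) = %s" # 008/35-37; SUBSTRING is 1-based
    )
    return sql, (language,)

sql, params = build_candidates_query("ita")
print(sql)
```

The `%s` placeholder follows the convention of the common Python MySQL drivers; the language code is passed as a parameter rather than pasted into the SQL text.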
Dewey Sources (I)
a choice based on copy cataloguing experience
OCLC Classify
some National Libraries
API, Z39.50 or HTML access
Dewey Sources (II): OCLC Classify
Classify is a FRBR-based prototype designed to support the assignment of classification numbers and subject headings for books, DVDs, CDs, and other types of materials.
This project applies principles of the FRBR model to aggregate bibliographic information above the manifestation level. Bibliographic records are grouped using the OCLC FRBR Work-Set algorithm to form a work-level summary of the class numbers and subject headings assigned to a work. You can retrieve a summary by ISBN, ISSN, UPC, OCLC number, author/title, or subject heading.
The Classify database is accessible through a user interface and as a machine-to-machine service. The database provides access to more than 36 million WorldCat records that contain Dewey Decimal Classification (DDC) numbers,[...].
Retrieved information is in XML format.
http://www.oclc.org/research/activities/classify.html?urlm=159746
Dewey Sources (III): National Libraries
LC: Library of Congress (any), MARC
BNF: Bibliothèque nationale de France (fre), MARC
DNB: Deutsche Nationalbibliothek (ger), HTML
BNCF: Biblioteca Nazionale Centrale di Firenze (ita), HTML
BNCR: Biblioteca Nazionale Centrale di Roma (ita), HTML
BNB: British National Bibliography (eng), MARC
The logic used in the programs
open the connection to the bibliographical database
obtain the ISBN from records without a Dewey number
open the connection to the Dewey source, if Z39.50
for each ISBN
query the data source using the current ISBN
if a Dewey number is available in the response
if the Dewey number passes quality control
update the bibliographical record
wait to avoid overloading
close the connection to the Dewey source, if Z39.50
close the connection to the bibliographical database
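The steps above can be sketched as a single loop. This is an illustration in Python, not the original Perl programs: opening and closing the connections is left out, and `lookup`, `accept` and `update` are hypothetical callables standing in for the source query, the quality control and the record update.

```python
import time

def harvest(records, lookup, accept, update, delay=2.0, sleep=time.sleep):
    """Run the batch loop over (biblionumber, isbn) pairs.

    lookup(isbn)  -> Dewey number or None (query the data source)
    accept(dewey) -> True if the number passes quality control
    update(biblionumber, dewey) -> write the number into the local record
    Returns the number of modified records.
    """
    modified = 0
    for biblionumber, isbn in records:
        dewey = lookup(isbn)
        if dewey is not None and accept(dewey):
            update(biblionumber, dewey)
            modified += 1
        sleep(delay)  # wait to avoid overloading the remote source
    return modified

# Usage with in-memory stand-ins for the catalogue and the Dewey source.
source = {"9780000000002": "813.54", "9780000000019": "n/a"}
catalogue = {}
count = harvest(
    [(1, "9780000000002"), (2, "9780000000019"), (3, "9780000000026")],
    lookup=source.get,
    accept=lambda d: d[:3].isdigit(),
    update=catalogue.__setitem__,
    delay=0,
)
print(count, catalogue)
```

Passing the three steps as functions keeps the loop identical for every source, whether it is reached through an API, Z39.50 or HTML.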
Quality check
Catalogs contain errors
DDC has many editions
Our old Dewey numbers start from edition 19
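The slides do not spell out the quality rules, so the following Python sketch is only one plausible reading: accept a number if it looks like a Dewey notation (three digits, optional decimals) and if the edition stated by the source is at least 19. Both rules, and the function names, are assumptions.

```python
import re

DEWEY_RE = re.compile(r"\d{3}(\.\d+)?")
MIN_EDITION = 19  # assumption: nothing older than our own oldest numbers

def normalize_dewey(raw):
    """Drop segmentation slashes and surrounding spaces (e.g. '813/.54')."""
    return raw.replace("/", "").strip()

def passes_quality_control(raw_number, raw_edition):
    """Check the syntax of the number and the DDC edition it was built with."""
    number = normalize_dewey(raw_number)
    if not DEWEY_RE.fullmatch(number):
        return False
    # The edition may carry a suffix such as '22/ita'; keep the leading digits.
    edition = re.match(r"\d+", (raw_edition or "").strip())
    return edition is not None and int(edition.group()) >= MIN_EDITION

print(passes_quality_control("813/.54", "22"))
print(passes_quality_control("[Fic]", "22"))
```

In this sketch a number without an edition statement is discarded, which is a strict choice; a softer policy could accept it and flag it for review.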
Indicators
Lots of Dewey numbers discarded...
but we moved from 40,000
to 60,000 records with Dewey number
+50%
Delay while searching sources
Continuous searching can overwhelm remote servers
robots.txt
policies for crawlers
Continuous indexing can overload your server
Wait a few seconds between searches or groups of searches
this will slow the harvesting process
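A minimal Python sketch of both ideas: reading a crawl policy from robots.txt and pausing between groups of searches. The robots.txt content, the agent name and the URL are hypothetical; the default pause of three seconds is an assumption, not a figure from the project.

```python
import time
import urllib.robotparser

def crawl_policy(robots_txt, agent, url, default_pause=3.0):
    """Return (allowed, pause) for an agent according to a robots.txt text."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(agent)  # None when no Crawl-delay is declared
    pause = delay if delay is not None else default_pause
    return parser.can_fetch(agent, url), pause

def throttled(items, pause=3.0, group_size=1, sleep=time.sleep):
    """Yield items, pausing after every group of `group_size` items."""
    for count, item in enumerate(items, start=1):
        yield item
        if count % group_size == 0:
            sleep(pause)

ROBOTS = "User-agent: *\nCrawl-delay: 5\nDisallow: /private/\n"
allowed, pause = crawl_policy(
    ROBOTS, "dewey-harvester", "http://catalogue.example.org/search?isbn=1"
)
print(allowed, pause)
```

With a pause of a few seconds per search, tens of thousands of ISBNs take days rather than hours, which is the slowdown mentioned above.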
Statistics
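Source | Language | Records scanned | Records modified | ISBN not found | Dewey # not found | Dewey # discarded | Several works with same ISBN | ISBN incorrect
Classify | all | 42387 | 10267 | 5321 | 6607 | 20059 | 8240 | 133
LC | all | 31999 | 1252 | 21195 | 8562 | 1011 | |
BNF | all | 30903 | 2253 | 21327 | 7268 | 55 | |
DNB | ger | 4193 | 163 | 3867 | 163 | 0 | |
BNCF | ita | 12017 | 4088 | 3643 | 3542 | 744 | |
BNCR | ita | 7549 | 1515 | 3003 | 2978 | 53 | |
BNB | eng | 6215 | 193 | 5449 | 555 | 18 | |
Total | | | 19710 | | | | |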
Browsing Dewey Index
Besides author, uniform titles and subject headings, our OPAC offers a path of semantic search based on the Dewey classification number
Software
Query programs were written in Perl, making use of the Koha API and the following libraries available on CPAN:
LWP for HTTP connections
ZOOM for Z39.50 connections
DBI for connections to the MySQL database
XML::XPath for XML data processing
WWW::Scraper for HTML data processing
MARC::Record for MARC records processing
A scientific article
published in JLIS.it at
http://leo.cilea.it/index.php/jlis/article/view/8766
JLIS.it, Italian Journal of Library and information science, is an academic journal of international scope, peer-reviewed and open access
written with my cataloguers
doesn't deal with the dynamic component
Version 2.0 - Single Record Mode
New record:
enter the ISBN
retrieve Dewey from important catalogs
choose and import the best one into the new record
Or upgrade an old record adding or modifying its Dewey classification
Conclusions
Increase of available bibliographic data on the net
Unique identifiers
ISBN, ISSN, ...
VIAF Id, ISNI, ...
Catalog enrichment
bibliographic records
authority records
Expose rich linked data
with coded information like Dewey
with standard IDs like ISBN, ISNI, ...
Thank you
Gracias
Grazie