Bibliographic data in the Semantic Web – what issues do we face in getting it there?
Gordon Dunsire
Presented to the ALCTS Cataloging and Classification Section Executive Committee Forum, ALA Annual, 24 June 2011
Overview
Introduction to linked data and the Semantic Web
From record to statement: a paradigm shift
Some issues
Linked data and RDF
Resource Description Framework (RDF)
Designed for machine-processing of metadata at global scale (Semantic Web)
  24/7/365
  Trillions of operations per second
Everything must be dis-ambiguated
  Machines are dumb
Simplicity helps!
  Machine-readable identifiers
RDF triple
Metadata expressed as “atomic” statements
  A simple, single, irreducible statement
  The title of this book is “Cataloguing is fun!”
Constructed in 3 parts
  “Triple”
The title of this book is “Cataloguing is fun!”
  Subject of the statement = Subject: This book
  Nature of the statement = Predicate: has title
  Value of the statement = Object: “Cataloguing is fun!”
This book – has title – “Cataloguing is fun!”
subject – predicate – object
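The three-part structure above can be sketched in a few lines of Python. This is an illustration only and not part of the original slides: the `Triple` class and its field names are assumptions chosen for readability, and the plain-language labels stand in for the URIs a real RDF triple would use.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One atomic statement: subject, predicate, object (hypothetical helper)."""
    subject: str
    predicate: str
    obj: str  # named "obj" to avoid shadowing Python's built-in "object"


# The statement: The title of this book is "Cataloguing is fun!"
statement = Triple(
    subject="This book",
    predicate="has title",
    obj="Cataloguing is fun!",
)

# Each part of the statement can be read back on its own.
print(statement.subject)
print(statement.predicate)
print(statement.obj)
```

A named tuple is used here only because it keeps the three parts in a fixed order and makes each one addressable by name.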
Machine-readable identifiers
Uniform Resource Identifier (URI)
  Can be any unique combination of numbers and letters
  No intrinsic meaning; it’s just an identifier
Can look like a URL
  “Cool” URI: exploits existing processes developed for the World-Wide Web
  http://iflastandards.info/ns/isbd/elements/P1001
  But does not lead to a Web page (in principle ...)
RDF requires the subject and predicate of a triple to be URIs
  Object can be a URI, or a literal string (“Cataloguing is fun!”)
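The rule that subject and predicate must be URIs, while the object may be a URI or a literal, can be sketched as a small check. Everything named here is an assumption for illustration: the `URI` and `Literal` classes, the `make_triple` and `to_ntriples` helpers, and both `example.org` URIs are hypothetical. The output line follows the general shape of the N-Triples syntax, but the escaping is deliberately simplified.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


Node = Union[URI, Literal]


def make_triple(subject: Node, predicate: Node, obj: Node) -> Tuple[Node, Node, Node]:
    """Build a triple, rejecting literals in the subject or predicate position."""
    if not isinstance(subject, URI):
        raise TypeError("subject must be a URI")
    if not isinstance(predicate, URI):
        raise TypeError("predicate must be a URI")
    return (subject, predicate, obj)


def to_ntriples(triple: Tuple[Node, Node, Node]) -> str:
    """Write one triple as a single line in N-Triples style (simplified escaping)."""
    subject, predicate, obj = triple
    if isinstance(obj, URI):
        obj_text = f"<{obj.value}>"
    else:
        # Only backslashes and double quotes are escaped in this sketch.
        escaped = obj.value.replace("\\", "\\\\").replace('"', '\\"')
        obj_text = f'"{escaped}"'
    return f"<{subject.value}> <{predicate.value}> {obj_text} ."


# Hypothetical URIs, used only to show the shape of the data.
book = URI("http://example.org/book/12345")
has_title = URI("http://example.org/terms/hasTitle")

triple = make_triple(book, has_title, Literal("Cataloguing is fun!"))
print(to_ntriples(triple))
```

Modelling URIs and literals as separate types avoids guessing from the text of a string whether it is an identifier or a value.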
Bibliographic record: 12345
Title: Cataloguing is fun!
Author: Mary MacDonald
Content type: text
Media type: microform
LCSH: Cataloging
subject   predicate      object
b12345    Author         “Mary MacDonald”
b12345    Title          “Cataloguing is fun!”
b12345    Content type   “text”
b12345    Media type     “microform”
b12345    LCSH           “Cataloging”
Name authority record: 8765
Heading: MacDonald, Mary
n8765     Heading           “MacDonald, Mary”
t1234     Preferred label   “microform”
lc1234    Heading           “Cataloging”
t9876     Preferred label   “text”
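One plausible reading of this slide is that the literal values in the record's triples can be replaced by the identifiers of the matching authority and vocabulary entries, so that strings become links. The slides do not state this explicitly, so treat it as an assumption. The sketch below uses the identifiers from the slide; the `normalise` and `link` helpers, and the name-inversion rule, are hypothetical.

```python
# Triples taken from the bibliographic record on the slide.
record_triples = [
    ("b12345", "Author", "Mary MacDonald"),
    ("b12345", "Title", "Cataloguing is fun!"),
    ("b12345", "Content type", "text"),
    ("b12345", "Media type", "microform"),
    ("b12345", "LCSH", "Cataloging"),
]

# Triples taken from the authority and vocabulary entries on the slide.
authority_triples = [
    ("n8765", "Heading", "MacDonald, Mary"),
    ("t1234", "Preferred label", "microform"),
    ("lc1234", "Heading", "Cataloging"),
    ("t9876", "Preferred label", "text"),
]


def normalise(label):
    """Turn an inverted heading ("Surname, Forename") into direct order."""
    if ", " in label:
        last, first = label.split(", ", 1)
        return f"{first} {last}"
    return label


def link(record, authorities):
    """Replace literal objects with authority identifiers where a label matches."""
    # A real system would keep one lookup per vocabulary; a single shared
    # lookup is used here only because the example labels do not collide.
    lookup = {normalise(label): ident for ident, _, label in authorities}
    return [(s, p, lookup.get(o, o)) for s, p, o in record]


for triple in link(record_triples, authority_triples):
    print(triple)
```

The title has no matching authority entry, so it stays a literal string while the other four objects become identifiers.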
Identifiers for properties
Predicates are known as properties in RDF
  http://iflastandards.info/ns/isbd/elements/P1004
  “has key title”
Properties can be mixed’n’matched
  Chosen from different sources (element sets)
Different element sets contain similar properties
  http://RDVocab.info/Elements/keyTitleManifestation
  “Key title (Manifestation)”
Some element sets are not available in RDF
  E.g. MARC21
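To make the mixing of element sets concrete, here is a minimal sketch that uses the two property URIs quoted above. The subject URIs, the literal values, the helper names, and the decision to treat the two properties as interchangeable for retrieval are all assumptions; the slides only say the properties are similar.

```python
ISBD_KEY_TITLE = "http://iflastandards.info/ns/isbd/elements/P1004"
RDA_KEY_TITLE = "http://RDVocab.info/Elements/keyTitleManifestation"

# Assumption: for retrieval purposes these two properties are treated alike.
SIMILAR_KEY_TITLE_PROPERTIES = {ISBD_KEY_TITLE, RDA_KEY_TITLE}

# Hypothetical data: two resources described with different element sets.
triples = [
    ("http://example.org/resource/1", ISBD_KEY_TITLE, "Example key title A"),
    ("http://example.org/resource/2", RDA_KEY_TITLE, "Example key title B"),
]


def namespace_of(property_uri):
    """Return the part of the URI up to and including the last slash."""
    return property_uri.rsplit("/", 1)[0] + "/"


def key_titles(data):
    """Collect key titles regardless of which element set supplied the property."""
    return [o for _, p, o in data if p in SIMILAR_KEY_TITLE_PROPERTIES]


element_sets = {namespace_of(p) for _, p, _ in triples}
print(element_sets)
print(key_titles(triples))
```

Because each property is identified by its own URI, triples from different element sets can sit in the same collection without ambiguity about which definition applies.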
Choosing properties/URIs for legacy records
Closest inclusive meaning
  Minimises information loss
Check the definition
  ISBD’s “has title proper” better than Dublin Core’s “title” (a name given to the resource.)
Check other semantic constraints
  RDA’s “titleManifestation” implies a triple’s subject URI is a Manifestation
  No good for non-FRBRized records
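The semantic constraint described above works like a domain declaration: using a property tells a machine what kind of thing the subject is. A minimal sketch of that inference follows. The property, class and record URIs are hypothetical placeholders on `example.org`, not the real RDA identifiers; only the `rdf:type` URI is the standard one.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical URIs standing in for the RDA property and the FRBR class.
TITLE_MANIFESTATION = "http://example.org/rda/titleManifestation"
MANIFESTATION = "http://example.org/frbr/Manifestation"

# Each property maps to the class its subject is implied to belong to.
DOMAINS = {TITLE_MANIFESTATION: MANIFESTATION}


def infer_types(triples, domains):
    """Return the type triples implied by the domain of each property used."""
    inferred = []
    for subject, predicate, _ in triples:
        implied_class = domains.get(predicate)
        if implied_class is None:
            continue
        type_triple = (subject, RDF_TYPE, implied_class)
        if type_triple not in triples and type_triple not in inferred:
            inferred.append(type_triple)
    return inferred


# A legacy record that does not distinguish Work, Expression or Manifestation.
legacy = [
    ("http://example.org/record/b12345", TITLE_MANIFESTATION, "Cataloguing is fun!"),
]

for triple in infer_types(legacy, DOMAINS):
    print(triple)
```

The inference fires whether or not the record really describes a Manifestation, which is why such a property is a poor fit for non-FRBRized records.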
Metadata rights
Potential legal minefield
  Multiple agencies contributing to one record
  Anxiety that “others” may use open triples to build rival, competitive services
Main rights associated with the record?
  i.e. As an aggregation of triples
  Can a triple be copyrighted if component URIs are openly published?
“Minting” URIs for resources
Specific subject of a triple
  Mainly bibliographic resources
  URIs for Persons, Places, etc. taken from RDF “authorities”
FRBRized records need a separate URI for the Work, Expression, Manifestation, (Item)
“Standard” identifiers only a partial solution
  ISBN, ISSN, national bibliography numbers, etc.
Risk of different agencies creating different URIs for the same resource
  Inefficient, and costly to maintain namespaces
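The duplication risk can be illustrated with a short sketch. Two hypothetical agencies mint URIs in their own namespaces from their own local record numbers, and a shared standard identifier is then used to spot that both URIs describe the same resource. The namespaces, local identifiers, helper names and the placeholder ISBN string are all invented for illustration.

```python
from collections import defaultdict


def mint_uri(namespace, local_id):
    """Build a URI by appending a local identifier to an agency namespace."""
    return f"{namespace.rstrip('/')}/{local_id}"


def normalise_isbn(isbn):
    """Strip hyphens and spaces so the same ISBN compares equal across agencies."""
    return isbn.replace("-", "").replace(" ", "").upper()


# Hypothetical records: same resource, different agencies, different URIs.
records = [
    {"uri": mint_uri("http://agency-a.example/id/", "b12345"), "isbn": "ISBN-PLACE-HOLDER"},
    {"uri": mint_uri("http://agency-b.example/resource", "987"), "isbn": "isbnplaceholder"},
    {"uri": mint_uri("http://agency-b.example/resource", "988"), "isbn": None},
]

# Group URIs by standard identifier; records without one cannot be matched.
by_isbn = defaultdict(list)
unmatched = []
for record in records:
    if record["isbn"] is None:
        unmatched.append(record["uri"])
    else:
        by_isbn[normalise_isbn(record["isbn"])].append(record["uri"])

print(dict(by_isbn))
print(unmatched)
```

The third record shows why standard identifiers are only a partial solution: a resource without one falls outside the matching altogether.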
Other costs
Providing access to triples
  Data-dump, triple store, data query (SPARQL)
URIs should last forever
  Preservation and archive regime required
De-referencing services
  Providing human- and machine-readable information about a URI
Cost of re-engineering systems, re-designing interfaces, re-training cataloguers ...
But long-term benefits will justify the investment
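The "data query" form of access mentioned above can be pictured as pattern matching over stored triples. The sketch below is not SPARQL and not a real triple store: it is an in-memory stand-in, with a hypothetical `match` function in which `None` plays the role of a query variable. The sample data reuses the identifiers from the earlier bibliographic record.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every part of the pattern that is given.

    A part left as None matches anything, like a variable in a query.
    """
    return [
        (s, p, o)
        for s, p, o in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


store = [
    ("b12345", "Author", "Mary MacDonald"),
    ("b12345", "Title", "Cataloguing is fun!"),
    ("b12345", "Media type", "microform"),
    ("n8765", "Heading", "MacDonald, Mary"),
]

# Everything known about b12345.
print(match(store, subject="b12345"))

# Which subjects have the media type "microform"?
print([s for s, _, _ in match(store, predicate="Media type", obj="microform")])
```

A data dump hands over the whole list, whereas a query service answers questions like these without the client having to download everything first.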
The Semantic Web ecosystem
Not just professionally-generated triples
Machines generate triples by parsing content and semantic inferencing
  RDA anticipates ...
User-generated tags
  The madness (or wisdom) of crowds
Other communities generate relevant triples
  Memory institutions, publishers, reference services
Everybody uses triples
  In ways beyond our dreams ...