http://repository.ust.hk/ir/
This version is available at HKUST Institutional Repository via
If it is the author’s pre-published version, changes introduced as a result of publishing processes such as copy-editing and formatting may not be reflected in this document. For a definitive version of this work, please refer to the published version.
Implementing Knowledge Card to enhance information discovery through bibliographic linked data
Lam, Ki Tat
Symposium on Linked Data and the Semantic Web, Hong Kong, 4 March 2019
Published version
© The Author(s).
http://hdl.handle.net/1783.1/95441
Implementing Knowledge Card to enhance information discovery through bibliographic linked data
Ki Tat LAM
Hong Kong University of Science and Technology Library
lblkt@ust.hk <https://orcid.org/0000-0003-2625-9419>
Symposium on Linked Data and the Semantic Web: Current Trends and Opportunities
Hong Kong Baptist University Library, Hong Kong
4 March 2019
Last revised: 2019-03-04
Information Discovery Use Case
I find a book in the Library Catalog. It is about something (topic, person, book, etc.) and is written by someone (creator, contributor). I am interested in knowing / reading more information about them.
Can Library Catalog help?
What can Library Catalog offer?
Traditional Library Catalog (or Next Generation)
• You can search/browse the catalog by title, author or subject to find more information yourself
• You can click on the author/subject fields to launch a search in the catalog
• There may be external links to websites in the records. They are provided by catalogers, and from there you can find more information
• Some platforms provide central indexes of articles or allow libraries to load their own repositories, and from there you can explore related articles
• Some platforms provide recommendations or related titles based on personal preferences or search history
Will linked data technology help?
Observations
Linked data is structured. Relationships between data are explicitly represented, allowing machines to query them semantically.
Linked data is interlinked. Linked Open Data forms a global database of rich information about many things.
Bibliographic data created by libraries (thanks to catalogers!) contain “access points” that are linked to authority records.
Identifiers of authority records are available in LOD, e.g. Wikidata. Instead of just linking to authority data, could Library Catalog go one step further and derive information from LOD based on these authority data identifiers?
Relevance concepts in information retrieval
Citation Index (Eugene Garfield)
Two documents are considered to be related if one is cited by another.
Document 2 contains an identifier of document 1 (or vice versa).
Identifiers can be universal, like DOI, or an internal system number identifying a document.
[Diagram: document 1 is cited by document 2]
Web Page Ranking (Larry Page)
Two web pages are considered to be related if one has a hyperlink to another.
Web page 2 has an HTML anchor containing the URL of web page 1.
[Diagram: web page 1 is hyperlinked from web page 2]
Relevance concepts in information retrieval [cont.]
Pattern matching
Two objects that have things in common (co-occurrence) are considered to be related.
Applications
• Facial recognition (things are face features)
• Optical character recognition (things are glyph features)
• Document ranking (things are keywords, subjects, names, citations, etc.)
[Diagram: object 1 and object 2 are connected by "has" edges to thing 1, thing 2 and thing 3]
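The co-occurrence idea can be sketched in a few lines of Python. This is only an illustration: the object names and their "things" are made-up sample data, and counting shared things is one simple way to score relatedness, not necessarily what any of the applications above do.

```python
def shared_things(things_a, things_b):
    """Return the things two objects have in common (their co-occurrence)."""
    return set(things_a) & set(things_b)


def relatedness(things_a, things_b):
    """Score relatedness as the number of shared things (0 means unrelated)."""
    return len(shared_things(things_a, things_b))


# Hypothetical sample data: each object "has" a set of things.
objects = {
    "object 1": {"thing 1", "thing 2"},
    "object 2": {"thing 2", "thing 3"},
    "object 3": {"thing 4"},
}

for other in ("object 2", "object 3"):
    score = relatedness(objects["object 1"], objects[other])
    print(f"object 1 vs {other}: {score} shared thing(s)")
```

For document ranking, the "things" would be keywords, subjects, names or citations taken from each record.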
Bibliographic linked data
• Break strings in fields to things
• Things are identified by URIs
• URIs are actionable
• Things within a dataset are inter-linked
• Things among datasets are inter-linked
• Richness of identifiers
  LCNAF, LCSH, MESH, FAST, VIAF, …
  DOI, ISBN, ISSN, …
• Structured, with semantic representations – good for information discovery
[Figure: A fragment of a BIBFRAME 2.0 graph (http://catalog.ust.hk/bf/991011643229703412). Nodes: hkust:991011643229703412#Work (bf:Work, bf:Text); a bf:Title labelled “Wrinkles in time”@en; a bf:Contribution with bf:role relator:ctb; a bf:Agent / bf:Person labelled “Smoot, George”@en; a bf:Identifier with rdf:value lcnaf:n93046571. Edges: bf:title, bf:contribution, bf:agent, bf:role, bf:identifiedBy.]
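To make "things identified by URIs" concrete, the sketch below writes one plausible reading of the graph fragment as plain (subject, predicate, object) tuples and walks from the work to the authority identifier of its contributor. The blank-node labels (`_:title`, `_:agent`, and so on) and the helper names are invented for this illustration, and the exact node-to-edge wiring is an assumption based on the labels visible in the figure.

```python
# One plausible reading of the graph fragment as (subject, predicate, object)
# tuples. The "_:" labels are placeholders invented for this sketch; the
# figure shows those nodes as <>.
WORK = "hkust:991011643229703412#Work"
TRIPLES = [
    (WORK, "rdf:type", "bf:Work"),
    (WORK, "rdf:type", "bf:Text"),
    (WORK, "bf:title", "_:title"),
    ("_:title", "rdf:type", "bf:Title"),
    ("_:title", "rdfs:label", '"Wrinkles in time"@en'),
    (WORK, "bf:contribution", "_:contribution"),
    ("_:contribution", "rdf:type", "bf:Contribution"),
    ("_:contribution", "bf:role", "relator:ctb"),
    ("_:contribution", "bf:agent", "_:agent"),
    ("_:agent", "rdf:type", "bf:Agent"),
    ("_:agent", "rdf:type", "bf:Person"),
    ("_:agent", "rdfs:label", '"Smoot, George"@en'),
    ("_:agent", "bf:identifiedBy", "_:identifier"),
    ("_:identifier", "rdf:type", "bf:Identifier"),
    ("_:identifier", "rdf:value", "lcnaf:n93046571"),
]


def objects_of(triples, subject, predicate):
    """Return all objects of triples matching the subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def agent_identifiers(triples, work):
    """Follow work -> contribution -> agent -> identifier -> value."""
    found = []
    for contribution in objects_of(triples, work, "bf:contribution"):
        for agent in objects_of(triples, contribution, "bf:agent"):
            for identifier in objects_of(triples, agent, "bf:identifiedBy"):
                found.extend(objects_of(triples, identifier, "rdf:value"))
    return found


print(agent_identifiers(TRIPLES, WORK))
```

Because the name is a node with its own identifier rather than a string in a field, a program can follow the edges mechanically instead of parsing display text.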
Wikidata – Linked Open Data
Wikidata is a valuable source of information about things
• Links to Wikipedia articles
• Contains identifiers: LC authority ID, ISNI, VIAF, GND, BNF, GeoNames, …
https://www.wikidata.org/wiki/Q179572
Discovery concepts behind Knowledge Cards
1. A Wikidata entity is considered to contain relevant information about a book if both have the same name/subject.
2. Therefore, by querying Wikidata with the set of names/subjects found in a book, we obtain a set of knowledge cards associated with the book.
[Diagram: book has name/subject; wikidata entity has name/subject]
[Diagram: book → names/subjects → wikidata entities; book has knowledge cards]
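A minimal sketch of step 2, assuming the match is made on Wikidata property P244 (Library of Congress authority ID) through the public Wikidata SPARQL endpoint. The slides do not show the queries the application actually sends, so the query shape and the function names here are illustrative only. The code only builds the query and the request URL; it does not contact Wikidata.

```python
from urllib.parse import urlencode

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def local_id(lcid):
    """Turn 'lcnaf:n81020731' or 'lcsh:sh85033169' into the bare LC id."""
    return lcid.split(":", 1)[-1]


def build_wikidata_query(lcids):
    """Build a SPARQL query for entities whose P244 matches any given id."""
    values = " ".join(f'"{local_id(i)}"' for i in lcids)
    return (
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "SELECT ?entity ?lcid WHERE {\n"
        f"  VALUES ?lcid {{ {values} }}\n"
        "  ?entity wdt:P244 ?lcid .\n"
        "}"
    )


def build_request_url(lcids):
    """Build a GET URL asking the endpoint for JSON results."""
    params = {"query": build_wikidata_query(lcids), "format": "json"}
    return WIKIDATA_ENDPOINT + "?" + urlencode(params)


# Identifiers taken from examples elsewhere in the slides.
ids_in_book = ["lcnaf:n81020731", "lcsh:sh85033169"]
print(build_wikidata_query(ids_in_book))
```

Each entity returned would correspond to one knowledge card for the book.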
Discovery concepts behind Knowledge Cards [cont.]
3. Two books that have a name/subject in common are considered to be related.
4. Therefore, by querying Library Catalog Triplestore on a name/subject, we obtain a set of books related to the name/subject.
[Diagram: book 1 has name/subject; book 2 has name/subject]
[Diagram: name/subject → books; name/subject has related books]
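Steps 3 and 4 amount to an inverted index from name/subject to books. The sketch below uses a small in-memory dictionary in place of the triplestore; the book labels and their assignments to identifiers are made-up sample data, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample: each book lists the authority identifiers it carries.
catalog = {
    "book 1": {"lcnaf:n81020731", "lcsh:sh85033169"},
    "book 2": {"lcnaf:n81020731"},
    "book 3": {"lcsh:sh98002795"},
}


def build_index(books):
    """Invert book -> identifiers into identifier -> books."""
    index = defaultdict(set)
    for book, identifiers in books.items():
        for identifier in identifiers:
            index[identifier].add(book)
    return index


def related_books(index, identifier):
    """Return the books that carry the given name/subject identifier."""
    return sorted(index.get(identifier, set()))


index = build_index(catalog)
print(related_books(index, "lcnaf:n81020731"))
```

In the triplestore the same lookup would be a graph pattern match rather than a dictionary access, but the relationship being exploited is identical.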
Discovery concepts behind Knowledge Cards [cont.]
5. Then, by querying Wikidata with the set of names/subjects found in the set of books related to a name/subject, we obtain a set of knowledge cards associated with the name/subject.
[Diagram: name/subject → books → names/subjects → wikidata entities; the starting name/subject and each resulting wikidata entity are one degree of separation apart]
Discovery concepts behind Knowledge Cards [cont.]
6. Users can continue navigating to obtain more knowledge cards.
Old saying: a person is socially connected to any other person by at most six degrees of separation.
Here we have:
A name/subject is bibliographically connected to another name/subject. Each step through bibliographic linked data leads to information at the next degree of separation.
A few steps in knowledge cards, a big leap in knowledge space.
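The "degrees of separation" idea can be illustrated with a breadth-first search over names/subjects that share a book. The catalog below is made-up sample data with generic labels, and the function names are hypothetical; the sketch only shows how each navigation step reaches the next degree.

```python
from collections import deque

# Hypothetical sample: each book lists the names/subjects it carries.
BOOKS = {
    "book 1": {"subject A", "subject B"},
    "book 2": {"subject B", "subject C"},
    "book 3": {"subject C", "subject D"},
}


def neighbours(subject):
    """Names/subjects that share at least one book with `subject`."""
    linked = set()
    for subjects in BOOKS.values():
        if subject in subjects:
            linked |= subjects
    linked.discard(subject)
    return linked


def degrees_of_separation(start):
    """Breadth-first search: map each reachable name/subject to its degree."""
    degree = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in sorted(neighbours(current)):
            if nxt not in degree:
                degree[nxt] = degree[current] + 1
                queue.append(nxt)
    return degree


print(degrees_of_separation("subject A"))
```

Starting from "subject A", the sample reaches "subject D" in three steps even though no single book carries both.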
Implementing knowledge cards in Library Catalog
[Diagram: Alma BIBFRAME 2.0 → Discovery with Knowledge Cards]
<rdf:value rdf:resource="http://id.loc.gov/authorities/names/n79022889"/>
<rdf:value rdf:resource="http://id.loc.gov/authorities/names/n79039943"/>
<rdf:value rdf:resource="http://id.loc.gov/authorities/names/n81020731"/>
<rdf:value rdf:resource="http://id.loc.gov/authorities/subjects/sh2008101109"/>
http://catalog.ust.hk/kc
https://lbdiscover.ust.hk/bib/991011895759703412
Features - Discovery with Knowledge Cards
Find related knowledge cards by a title
https://catalog.ust.hk/kc/?recnum=991011895759703412
Features - Discovery with Knowledge Cards [cont.]
Find related knowledge cards by a name/subject
https://catalog.ust.hk/kc/?lcid=lcnaf:n81020731
Features - Discovery with Knowledge Cards [cont.]
Find related titles by a name/subject
https://catalog.ust.hk/kc/?lcid=lcnaf:n81020731&dataset=catalog
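The three features above differ only in their query parameters (`recnum`, `lcid`, and `dataset=catalog`). A small sketch that rebuilds the example URLs is shown below; the function names are hypothetical, and nothing beyond the three URL patterns shown on the slides is assumed about the application.

```python
from urllib.parse import urlencode

KC_BASE = "https://catalog.ust.hk/kc/"


def cards_by_title(recnum):
    """URL for knowledge cards related to a catalog record number."""
    return KC_BASE + "?" + urlencode({"recnum": recnum})


def cards_by_name_or_subject(lcid):
    """URL for knowledge cards related to a name/subject identifier."""
    # safe=":" keeps the colon in identifiers such as lcnaf:n81020731.
    return KC_BASE + "?" + urlencode({"lcid": lcid}, safe=":")


def titles_by_name_or_subject(lcid):
    """URL for catalog titles related to a name/subject identifier."""
    params = {"lcid": lcid, "dataset": "catalog"}
    return KC_BASE + "?" + urlencode(params, safe=":")


print(cards_by_title("991011895759703412"))
print(cards_by_name_or_subject("lcnaf:n81020731"))
print(titles_by_name_or_subject("lcnaf:n81020731"))
```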
Features - Discovery with Knowledge Cards [cont.]
Discover more from a knowledge card
• Display Wikipedia article
• Launch a search in Library Catalog
• De-reference URIs of these identifiers to their linked data pages
• Retrieve knowledge cards associated with this name (lcnaf:n81020731)
• Retrieve titles associated with this name (lcnaf:n81020731)
Features - Discovery with Knowledge Cards [cont.]
Search Library Catalog Triplestore by keywords
This Triplestore contains only a subset (~9%) of the Library Catalog:
• Records that were retrieved by Library Catalog (Primo) users since January 2018
• New arrivals of books and audio/visual materials since April 2018
MARC records from Alma are transformed into BIBFRAME RDF and injected into the store as named graphs
• Ongoing injection – automatically done when users display the record in Primo
• As of 3 March 2019, the Triplestore contains 116,698 named graphs and 30,088,950 triples
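A quick back-of-the-envelope calculation from the figures on this slide gives a feel for the scale. It assumes one named graph per catalog record and takes "~9%" as exactly 0.09, so the extrapolated numbers are rough estimates, not reported values.

```python
named_graphs = 116_698
triples = 30_088_950
coverage = 0.09  # "~9%" on the slide, so treat the results as rough

# Average size of one record once converted to BIBFRAME triples.
triples_per_graph = triples / named_graphs

# Extrapolation to the whole catalog, assuming one named graph per record.
estimated_catalog_records = named_graphs / coverage
estimated_catalog_triples = triples / coverage

print(f"about {triples_per_graph:.0f} triples per named graph")
print(f"roughly {estimated_catalog_records:,.0f} records in the whole catalog")
print(f"roughly {estimated_catalog_triples:,.0f} triples for the whole catalog")
```

Under these assumptions, the full catalog would need a few hundred million triples.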
Library Catalog Triplestore – SPARQL Query Form
https://catalog.ust.hk/lod
Library Catalog Triplestore is Linked Open Data. Machines can send SPARQL queries to the endpoint at: http://catalog.ust.hk/lod/sparql?...
• Query Editor – construct your own SPARQL
• View results in table and JSON formats
Library Catalog Triplestore – SPARQL Query Form [cont.]
Four use cases of constructing SPARQL
• Retrieve triples of a graph (e.g. hkust:991012055789703412)
• Search keyword in labels (e.g. 金庸 or HKUST). Limit search by types of entities (e.g. bf:Title)
• Given a LCNAF/LCSH URI (e.g. lcnaf:n81020731 or lcsh:sh98002795), retrieve LCNAF/LCSH URIs mentioned in graphs that contain this URI. Limit search to selected entities (e.g. bf:Agent or bf:Topic)
• Retrieve titles related to a name or subject (e.g. lcnaf:n81020731 or lcsh:sh85033169)
Handy for learners
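As a rough sketch of how such queries might be assembled, the Python below builds SPARQL strings for the first use case and for the starting point of the third and fourth. Several things are assumptions rather than facts from the slides: that `hkust:` expands to `http://catalog.ust.hk/bf/`, that each named graph is named by that URL, and that authority URIs appear as objects of `rdf:value`. The queries produced by the actual query form may differ.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "lcnaf": "http://id.loc.gov/authorities/names/",
    "lcsh": "http://id.loc.gov/authorities/subjects/",
    # Assumption: hkust: expands to the /bf/ URLs shown on the slides.
    "hkust": "http://catalog.ust.hk/bf/",
}


def expand(curie):
    """Expand a prefixed name such as lcnaf:n81020731 into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def triples_of_graph(graph_curie, limit=100):
    """Use case 1: retrieve the triples of one named graph."""
    return (
        "SELECT ?s ?p ?o WHERE {\n"
        f"  GRAPH <{expand(graph_curie)}> {{ ?s ?p ?o }}\n"
        f"}} LIMIT {limit}"
    )


def graphs_mentioning(authority_curie):
    """Starting point for use cases 3 and 4: graphs containing a given URI."""
    return (
        f"PREFIX rdf: <{PREFIXES['rdf']}>\n"
        "SELECT DISTINCT ?g WHERE {\n"
        f"  GRAPH ?g {{ ?id rdf:value <{expand(authority_curie)}> }}\n"
        "}"
    )


print(triples_of_graph("hkust:991012055789703412"))
print(graphs_mentioning("lcsh:sh98002795"))
```

The resulting strings can be pasted into the Query Editor for experimentation.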
Bibliographic Linked Data Learning Platform
http://catalog.ust.hk/bf
• For library colleagues to learn about bibliographic linked data (from Alma)
  • BIBFRAME 2.0
  • RDA/RDF
  • JSON-LD
• Experimental Catalog Triplestore for colleagues to learn SPARQL and understand triples
• Tools for linked data experiments, e.g. Discovery with Knowledge Cards
Bibliographic Linked Data Learning Platform [cont.]
• SPARQL Query Form (https://catalog.ust.hk/lod/)
• Display the BIBFRAME linked data of this work in human-readable form and in RDF/XML. Machines can download the data in RDF and in N-Triples format (http://catalog.ust.hk/bf/991011895759703412)
• Display RDA/RDF linked data of this work
• Display MARCXML of this work, for comparison purposes
• Display Knowledge Cards of this title (http://catalog.ust.hk/kc/?recnum=991011895759703412)
• Display record in HKUST Library Catalog (PowerSearch, on Primo) (http://lbdiscover.ust.hk/bib/991011895759703412)
• Display JSON-LD linked data of this work
Demonstration
http://catalog.ust.hk/kc
From: 尼耳斯·玻尔哲学文选 = The philosophical writings of Niels Bohr (https://lbdiscover.ust.hk/bib/991001519479703412)
To atom
To aip
To nanotechnology
To: Introduction to bioMEMS (https://lbdiscover.ust.hk/bib/991001004549703412)
Challenges
A triplestore requires very large and very fast storage. Our experimental triplestore is running on a desktop computer with a few TB of SSD storage. It is therefore not sufficient to transform the whole Library Catalog into triples.
We use the open source software Apache Jena/Fuseki to serve triples. It has, however, a few limitations. For example, the database size keeps growing even when you delete records or update existing triples. It is also difficult to troubleshoot corruption caused by bad triples.
Apache Jena/Fuseki comes with Lucene/ElasticSearch text index capability, but it does not work well with CJK searching. We plan to develop a Lucene analyzer that will work with TSVCC (Traditional, Simplified and Variant Chinese Characters).
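The general idea behind TSVCC-aware searching is to fold traditional, simplified and variant forms to one canonical form before comparing. The Python sketch below illustrates only that folding idea; it is not a Lucene analyzer, the four-entry table is a tiny hypothetical sample, and the function names are invented for illustration.

```python
# Tiny hypothetical folding table: traditional and variant forms on the left
# are mapped to one canonical (here simplified) form. A real table would be
# far larger.
FOLD_TABLE = str.maketrans({
    "國": "国",
    "學": "学",
    "為": "为",
    "爲": "为",
})


def fold(text):
    """Fold traditional/variant characters to one canonical form."""
    return text.translate(FOLD_TABLE)


def matches(query, label):
    """Compare query and label after folding both sides."""
    return fold(query) in fold(label)


print(matches("国学", "國學概論"))
```

Applying the same folding at index time and at query time is what lets a simplified query find a traditional label, and the reverse.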
Alma has an API to transform MARC records to BIBFRAME RDF, using LC’s marc2bibframe conversion tool. However, there are issues in the conversion. Also, it does not validate data before transformation. Therefore, if MARC records have invalid/dirty content, the triples created will contain invalid and/or corrupted URIs.
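One way to guard against this is to screen URIs before loading triples into the store. The sketch below is a minimal example of such a check, not part of Alma or marc2bibframe; the function name is hypothetical, and restricting URIs to http/https is an assumption made for this illustration.

```python
import re
from urllib.parse import urlsplit

# Characters that may not appear unescaped inside an IRI in Turtle/SPARQL:
# whitespace and control characters, angle brackets, double quote, braces,
# pipe, backslash, caret and backtick.
_ILLEGAL = re.compile(r'[\x00-\x20<>"{}|\\^`]')


def is_clean_uri(uri):
    """Return True if the URI looks safe to load into a triplestore."""
    if _ILLEGAL.search(uri):
        return False
    try:
        parts = urlsplit(uri)
    except ValueError:  # raised for malformed authorities such as "http://[x"
        return False
    # Assumption: only absolute http(s) URIs are expected in this dataset.
    return parts.scheme in ("http", "https") and bool(parts.netloc)


candidates = [
    "http://id.loc.gov/authorities/names/n81020731",
    "http://id.loc.gov/authorities/names/n 81020731",
    "n81020731",
]
for uri in candidates:
    print(uri, "->", "ok" if is_clean_uri(uri) else "rejected")
```

Rejected URIs can be logged with their record number so that the MARC source can be corrected.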
Alma bib records are linked to authority records, making it possible to populate authority URIs to BIBFRAME. However, quite a number of our bib records fail to link to authority records as expected, and therefore these URIs are unavailable for BIBFRAME transformation. Ex Libris is working on this bug(?).
We have rebuilt HKCAN as Linked Open Data. JULAC is in the process of relinking bib headings in Alma (from LCNAF) to HKCAN. It will be interesting to see how HKCAN URIs help in this discovery project.
Challenges [cont.]
Not all LCNAF and LCSH identifiers in our Catalog are available in Wikidata. If authority identifiers are not populated to Wikidata, knowledge cards associated with them cannot be generated.
Many LCSH headings in bib records have subdivisions. Similarly, many LCNAF headings are name-title. Because of the composite nature of these LCSH/LCNAF identifiers, they are unlikely to be populated to Wikidata. We have to take extra steps to break these composite forms to obtain identifiers of the name portion and the main division of the subject.
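A minimal sketch of the "breaking" step, assuming headings are available as MARC subfield pairs: in subject headings the subdivisions sit in subfields v, x, y and z, and in name-title headings the title starts at subfield t. The sample headings are hypothetical and the function names are invented. Looking up the identifier of the resulting portion is a separate step that is not shown.

```python
# Hypothetical sample headings as (subfield code, value) pairs.
subject_heading = [("a", "Physics"), ("x", "History"), ("y", "20th century")]
name_title_heading = [
    ("a", "Surname, Forename,"),
    ("d", "1900-1980."),
    ("t", "Collected writings"),
]

# MARC subdivision subfields: form (v), general (x), chronological (y),
# geographic (z).
SUBDIVISION_CODES = {"v", "x", "y", "z"}


def main_subject(subfields):
    """Drop subdivision subfields, keeping the main division of the subject."""
    return [(c, v) for c, v in subfields if c not in SUBDIVISION_CODES]


def name_portion(subfields):
    """Keep everything before the title subfield (t) of a name-title heading."""
    result = []
    for code, value in subfields:
        if code == "t":
            break
        result.append((code, value))
    return result


print(main_subject(subject_heading))
print(name_portion(name_title_heading))
```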
Discovery with Knowledge Cards is a separate app outside of the Catalog. It would perform much better as an integral part of Primo. This needs the vendor to buy in to this new option for information discovery.
Conclusion
Information Discovery Use Case
I find a book in the Library Catalog. It is about something (topic, person, book, etc.) and is written by someone (creator, contributor). I am interested in knowing / reading more information about them.
Can Library Catalog help?
Information discovery in Library Catalog can be greatly enhanced if linked data technology is deployed together with the core functions of the Catalog.
The prerequisite is to have bibliographic data transformed into actionable linked data. BIBFRAME is one of the options.
Bibliographic data have long carried a rich set of identifiers that point to authority data. These identifiers can become an important component in harvesting information related to a name, subject or title.
Linked Open Data sources such as Wikidata are essential for such harvesting. Completeness of the coverage of authority identifiers in these LOD sources is therefore critical.
Libraries are at the edge of a brave new world of linked data. Colleagues are encouraged to familiarize themselves with this technology and adopt it in their daily work.
Vendors of library service platforms need to evolve to embrace linked data in their core systems.