Publication information
Title: Implementing Knowledge Card to enhance information discovery through bibliographic linked data
Authors: Lam, Ki Tat
Source: Symposium on Linked Data and the Semantic Web, Hong Kong, 4 March 2019
Version: Published version
DOI: Nil
Publisher: Nil

Copyright information
© The Author(s).

Notice
If it is the author’s pre-published version, changes introduced as a result of publishing processes such as copy-editing and formatting may not be reflected in this document. For a definitive version of this work, please refer to the published version.

This version is available at HKUST Institutional Repository via http://hdl.handle.net/1783.1/95441
http://repository.ust.hk/ir/


Implementing Knowledge Card to enhance information discovery through bibliographic linked data

Ki Tat LAM
Hong Kong University of Science and Technology Library

<https://orcid.org/0000-0003-2625-9419>

Symposium on Linked Data and the Semantic Web: Current Trends and Opportunities

Hong Kong Baptist University Library
Hong Kong

4 March 2019

Last revised: 2019‐03‐04

2

Agenda
• Concepts behind Knowledge Cards
• Implementation of Discovery with Knowledge Cards
• Demo

3

Information Discovery Use Case 

I find a book in the Library Catalog. It is about something (topic, person, book, etc.) and is written by someone (creator, contributor). I am interested in knowing / reading more information about them.

Can Library Catalog help?

4

Traditional Library Catalog (or Next Generation)

• You can search/browse the catalog by title, author, or subject to find more information yourself

• You can click on the author/subject fields to launch a search on the catalog

• There may be external links to websites in the records. They are provided by catalogers, and from there you can find more information

• Some platforms provide central indexes of articles or allow libraries to load their own repositories, and from there you can explore related articles

• Some platforms provide recommendations or related titles based on personal preferences or search history

What can Library Catalog offer?

5

Will linked data technology help?

Observations

Linked data is structured. Relationships between data are explicitly represented, allowing machines to query them semantically.

Linked data is interlinked. Linked Open Data forms a global database of rich information about many things.

Bibliographic data that libraries created (thanks to catalogers!) contains “access points” that are linked to authority records. 

Identifiers of authority records are available in LOD, e.g. Wikidata. Instead of just linking to authority data, could Library Catalog go one step further and derive information from LOD based on these authority data identifiers?

6

Relevance concepts in information retrieval

Citation Index (Eugene Garfield)
Two documents are considered to be related if one is cited by another.

Document 2 contains an identifier of document 1 (and vice versa).

Identifiers can be universal, like DOI, or an internal system number identifying a document.

[Diagram: document 1 is "cited by" document 2]

Web Page Ranking (Larry Page)
Two web pages are considered to be related if one has a hyperlink to another.

Web page 2 has an HTML anchor containing the URL of web page 1.

[Diagram: web page 1 is "hyperlinked from" web page 2]

7

Relevance concepts in information retrieval [cont.]

Pattern matching
Two objects that have things in common (co-occurrence) are considered to be related.

Applications
• Facial recognition (things are face features)
• Optical character recognition (things are glyph features)
• Document ranking (things are keywords, subjects, names, citations, etc.)

[Diagram: object 1 and object 2 are linked by "has" relations to thing 1, thing 2 and thing 3]
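The co-occurrence idea can be illustrated with a few lines of Python. This is only a sketch: the function names and the sample keyword sets are hypothetical and are not taken from the presentation.

```python
def shared_things(things_a, things_b):
    """Return the things that two objects have in common."""
    return set(things_a) & set(things_b)


def are_related(things_a, things_b, min_shared=1):
    """Treat two objects as related when they share at least `min_shared` things."""
    return len(shared_things(things_a, things_b)) >= min_shared


# Hypothetical example data: the "things" are subject keywords of three documents.
doc1 = {"cosmology", "big bang", "astrophysics"}
doc2 = {"astrophysics", "big bang", "quantum theory"}
doc3 = {"poetry"}

common = shared_things(doc1, doc2)
print(sorted(common), are_related(doc1, doc2), are_related(doc1, doc3))
```

The number of shared things could also serve as a crude ranking score, with more things in common suggesting a stronger relation.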

8

Bibliographic linked data
• Break strings in fields to things
• Things are identified by URIs
• URIs are actionable
• Things within a dataset are inter-linked
• Things among datasets are inter-linked
• Richness of identifiers: LCNAF, LCSH, MESH, FAST, VIAF, …; DOI, ISBN, ISSN, …
• Structured, with semantic representations – good for information discovery

[Figure: A fragment of a BIBFRAME 2.0 graph (http://catalog.ust.hk/bf/991011643229703412)]

Nodes and edges legible in the figure: hkust:991011643229703412#Work (rdf:type bf:Work, bf:Text) has a bf:title node (rdf:type bf:Title, rdfs:label “Wrinkles in time”@en) and a bf:contribution node (rdf:type bf:Contribution, bf:role relator:ctb). The contribution's bf:agent (rdf:type bf:Agent, bf:Person, rdfs:label “Smoot, George”@en) is bf:identifiedBy a node of rdf:type bf:Identifier with rdf:value lcnaf:n93046571.
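To show what "structured" and "actionable" mean in practice, the sketch below writes the fragment out as plain (subject, predicate, object) tuples and walks from the work to the authority identifier of its contributor. The triples are a reading of the figure, and the blank-node names (`_:title`, `_:contribution`, `_:agent`, `_:identifier`) and helper functions are hypothetical.

```python
# Triples transcribed from the figure; blank-node names are made up for illustration.
WORK = "hkust:991011643229703412#Work"
TRIPLES = [
    (WORK, "rdf:type", "bf:Work"),
    (WORK, "rdf:type", "bf:Text"),
    (WORK, "bf:title", "_:title"),
    ("_:title", "rdf:type", "bf:Title"),
    ("_:title", "rdfs:label", '"Wrinkles in time"@en'),
    (WORK, "bf:contribution", "_:contribution"),
    ("_:contribution", "rdf:type", "bf:Contribution"),
    ("_:contribution", "bf:role", "relator:ctb"),
    ("_:contribution", "bf:agent", "_:agent"),
    ("_:agent", "rdf:type", "bf:Agent"),
    ("_:agent", "rdf:type", "bf:Person"),
    ("_:agent", "rdfs:label", '"Smoot, George"@en'),
    ("_:agent", "bf:identifiedBy", "_:identifier"),
    ("_:identifier", "rdf:type", "bf:Identifier"),
    ("_:identifier", "rdf:value", "lcnaf:n93046571"),
]


def objects(subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def follow(start, path):
    """Follow a chain of predicates from a start node and return the end nodes."""
    nodes = [start]
    for predicate in path:
        nodes = [o for node in nodes for o in objects(node, predicate)]
    return nodes


# Work -> contribution -> agent -> identifier -> authority identifier
agent_ids = follow(WORK, ["bf:contribution", "bf:agent", "bf:identifiedBy", "rdf:value"])
print(agent_ids)
```

The same path expressed in SPARQL is what lets a machine move from a catalog record to an authority identifier without parsing any display strings.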

9

Wikidata is a valuable source of information about things
• Links to Wikipedia articles
• Contains identifiers: LC authority ID, ISNI, VIAF, GND, BNF, GeoNames, …

https://www.wikidata.org/wiki/Q179572

Wikidata – Linked Open Data

10

1. A Wikidata entity is considered to contain relevant information about a book if both have the same name/subject.

2. Therefore, by querying Wikidata with the set of names/subjects found in a book, we obtain a set of knowledge cards associated with the book.

[Diagram: a book and a Wikidata entity both "has" the same name/subject; a book's names/subjects lead to Wikidata entities, so the book "has knowledge cards"]

Discovery concepts behind Knowledge Cards
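One way to picture step 2 is a single SPARQL query against the public Wikidata Query Service that matches a batch of LC authority identifiers through Wikidata property P244 (Library of Congress authority ID). The sketch below only builds the query text and request URL and sends nothing. The function names are hypothetical, and this is not necessarily how the presented application queries Wikidata.

```python
from urllib.parse import urlencode

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"  # public Wikidata endpoint


def lc_id(curie):
    """Strip the prefix from an identifier such as 'lcnaf:n81020731'."""
    return curie.split(":", 1)[1]


def knowledge_card_query(curies):
    """Build a query for Wikidata items whose LC authority ID (P244) is in the set."""
    values = " ".join('"' + lc_id(c) + '"' for c in curies)
    lines = [
        "SELECT ?item ?itemLabel ?lcid ?article WHERE {",
        "  VALUES ?lcid { " + values + " }",
        "  ?item wdt:P244 ?lcid .",
        "  OPTIONAL { ?article schema:about ?item ;",
        "             schema:isPartOf <https://en.wikipedia.org/> . }",
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }',
        "}",
    ]
    return "\n".join(lines)


def request_url(query):
    """Return the GET URL for the query, asking for JSON results."""
    return WIKIDATA_SPARQL + "?" + urlencode({"query": query, "format": "json"})


# Identifiers that appear as examples in the slides.
query = knowledge_card_query(["lcnaf:n81020731", "lcsh:sh85033169"])
url = request_url(query)
print(query)
```

Each returned item, with its label and Wikipedia article, would be the raw material for one knowledge card. Identifiers with no match in Wikidata simply return no row.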

11

3. Two books that have a name/subject in common are considered to be related.

4. Therefore, by querying the Library Catalog Triplestore on a name/subject, we obtain a set of books related to that name/subject.

[Diagram: book 1 and book 2 both "has" the same name/subject; a name/subject "has related books"]

Discovery concepts behind Knowledge Cards [cont.]
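Step 4 might be expressed as a SPARQL query over the named graphs of the Library Catalog Triplestore. The sketch below assumes the data shape shown elsewhere in the slides: authority URIs appear as objects of `rdf:value`, and titles are `bf:Title` nodes carrying an `rdfs:label`. The endpoint is the one named in the deck; the function names are hypothetical, and the code only builds the query without sending it.

```python
from urllib.parse import urlencode

ENDPOINT = "http://catalog.ust.hk/lod/sparql"  # endpoint named in the slides

PREFIXES = "\n".join([
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>",
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
    "PREFIX bf: <http://id.loc.gov/ontologies/bibframe/>",
])


def related_titles_query(authority_uri, limit=50):
    """Build a query for titles in every named graph that mentions the authority URI."""
    lines = [
        PREFIXES,
        "SELECT DISTINCT ?g ?label WHERE {",
        "  GRAPH ?g {",
        "    ?identifier rdf:value <" + authority_uri + "> .",
        "    ?title a bf:Title ; rdfs:label ?label .",
        "  }",
        "}",
        "LIMIT " + str(int(limit)),
    ]
    return "\n".join(lines)


def request_url(query):
    """Return a GET URL following the standard SPARQL protocol `query` parameter."""
    return ENDPOINT + "?" + urlencode({"query": query})


query = related_titles_query("http://id.loc.gov/authorities/names/n81020731")
url = request_url(query)
print(query)
```

Because each record is stored as its own named graph, `GRAPH ?g` keeps the identifier and the title within the same record.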

12

5. Then, by querying Wikidata with the set of names/subjects found in the books related to a name/subject, we obtain a set of knowledge cards associated with that name/subject.

Discovery concepts behind Knowledge Cards [cont.]

[Diagram: name/subject → books → names/subjects → Wikidata entities (one degree of separation); the original name/subject also maps to its own Wikidata entity]

13

6. Users can keep navigating to obtain more knowledge cards.

Old saying: a person is socially connected to another person in at most six degrees of separation.

Here we have:

A name/subject is bibliographically connected to another name/subject. Each step through bibliographic linked data leads to information at the next degree of separation.

A few steps in knowledge cards, a big leap in knowledge space.

Discovery concepts behind Knowledge Cards [cont.]
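The notion of degrees of separation can be sketched as a breadth-first walk over names/subjects that co-occur in the same book. The tiny catalog below is invented for illustration, and the function names are hypothetical. A real implementation would query the triplestore instead of a dictionary.

```python
from collections import deque

# Hypothetical catalog: each book lists the names/subjects found in its record.
BOOKS = {
    "book A": {"Bohr, Niels", "Physics"},
    "book B": {"Physics", "Atoms"},
    "book C": {"Atoms", "Nanotechnology"},
    "book D": {"Poetry"},
}


def co_occurring(name):
    """Names/subjects that share at least one book with `name`."""
    related = set()
    for names in BOOKS.values():
        if name in names:
            related |= names - {name}
    return related


def degrees_of_separation(start, max_degree):
    """Breadth-first walk: map each reachable name/subject to its degree from `start`."""
    degree = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if degree[current] == max_degree:
            continue  # do not expand beyond the requested number of steps
        for neighbour in co_occurring(current):
            if neighbour not in degree:
                degree[neighbour] = degree[current] + 1
                queue.append(neighbour)
    return degree


reachable = degrees_of_separation("Bohr, Niels", max_degree=3)
print(reachable)
```

Each additional step widens the set of names/subjects, and therefore the set of knowledge cards a user could reach.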

14

Alma BIBFRAME 2.0

Discovery with Knowledge Cards

<rdf:value rdf:resource="http://id.loc.gov/authorities/names/n79022889"/>
<rdf:value rdf:resource="http://id.loc.gov/authorities/names/n79039943"/>
<rdf:value rdf:resource="http://id.loc.gov/authorities/names/n81020731"/>
<rdf:value rdf:resource="http://id.loc.gov/authorities/subjects/sh2008101109"/>

http://catalog.ust.hk/kc

https://lbdiscover.ust.hk/bib/991011895759703412

Implementing knowledge cards in Library Catalog

15

Find related knowledge cards by a title

Features ‐ Discovery with Knowledge Cards

https://catalog.ust.hk/kc/?recnum=991011895759703412

16

Find related knowledge cards by a name/subject

Features ‐ Discovery with Knowledge Cards [cont.]

https://catalog.ust.hk/kc/?lcid=lcnaf:n81020731

17

Find related titles by a name/subject

Features ‐ Discovery with Knowledge Cards [cont.]

https://catalog.ust.hk/kc/?lcid=lcnaf:n81020731&dataset=catalog
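The three entry points shown on the feature slides differ only in their query parameters (`recnum`, `lcid`, and `dataset=catalog`). A small helper can build such links; the helper name is hypothetical, and only the parameter names and example values come from the slides.

```python
from urllib.parse import urlencode

KC_BASE = "https://catalog.ust.hk/kc/"  # base URL shown on the slides


def kc_url(recnum=None, lcid=None, dataset=None):
    """Build a Knowledge Card URL from the query parameters seen in the slides."""
    params = {}
    if recnum is not None:
        params["recnum"] = recnum
    if lcid is not None:
        params["lcid"] = lcid
    if dataset is not None:
        params["dataset"] = dataset
    # Keep ':' unescaped so identifiers such as 'lcnaf:n81020731' stay readable.
    return KC_BASE + "?" + urlencode(params, safe=":")


by_title = kc_url(recnum="991011895759703412")
cards_by_name = kc_url(lcid="lcnaf:n81020731")
titles_by_name = kc_url(lcid="lcnaf:n81020731", dataset="catalog")
print(by_title, cards_by_name, titles_by_name, sep="\n")
```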

18

Discover more from a knowledge card

Display Wikipedia article

Launch search to Library Catalog

De‐reference URIs of these identifiers to their linked data pages

Retrieve knowledge cards associated to this name (lcnaf:n81020731)

Retrieve titles associated to this name (lcnaf:n81020731)

Features ‐ Discovery with Knowledge Cards [cont.]

19

Search Library Catalog Triplestore by keywords

This Triplestore contains only a subset (~9%) of the Library Catalog:
• Records that were retrieved by Library Catalog (Primo) users since January 2018
• New arrivals of books and audio/visual materials since April 2018

MARC records from Alma are transformed into BIBFRAME RDF and injected into the store as named graphs
• Ongoing injection – automatically done when users display the record in Primo
• As of 3 March 2019, the Triplestore contains 116,698 named graphs and 30,088,950 triples

Features ‐ Discovery with Knowledge Cards [cont.]

20

https://catalog.ust.hk/lod

Library Catalog Triplestore is Linked Open Data. Machines can send SPARQL queries to the endpoint at: http://catalog.ust.hk/lod/sparql?...

Query Editor – construct your own SPARQL

View results in table and JSON formats

Library Catalog Triplestore – SPARQL Query Form

21

Four use cases of constructing SPARQL
• Retrieve triples of a graph (e.g. hkust:991012055789703412)
• Search keyword in labels (e.g. 金庸 or HKUST). Limit search by types of entities (e.g. bf:Title)
• Given a LCNAF/LCSH URI (e.g. lcnaf:n81020731 or lcsh:sh98002795), retrieve LCNAF/LCSH URIs mentioned in graphs that contain this URI. Limit search in selected entities (e.g. bf:Agent or bf:Topic)
• Retrieve titles related to a name or subject (e.g. lcnaf:n81020731 or lcsh:sh85033169)

Handy for learners

Library Catalog Triplestore – SPARQL Query Form [cont.]
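Two of these use cases, the keyword search in labels and the retrieval of co-occurring LCNAF/LCSH URIs, could be written roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the keyword search uses a plain `FILTER(CONTAINS(...))` rather than the text index mentioned later in the deck, and entities are assumed to reach their authority URI through `bf:identifiedBy` and `rdf:value`, as in the BIBFRAME fragment shown earlier. The queries generated by the actual query form may differ.

```python
PREFIXES = "\n".join([
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>",
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
    "PREFIX bf: <http://id.loc.gov/ontologies/bibframe/>",
])


def escape_literal(text):
    """Escape backslashes and double quotes for use in a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def label_search_query(keyword, entity_type="bf:Title", limit=50):
    """Keyword search in labels, limited to one entity type."""
    lines = [
        PREFIXES,
        "SELECT DISTINCT ?entity ?label WHERE {",
        "  GRAPH ?g {",
        "    ?entity a " + entity_type + " ; rdfs:label ?label .",
        '    FILTER(CONTAINS(LCASE(STR(?label)), LCASE("' + escape_literal(keyword) + '")))',
        "  }",
        "}",
        "LIMIT " + str(int(limit)),
    ]
    return "\n".join(lines)


def cooccurring_uris_query(authority_uri, entity_type="bf:Agent", limit=100):
    """LCNAF/LCSH URIs mentioned in graphs that also contain `authority_uri`."""
    lines = [
        PREFIXES,
        "SELECT DISTINCT ?other WHERE {",
        "  GRAPH ?g {",
        "    ?seed rdf:value <" + authority_uri + "> .",
        "    ?entity a " + entity_type + " ; bf:identifiedBy ?identifier .",
        "    ?identifier rdf:value ?other .",
        "    FILTER(?other != <" + authority_uri + ">)",
        '    FILTER(STRSTARTS(STR(?other), "http://id.loc.gov/authorities/"))',
        "  }",
        "}",
        "LIMIT " + str(int(limit)),
    ]
    return "\n".join(lines)


search = label_search_query("金庸")
neighbours = cooccurring_uris_query("http://id.loc.gov/authorities/names/n81020731")
print(search)
print(neighbours)
```

Either query string can be pasted into the SPARQL Query Form or sent to the endpoint as the `query` parameter.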

22

http://catalog.ust.hk/bf

• For library colleagues to learn about bibliographic linked data (from Alma)
  • BIBFRAME 2.0
  • RDA/RDF
  • JSON-LD

• Experimental Catalog Triplestore for colleagues to learn SPARQL and understand triples 

• Tools for linked data experiments, e.g. Discovery with Knowledge Cards

Bibliographic Linked Data Learning Platform

23

SPARQL Query Form (https://catalog.ust.hk/lod/)

Display the BIBFRAME linked data of this work in human-readable form and in RDF/XML. Machines can download the data in RDF and in N-Triples format (http://catalog.ust.hk/bf/991011895759703412)

Display RDA/RDF linked data of this work

Display MARCXML of this work, for comparison purposes

Display Knowledge Cards of this title (http://catalog.ust.hk/kc/?recnum=991011895759703412)

Display record in HKUST Library Catalog (PowerSearch, on Primo) (http://lbdiscover.ust.hk/bib/991011895759703412)

Display JSON-LD linked data of this work

Bibliographic Linked Data Learning Platform [cont.]

24

Demonstration
http://catalog.ust.hk/kc

From: 尼耳斯·玻尔哲学文选 = The philosophical writings of Niels Bohr (https://lbdiscover.ust.hk/bib/991001519479703412)

To atom

To aip

To nanotechnology

To: Introduction to bioMEMS (https://lbdiscover.ust.hk/bib/991001004549703412)

25

Challenges

A triplestore requires very large and very fast storage resources. Our experimental triplestore is running on a desktop computer with a few TB of SSD storage. It is therefore not sufficient to transform the whole Library Catalog to triples.

We use the open source software Apache Jena/Fuseki to serve triples. It has, however, a few limitations. For example, the database size keeps growing even when you are deleting records or updating existing triples. It is also difficult to troubleshoot corruption caused by bad triples.

Apache Jena/Fuseki comes with Lucene/ElasticSearch text index capability, but it does not work well with CJK searching. We plan to develop a Lucene analyzer that will work with TSVCC (Traditional, Simplified and Variant Chinese Characters).

Alma has an API to transform MARC records to BIBFRAME RDF, using LC’s marc2bibframe conversion tool. However, there are issues in the conversion. Also, it does not validate data before transformation. Therefore, if MARC records have invalid/dirty content, the triples created will contain invalid and/or corrupted URIs.

Alma bib records are linked to authority records, making it possible to populate authority URIs to BIBFRAME. However, quite a number of our bib records fail to link to authority records as expected, and therefore these URIs are unavailable for BIBFRAME transformation. Ex Libris is working on this bug(?).

We have rebuilt HKCAN as Linked Open Data. JULAC is in the process of relinking bib headings in Alma (from LCNAF) to HKCAN. It will be interesting to see how HKCAN URIs help in this discovery project.

26

Challenges [cont.]

Not all LCNAF and LCSH identifiers in our Catalog are available in Wikidata. If authority identifiers are not populated to Wikidata, knowledge cards associated with them cannot be generated.

Many LCSH headings in bib records have subdivisions. Similarly, many LCNAF headings are name-title. Because of the composite nature of these LCSH/LCNAF identifiers, they are unlikely to be populated to Wikidata. We have to take extra steps to break these composite forms to obtain identifiers of the name portion and main division of the subject.
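One possible way to break a composite heading, sketched here on MARC subfields, is to keep everything before the first title subfield (`$t`) or subject subdivision subfield (`$v`, `$x`, `$y`, `$z`). The function name and the sample headings are hypothetical, and this is not a description of the extra steps actually taken in the project. The reduced heading would still have to be matched against the authority file to obtain its identifier, which is not shown.

```python
# MARC subfield codes that start the "composite" part of a heading:
# $t = title of a work (name-title headings),
# $v, $x, $y, $z = form, general, chronological and geographic subdivisions.
SPLIT_CODES = {"t", "v", "x", "y", "z"}


def base_heading(subfields):
    """Keep the subfields that precede the first title or subdivision subfield."""
    base = []
    for code, value in subfields:
        if code in SPLIT_CODES:
            break
        base.append((code, value))
    return base


# Hypothetical headings, given as (subfield code, value) pairs.
name_title = [("a", "Bohr, Niels,"), ("d", "1885-1962."), ("t", "Philosophical writings")]
subject = [("a", "Physics"), ("x", "History"), ("y", "20th century")]

name_only = base_heading(name_title)
main_subject = base_heading(subject)
print(name_only, main_subject)
```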

Discovery with Knowledge Cards is a separate app outside of the Catalog. It would perform much better as an integral part of Primo. The vendor needs to buy in to this new option for information discovery.

27

Conclusion

I find a book in the Library Catalog. It is about something (topic, person, book, etc.) and is written by someone (creator, contributor). I am interested in knowing / reading more information about them.

Can Library Catalog help?

Information discovery in Library Catalog can be greatly enhanced if linked data technology is deployed together with the core functions of the Catalog.

The prerequisite is to have bibliographic data transformed into actionable linked data. BIBFRAME is one of the options.

Bibliographic data has long carried a rich set of identifiers that point to authority data. These identifiers can become an important component in harvesting information related to a name, subject or title.

Linked Open Data sources such as Wikidata are essential for such harvesting. Completeness of the coverage of authority identifiers in these LOD sources is therefore critical.

Libraries are at the edge of a brave new world of linked data. Colleagues are encouraged to familiarize themselves with this technology and adopt it in their daily work.

Vendors of library service platforms need to evolve and embrace linked data in their core systems.

Information Discovery Use Case 

28

Thank you!