Linked Data for Libraries (LD4L) CUL Metadata Working Group May 15, 2015

Linked Data for Libraries (LD4L)

• Andrew W. Mellon Foundation made a two-year $999K grant to Cornell, Harvard, and Stanford starting Jan 2014

• Partners will work together to develop an ontology and linked data sources that provide relationships, metadata, and broad context for Scholarly Information Resources

• Leverages existing work by both the VIVO project and the Hydra Partnership

Specific LD4L Goals

• Free information from existing library system silos to provide context and enhance discovery of scholarly information resources

• Leverage usage information about resources

• Link bibliographic data about resources with academic profile systems and other external linked data sources

• Assemble (and where needed create) a flexible, extensible LD ontology to capture all this information about our library resources

• Demonstrate combining and reconciling the assembled LD across our three institutions

LD4L Data Sources

Bibliographic Data
• MARC
• MODS
• EAD

Person Data
• VIVO
• ORCID
• ISNI
• VIAF

Usage Data
• Circulation
• Citation
• Curation
• Exhibits
• Research Guides
• Syllabi
• Tags

Agenda

• Simeon – Intro (done) and use cases
• Jon – Ontologies
• Steven – Modified Bibframe for LD4L
• Rebecca – UC2, mass conversion, post-processing
• Lynette – UC1.1, ORE, scraping, lookups
• Naun – Preview of LD4P

LD4L Use Case Clusters

1. Bibliographic + curation data
2. Bibliographic + person data
3. Leveraging external data including authorities
4. Leveraging the deeper graph (via queries or patterns)
5. Leveraging usage data
6. Three-site services, e.g. cross-site search

42 Raw Use Cases

12 Refined Use Cases in 6 clusters…

Cluster 1. Bibliographic + curation data

UC1.1: Build a virtual collection

“As a faculty member or librarian, I want to create a virtual collection or exhibit containing information resources from multiple collections across multiple universities either by direct selection or by a set of resource characteristics, so that I can share a focused collection with a <class, set of researchers, set of students in a disciplinary area>.”

UC1.1: Build a virtual collection

• Individual selection of resources
• Collection may be exhibit, reading list, etc.
• Includes description, ordering
• Shared or private
• Enable across institutions

• Well developed, uses annotations
• Demo => Lynette Rayle

UC1.2: Tag scholarly information resources to support reuse

“As a librarian, I would like to be able to tag scholarly information resources from one or multiple institutions into curated lists, so that I can feed these lists into subject guides, course reserves, or reference collections. I'd like these lists to be portable (into Drupal, into LibGuides, into Spotlight! or Omeka, into Sakai, e.g.) and durable. I'd like these lists/tags to selectively feed back into the discovery environment without having to modify the catalog records.”

UC1.2: Tag scholarly information resources to support reuse

• Scale O(100,000) items
• Use for e.g. virtual subject library
• Selection by rules and patterns
• Collaborative curation
• Feed data into discovery environment

• Same model as 1.1, partially developed
• CULLR v2

CULLR -> provide virtual library “views/collections” as part of main Blacklight discovery system

Cluster 2. Bibliographic + person data

UC2.1: See and search on works by people to discover more works, and better understand people

“As a researcher, I'd like to see / search on works <by, about, cited by, collected, taught> by University faculty <in an OPAC, profiles system>, to discover works of interest based on connection to people, and to understand people based on their relation to works.”

UC2.1: See and search on works by people to discover more works, and better understand people

• Profile + catalog data

• Perhaps between institutions: multiple catalogs + multiple profile systems

• Poor journal data suggests we can likely do more in non-journal disciplines

• Demo => Rebecca Younes

UC3.1: Search with Geographic Data for Record Enrichment and Pivoting

“As a researcher, I'd like to see the geographic context of my search results, and be able to pivot, extend or refine a search with a single click, in order to better assess found resources, find related resources, and filter or expand search results to broaden or narrow a search on the fly.”

UC3.x: Search with XYZ for Record Enrichment and Pivoting

XYZ = geographic / subject / person data

• External data and entity linking

• Things not strings

• Different data types have different challenges and data sources

Cluster 4. Leveraging the deeper graph (via queries or Patterns)

UC4.1: Identifying related works

“As a scholar, I would like to find all the images associated with various instances of a work sorted by time, so that I can see how the depictions of or illustrations in a work have changed over time.”

“(GloPAD specific): As a scholar, I would like to find all costume photographs and scene illustrations for various stagings and performances of the plays of a particular author or the operas of a particular composer, so that I can see how the visual look of performances of the plays or operas have changed over time.”

UC4.1: Identifying related works

• use complex graph relationships via queries or patterns (rather than direct connections)

• allow discovery that would not be possible without the semantics of different relationships

• restricted by available data

• Steven Folsom work on complex and linked descriptions for hip-hop flyers
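A minimal sketch of what "queries or patterns" over the graph could look like for UC4.1, written against plain Python tuples rather than a real triple store. The property names `ex:illustrates` and `ex:date`, and all `ex:` URIs, are hypothetical placeholders chosen for illustration; only the general work-to-instance-to-image traversal comes from the use case.

```python
def images_for_work(triples, work):
    """Follow image -> instance -> work links and return images sorted by date."""
    # Step 1: instances that realize the given work.
    instances = {s for s, p, o in triples if p == "bf:instanceOf" and o == work}
    # Step 2: images attached to any of those instances (hypothetical property).
    images = {s for s, p, o in triples if p == "ex:illustrates" and o in instances}
    # Step 3: look up a date string for each image (hypothetical property).
    dates = {s: o for s, p, o in triples if p == "ex:date" and s in images}
    # Sort by date string, then by URI so the order is deterministic.
    # Plain string comparison is only adequate for same-length year strings.
    return sorted(images, key=lambda img: (dates.get(img, ""), img))


# Hypothetical sample data: two instances of one work, one of another.
sample_triples = [
    ("ex:inst1", "bf:instanceOf", "ex:work1"),
    ("ex:inst2", "bf:instanceOf", "ex:work1"),
    ("ex:inst3", "bf:instanceOf", "ex:work2"),
    ("ex:img1", "ex:illustrates", "ex:inst1"),
    ("ex:img2", "ex:illustrates", "ex:inst2"),
    ("ex:img3", "ex:illustrates", "ex:inst3"),
    ("ex:img1", "ex:date", "1923"),
    ("ex:img2", "ex:date", "1887"),
    ("ex:img3", "ex:date", "1950"),
]
timeline = images_for_work(sample_triples, "ex:work1")
```

The point of the sketch is that no image is linked directly to the work; the answer only emerges by following two kinds of relationship in sequence.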

UC4.2: Leverage the deeper graph to surface more relevant works

“As a researcher, I would like to see resources in response to a search where the relevance ranking of the results reflects the "importance" of the works, based on how they have been used or selected by others, so that I can find important resources that might otherwise be "hidden" in a large set of results.”

Cluster 5. Leveraging usage data

UC5.1: Research guided by community usage

“As a researcher, I want to find what is being used (read, annotated, bought by libraries, etc.) by the scholarly communities not only at my institution but at others, and to find sources used elsewhere but not by my community”

UC5.1: Research guided by community usage

• need to understand community of user
– direct: auth and from identity?
– inferred as part of discovery?

• cross-institutional – need usage data that can be shared and merged

UC5.2: Be guided in collection building by usage

“As a librarian, I would like help building my collection by seeing what is being used by students and faculty.”

“As a subject librarian, I would like to see what resources in my subject area are heavily used at peer institutions but are not in my institution’s collection.”

UC5.2: Be guided in collection building by usage

• business analytics

• help libraries make best use of collection building activities and funds

• useful at both institutional and cross-institutional levels

• cross-institutional – need usage data that can be shared and merged

Usage data

• Most work at Harvard

Cluster 6. Three-site services

UC6.1: Cross-site search

As a scholar I don’t want my discovery process to be constrained by the collection boundaries of my university yet I want to retain the detailed coverage of special collections that are important in my field.

Bonus points: I want results ranked by the scholarly value, not simply popularity in the public eye.

UC6.1: Cross-site search

• demonstrate the power of sharing library data as linked open data
– opportunities for third parties
– benefits for partners

• deal with dupes & near dupes, group in works

Will have simple demo by end of project using all the LOD from our three institutions merged in LD4L ontology / Bibframe

LD4L Ontology Team Overview

Metadata Working Group

May 15, 2015

Jon Corson-Rikert

The team

Goals for an LD4L ontology framework (from the 2013 proposal)

• reuse appropriate parts of currently available ontologies while introducing extensions and additions where necessary

• be sufficiently expressive to encompass traditional catalog metadata from the 3 partners

• maintain compatibility with VIVO and research networking ontologies

• include usage and other contextual elements

Early LD4L discussions

• What library information is local vs. global?

• How do we add links to external identifiers, authorities, and real world objects (RWOs)?

• How do we link across our libraries?

• What existing ontologies should we use?

• What services and workflows do we need?

• Which services are (or aren’t) ready for prime time?

Other starting points

• LD4L is not about original cataloging (that’s LD4P)

• LD4L is not trying to reconcile the different approaches of Schema.org and BIBFRAME

• Linking to data beyond library catalogs is a focus
– Predisposed to re-use ontologies appropriate to the domain involved

• Ontology work has focused on our use cases

MWG April 18, 2003

Ontology design

Source: MK Bergman, http://www.mkbergman.com

Ontologies need to evolve

http://www.mkbergman.com/908/a-new-methodology-for-building-lightweight-domain-ontologies/

Ontology orthogonality

http://www.w3.org/2007/Talks/0223-Bangalore-IH/Slides.html

BIBFRAME and Schema.org in LD4L

• We considered the needs of our project and our libraries

• Both will likely need the greater expressiveness that BIBFRAME offers
– Technical services units are actively participating in BIBFRAME training and trials

• Not an either/or proposition -- libraries also want broader discoverability
– Value in exposing bibliographic metadata on the web with Schema.org tags

Conversations continue on the BF list …

• “Much of Bibframe’s problems seem to come from trying to keep everything from the past (MARC) while moving to something fit for the future, an ambition that has I think also afflicted RDA” (Thomas Meehan)

• “Why do the 'powers that be' think that we even want our local catalogs to be semantically connected to the web or have all of our data linked?!” (Michael Ayres)

• “The whole linking idea is great, but really, after 40 years using MARC21, some yahoo wants to unravel everything and bill me for it? I don’t think so.” (Jeffrey Trimble)

Simple core framework

Becomes more complex …

Gets downright messy …

Simple framework, ironing out details

More work in progress …

Local vs. global identifiers

• Establishing local identifiers allows libraries to make their own assertions about resources and authorities
– Assigned stable linked data URIs are accessible anywhere

• Shared references to global identifiers enable both direct and indirect linkages

• OCLC, VIAF, ISNI, ORCID and others are addressing global identifiers at scale
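To make the "indirect linkages" point concrete, here is a small sketch in which each institution asserts that one of its local URIs refers to the same entity as a global identifier, and local URIs are then connected to each other through that shared reference. All URIs and identifier values below are invented for illustration, and the pairing is a stand-in for whatever linking property a real dataset would use.

```python
from collections import defaultdict


def link_by_global_id(assertions):
    """Group local URIs that reference the same global identifier."""
    groups = defaultdict(set)
    for local_uri, global_id in assertions:
        groups[global_id].add(local_uri)
    # Keep only identifiers that actually connect two or more local URIs.
    return {gid: sorted(uris) for gid, uris in groups.items() if len(uris) > 1}


# Hypothetical (local URI, global identifier) pairs from three institutions.
assertions = [
    ("http://cornell.example/person/17", "viaf:0001"),
    ("http://harvard.example/agent/a9", "viaf:0001"),
    ("http://stanford.example/people/442", "viaf:0001"),
    ("http://cornell.example/person/18", "viaf:0002"),
]
links = link_by_global_id(assertions)
```

None of the three local records points at another institution's record, yet all three end up linked because each points at the same global identifier.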

Local linked data identifiers

• For library resources but also local or unique information on people, organizations, events, collections
– There is value and credibility in the institutional namespace

• Supplement with locally-sourced and/or locally-targeted annotations
– Not restricted to local generation or visibility

• Does not require exposing all operational data

From strings to things

• People
• Organizations
• Places
• Subjects
• Events
• Works
• Datasets

Image: designyoutrust.com

Entity resolution

• Can potentially happen before, during, or after MARC to RDF conversion

• Can draw on existing authorities directly and indirectly
– Local sources may involve custom workflows and services
– Remote sources are likely shared and can more likely benefit from standardized services

• Assess potential to loop back
– From non-MARC sources to other catalog resources
– From external sources to local
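One simple form of entity resolution, sketched here under strong assumptions, is to normalize name strings and look them up in an authority list. The authority labels and URIs are hypothetical, and exact matching on a normalized key is a deliberately naive stand-in for the matching services a project like this would actually need.

```python
import unicodedata


def normalize(name):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c if c.isalnum() or c.isspace() else " " for c in no_accents)
    return " ".join(kept.lower().split())


def resolve(names, authority):
    """Map each name string to an authority URI, or None when nothing matches."""
    index = {normalize(label): uri for label, uri in authority.items()}
    return {name: index.get(normalize(name)) for name in names}


# Hypothetical authority file: label -> URI.
authority = {"García Márquez, Gabriel": "http://authority.example/person/1"}
resolved = resolve(["Garcia Marquez, Gabriel.", "Unknown, Author"], authority)
```

Strings that fail to match stay as `None` rather than being guessed at, which leaves room for a later pass once more sources are available.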

Working with non-MARC metadata

• Faculty research profiles (CAP, Faculty Finder, VIVO)

• Library guides and other library-sourced web content

• Digital collections that vary widely in size and complexity, and that encompass diverse subject domains

• Pilot projects
– Cornell’s Hip Hop flyers
– Harvard’s Visual Image Access metadata

Usage data

• Inspiration
– Harvard’s Stacklife

• Goals
– To supplement library discovery interfaces
– To inform collection review and additions

• Challenges
– Data availability
– Concerns for patron privacy

• Potential for a normalized stack score across institutions
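One way a stack score might be normalized across institutions is to rank each item within its own institution and rescale to a common 1 to 100 range, so that a large and a small library become comparable. The sketch below is an assumption for illustration only and is not Stacklife's actual formula; it works from aggregate counts per item, with no patron-level data.

```python
from bisect import bisect_right


def normalized_scores(usage_counts):
    """Map raw per-item usage counts to a 1-100 score within one institution."""
    counts = sorted(usage_counts.values())
    n = len(counts)
    # Score = share of items used no more than this one, as a percentage.
    # Items with equal counts receive equal scores.
    return {
        item: max(1, round(100 * bisect_right(counts, count) / n))
        for item, count in usage_counts.items()
    }


# Hypothetical aggregate counts at two institutions of very different size.
small_library = {"a": 0, "b": 5, "c": 50, "d": 5}
large_library = {"x": 1000, "y": 20000}
small_scores = normalized_scores(small_library)
large_scores = normalized_scores(large_library)
```

Because the score depends on rank rather than raw volume, the most-used item at each site scores 100 regardless of how busy that site is.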

Stacklife screenshot

LD4L and BIBFRAME: Then and Now

Image credit: http://www.chicagobearboards.com/node/4372

Steven Folsom

Cornell University Library Metadata Working Group Forum

May 15th, 2015

Why is LD4L attempting to Use BIBFRAME?

• An ontology built by the Library of Congress as a replacement for MARC
– If/when it solidifies, we will likely see data provided by libraries using this model

• Open Source tools available for converting MARC to BIBFRAME

• LD4L perspective is that BIBFRAME will likely be a starting point, with post-processing needed to meet some of our use cases

• Didn’t want to develop an ontology from scratch within the timeframe of the grant…

BIBFRAME Vocabulary

Status (As of May 15, 2015, 11 am EST)

• Paused in its current/first version since January 2014 to allow the community to respond with feedback and so that tools could be built to test it in pilot projects

• Expecting a new version this spring based on feedback, more on this in a moment

Structure

• Classes: Work, Instance, Authority and Annotation
– bf:Work loosely conflates FRBR concepts of Works and Expressions
– bf:Instance loosely aligns with FRBR Manifestation
– No equivalent to FRBR Item yet in the vocabulary

Image credits: http://bibframe.org/vocab-model/ and http://en.wikipedia.org/wiki/George_Clooney

What BIBFRAME Isn’t.

LD4L’s Role in Grounding BIBFRAME in Linked Data Best Practices

Image credit: http://www.moviefanatic.com/2014/01/gravity-george-clooney-reacts-to-oscar-nominations/

BIBFRAME: Areas for Improvement

Including, but not limited to…

Improvements: Reuse Existing Classes and Properties from Proven Ontologies

• bf:Person -> foaf:Person

• bf:Event -> schema:Event

• bf:subject -> dcterms:subject
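As a rough illustration of this kind of reuse in post-processing, the sketch below swaps the three BIBFRAME terms listed above for their external counterparts in a set of triples. Triples are plain Python tuples of prefixed names and the `ex:` URIs are hypothetical; a real conversion would operate on full IRIs with an RDF library.

```python
# Mapping taken from the three examples listed above.
TERM_MAP = {
    "bf:Person": "foaf:Person",
    "bf:Event": "schema:Event",
    "bf:subject": "dcterms:subject",
}


def reuse_terms(triples, term_map=TERM_MAP):
    """Return new triples with mapped predicate and object terms replaced."""
    rewritten = []
    for s, p, o in triples:
        # Classes appear in object position, properties in predicate position.
        rewritten.append((s, term_map.get(p, p), term_map.get(o, o)))
    return rewritten


# Hypothetical sample data.
bf_triples = [
    ("ex:person1", "rdf:type", "bf:Person"),
    ("ex:work1", "bf:subject", "ex:topic1"),
    ("ex:work1", "bf:title", "ex:title1"),
]
reused = reuse_terms(bf_triples)
```

Terms without an entry in the mapping, such as `bf:title` here, pass through unchanged.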

Improvements: Provide Inverse Relationships

For example:

• bf:hasPart, add bf:isPartOf

• bf:hasReproduction, add bf:reproduces

• dcterms:creator, add bf:creatorOf
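The effect of having inverse properties can be sketched by materializing the inverse triple for each statement that uses one of the properties above. The inverse names are the ones proposed on this slide, not necessarily terms in the published vocabulary, and the `ex:` URIs are hypothetical. The sketch assumes every object is a resource, since a literal cannot be the subject of the inverse statement.

```python
# Property -> proposed inverse, as listed above.
INVERSES = {
    "bf:hasPart": "bf:isPartOf",
    "bf:hasReproduction": "bf:reproduces",
    "dcterms:creator": "bf:creatorOf",
}


def add_inverses(triples, inverses=INVERSES):
    """Return the input triples plus one inverse triple per mapped predicate."""
    result = set(triples)
    for s, p, o in triples:
        if p in inverses:
            # Swap subject and object and use the inverse property.
            result.add((o, inverses[p], s))
    return result


# Hypothetical sample data.
forward = [
    ("ex:work1", "dcterms:creator", "ex:person1"),
    ("ex:series1", "bf:hasPart", "ex:work1"),
]
expanded = add_inverses(forward)
```

With the inverse in place, a query that starts from the person can reach their works directly instead of scanning every work for a matching creator.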

Improvements: Lessen the number of sub-properties that do not add semantics

For Example:

• bf:titleValue doesn’t add more meaning between a title entity and its value. The more generalizable rdf:value works fine.

Improvements: Focus on Things not Strings

BIBFRAME (Post-Makeover)

Questions I want to answer going forward…

• How will descriptive practice change with this more modular approach to the structure?
– What changes to content standards will be needed? (Especially for related entities, e.g. People, Publishers…)

• What new data sources will the move to RDF allow for? (VIVO, MusicBrainz, Getty Vocabularies, ISNI)

• What types of entities can we describe better in RDF? (e.g. Awards, Events)

• How can we take advantage of URIs right now in MARC, while ensuring a smoother transition to linked data?
