83
Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Embed Size (px)

Citation preview

Page 1: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Linked Data For Libraries (LD4L)

Dean B. KrafftMarch 17, 2014

Page 2: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Overview

• Intro to the LD4L project• Why use Linked Data?• LD4L Building Blocks: VIVO• LD4L Building Blocks: Hydra• LD4L Building Blocks: LibraryCloud/ShelfRank• So, what are we actually doing?• Summing Up

Page 3: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Linked Data for Libraries (LD4L)

• On December 5, 2013, the Andrew W. Mellon Foundation made a two-year $999K grant to Cornell, Harvard, and Stanford starting Jan ‘14

• Partners will work together to develop an ontology and linked data sources that provide relationships, metadata, and broad context for Scholarly Information Resources

• Leverages existing work by both the VIVO project and the Hydra Partnership

Page 4: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

The Project Team

• Cornell: Dean Krafft, Jon Corson-Rikert, Brian Lowe, Simeon Warner, and 1.5 new FTE

• Harvard: David Weinberger, Paul Deschner, and Paolo Ciccarese

• Stanford: Tom Cramer, Lynn McRae, Naomi Dushay, Philip Schreur, and 1 new FTE

Page 5: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

“The goal is to create a Scholarly Resource Semantic Information Store model that works both within

individual institutions and through a coordinated, extensible network of Linked Open Data to capture the

intellectual value that librarians and other domain experts add to information resources when they

describe, annotate, organize, select, and use those resources, together with the social value evident from

patterns of usage.”

Page 6: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014
Page 7: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Project Outcomes• Create a SRSIS ontology that is “sufficiently expressive to encompass

traditional catalog metadata from both Cornell and Harvard, the basic linked data elements described in the Stanford Linked Data Workshop Technology Plan, and the usage and other contextual elements from StackLife”

• Create a SRSIS Semantic editing, display, and discovery system based on the “Vitro semantic web platform and the SRSIS ontology, each instance of the system will support the incremental ingest of semantic data from multiple information sources, including the Cornell, Harvard, and Stanford MARC-based catalogs, StackLife, LibGuides, VIVO, Harvard Profiles, CAP, and OAI-PMH metadata providers, among others.”

• Create a Project Hydra compatible interface to SRSIS, “an ActiveTriples software component that facilitates the easy use of SRSIS and other linked-data within Hydra-based systems.”

Page 8: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Why Use a Linked Data Approach?

Page 9: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

The Semantic Web

• Turn data into a web of simple links

• Use ontology to explain how things are linked

• Use reasoning to add new links automatically

• Be flexible and extensible

Page 10: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Hierarchical information organization

Page 11: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Changing the focus to relationships

Page 12: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Benefits of the Semantic Web Approach

• Focuses on connections• Connections have meaning, not just hierarchy– Shared topics– Human relationships– Linkage through events– Geographic proximity– Temporal alignment

• Supports many dimensions of nearness

Page 13: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

RDF “triples”

Page 14: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Connectivity at the micro level

• Triples connect subjects with objects via a consistent set of relationships

Jane Smith

holds position in

author of

member of

Dept. of Genetics

College of Medicine

Journal article

Book chapter

Book

Genetics Institute

Subject Predicate Object

Page 15: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Using Semantic Web Technology vs. Linked

Open Data

Page 16: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

What is Linked Open Data (LOD)?• Data– Structured information, not just documents with text– A common, simple format (triples)

• Open– Available, visible, mine-able– Anyone can post, consume, and reuse

• Linked– Directly by reference– Indirectly through common references and inference

Page 17: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

An HTTP request can return HTML or data

Page 18: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Commonality among references• Shared types

– foaf: Person, Organization– geo: Country– bibo: Book, Academic Article, Journal

• Shared relationships– geo: hasBorderWith, isInGroup, hasMember– foaf: knows, homepage– dc: creator– skos: subject

• Shared instances defined as types and linked by relationships– Agrovoc concepts: rice, sustainability– DBPedia URIs for places, concepts, events– Unique identifiers for researchers and organizations

Page 19: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Why Use LOD for Data Sharing?

• LOD requires– Being willing to make

data open– Providing enough

structure and consistency so data can be linked

• The goal is a higher return on investment over time

http://rww.readwriteweb.netdna-cdn.com/images/semweb_pwc1.pnghttp://www.pwc.com/us/en/technology-forecast/spring2009/semantic-web-technologies.jhtml

Page 20: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

The Linked Open Data Cloud

Page 21: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

LD4L Building Blocks:VIVO

Page 22: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

What is VIVO?• Software: An open-source semantic-web-based

researcher and research discovery tool• Data: Institution-wide, publicly-visible

information about research and researchers• Standards: A standard ontology (VIVO data) that

interconnects researchers, communities, and campuses using Linked Open Data

• Community: An open community with strong national and international participation

Page 23: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

VIVO connects scientists and scholars with and through their research and scholarship

Page 24: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

So – How Does VIVO Actually Work?

Page 25: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

VIVO is a Semantic Web application• Provides data readable by machines, not just text for

humans• Provides self-describing data via shared ontologies– Defined types– Defined relationships

• Provides search & query augmented by relationships• Does simple reasoning to categorize and find associations– Teaching faculty = any faculty member teaching a course– All researchers involved with any gene associated with breast

cancer (through research project, publication, etc.)

Page 26: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

The VIVO/Vitro Platform

• Ingest tools – getting batch data in• Ontology editing tools – change what is being

described and represented• Instance editing tools – Edit instances of any of

the things represented in the ontology (people, publications, organizations, etc.)

• Template/display system – Display instances and sets in a useful way

Page 27: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

What does VIVO model?

• People and more– Organizations, grants, programs, projects, publications,

events, facilities, and research resources

• Relationships among the above– Meaningful

– Bidirectional

– Navigable context

• Links to URIs elsewhere– Concepts, identifiers

– People, places, organizations, events

Page 28: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

The VIVO ontology

• Describes people and organizations in the process of doing research, scholarship, and creative activities

• Stays discipline neutral and supports description and discovery across all disciplines

• Uses existing domain terminology to describe the content of research

• Remains modular, flexible, and extensible

Page 29: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014
Page 30: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

VIVO harvests much of its data automatically from verified sources•Reduces the need for manual input of data•Provides an integrated and flexible source of publicly visible data at an institutional level

Data, data, data

Individuals may also edit and customize their profiles to suit their professional needs

External data sources

Internal data sources

Page 31: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Typical data sources

• HR – people, appointments

• Research administration – grants & contracts• Registrar – courses• Faculty reporting system(s)– publications, service, research areas, awards

• Events calendar• Internal and external news • External repositories – e.g., Pubmed, Scopus

Page 32: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Complexity of Inputs

People

Grants

Data

Google Scholar

Center/ Dept/

Program websites

ResearchFacilities & Services

Courses

Tech transfer

Publications

VP ResearchUniv.

Communications

HPC

HR data

Faculty Reporting

GradSchool

Pubmed

CrossRef

Researcher.gov

arXiv

other database

s

NIH RePorter

Self-editing

Other campuses

Page 33: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014
Page 34: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Structured data for visualizations

Page 35: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014
Page 36: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014
Page 37: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Linked data indexing for search

Ponce VIVO

WashU VIVO

IU VIVO

CornellIthaca VIVO

Weill

Cornell VIVO

eagle-IResearchresources Harvard

ProfilesRDF

OtherVIVOs

DigitalVitaRDF

IowaLokiRDF

Linked Open Data

vivosearch.

org

UF VIVO

Scripps VIVO

Solrsearchindex

Alter-nateSolr

index

Page 38: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014
Page 39: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014
Page 40: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Weill – Publications

Page 41: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014
Page 42: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014
Page 43: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014
Page 44: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

VIVO is Extensible

Page 45: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Adding Research Resources and Facilities to VIVO

• CTSAconnect– OHSU, Harvard, Cornell, Florida, Buffalo & Stony

Brook– eagle-i sister NIH project – Harvard, OHSU, 7 others

• Facilities, services, techniques, protocols, skills, and research outputs beyond publications– Extended ways to represent expertise– Improve attribution for data and other contributions

to science

Page 46: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Connecting researchers, resources, and clinical activities

Page 47: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014
Page 48: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014
Page 49: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014
Page 50: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Supporting Humanities and Artistic Works

• Performances of a work• Translations• Collections and exhibits

Steven McCauley and Theodore Lawless, Brown University

http://www.vivoweb.org/files/vivo2013/friday_pm/VIVO-Humanities_McCauley.pdf

Page 51: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

The VIVO Community is now over 100 institutions worldwide

Page 52: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

5th Annual VIVO Conference

August 6-8, 2014 Austin, TX

www.vivoconference.org

Page 53: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

How does LD4L build on VIVO

• The LD4L ontology will use components of the VIVO-ISF ontology

• LD4L will use VIVO ontology design patterns• The basis for SRSIS implementations at each

institution will be Vitro plus LD4L ontology• The multi-institution LD4L demonstration

search will be an adaptation of VIVOsearch.org• LD4L will link to existing VIVO data

Page 54: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

LD4L Building Blocks: Hydra

Page 55: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

http://projecthydra.org

Hydra slides courtesy of Tom Cramer

Page 56: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

What Is Hydra?

• A robust repository fronted by feature-rich, tailored applications and workflows (“heads”) ➭ One body, many heads

• Collaboratively built “solution bundles” that can be adapted and modified to suit local needs.

• A community of developers and adopters extending and enhancing the core➭ If you want to go fast, go alone. If you want

to go far, go together.

Page 57: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Fundamental Assumption #1

No single system can provide the full range of repository-based solutions for a given institution’s needs,

…yet sustainable solutions require a common repository infrastructure.

Page 58: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

For Instance…

- Generally a single PDF

- Simple, prescribed workflow

- Streamlined UI for depositors, reviewers & readers

Digitization Workflow

System

General Purpose Institutional Repository

Simple Complex

- Potentially hundreds of files type per object

- Complex, branching workflow

- Sophisticated operator (back office) interfaces

- Heterogeneous file types

- Simple to complex objects

- One- or two-step workflow

- General purpose user interfaces

ETD Deposit System

Page 59: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Hydra Heads: Emerging Solution Bundles

Institutional RepositoriesUniversity of Hull University of VirginiaPenn State University

ImagesNorthwestern University (Digital Image Library)

Page 60: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Archives & Special CollectionsStanford UniversityUniversity of VirginiaRock & Roll Hall of Fame

MediaIndiana UniversityNorthwestern University Rock & Roll Hall of FameWGBH

Hydra Heads: Emerging Solution Bundles

Page 61: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Workflow Management (Digitization, Preservation)Stanford UniversityUniversity of Illinois – Urbana-ChampagneNorthwestern University

ExhibitsNotre Dame

Hydra Heads: Emerging Solution Bundles

Page 62: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

ETDsStanford UniversityUniversity of VirginiaEtc.

(Small) Dataeveryone…

Hydra Heads: Emerging Solution Bundles

Page 63: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Fundamental Assumption #2

No single institution can resource the development of a full range of solutions on its own,

…yet each needs the flexibility to tailor solutions to local demands and workflows.

Page 64: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Hydra Philosophy -- Community

• An open architecture, with many contributors to a common core

• Collaboratively built “solution bundles” that can be adapted and modified to suit local needs

• A community of developers and adopters extending and enhancing the core

One framework, many contributors

Page 65: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Hydra Philosophy -- Technical

• Tailored applications and workflows for different content types, contexts and user interactions

• A common repository infrastructure• Flexible, atomistic data models• Modular, “Lego brick” services• Library of user interaction widgets• Easily skinned UI

One body, many heads

Page 66: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Technical Framework - Components

• Fedora provides a durable repository layer to support object management and persistence

• Solr, provides fast access to indexed information

• Blacklight, a Ruby on Rails plugin that sits atop solr and provides faceted search & tailored views on objects

• Hydra Head, a Ruby on Rails plugin that provides create, update and delete actions against Fedora objects

Page 67: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Major Hydra Components

Page 68: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

So What is Hydra?

• Framework for generating Fedora front-end applications w/ full CRUD functionality

• That follows design pattern with common componentry and platforms

– Fedora, Ruby on Rails, Solr, Blacklight

• That supports distinct UI’s, content types, workflows, and policies

• Being developed by a community of 22 partner institutions (and growing)

Page 69: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

How does LD4L build on Hydra?

• We will create an ActiveTriples Hydra component to mimic ActiveFedora

• We will make it possible to use a TripleStore as a Hydra repository as well as Fedora

• The new Cornell Blacklight/Solr-based search will index from Cornell’s triple-based SRSIS

• We will explore using Hydra-based collections as data sources for LD4L data about resources

Page 70: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

LD4L Building Blocks: LibraryCloud/

ShelfRank

Page 71: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

• Provides model for access to library data• Includes access to ShelfRank for Harvard Library

resources• Provides concrete example for creating an ontology

for usage• Source data for Harvard SRSIS instance

Page 72: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Enough background, what are we actually

doing?

Page 73: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Developing Use Cases

• Currently have draft set of 33 use cases on LD4L public wiki

• Users include: faculty, student, dean, researcher, librarian, bibliographer, and cataloger

• Examples: Research guided by community usage; Compose a syllabus; Build a virtual collection; Info-rich maps; Who an author influenced

Page 74: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Identifying Data Sources

• Bibliographic data: CUL/Harvard/SUL catalogs• Person data: VIVO, Stanford CAP, Profiles• Usage data: LibraryCloud, Cornell/Stanford

circulation, BorrowDirect circulation• Collections: Archival EAD, IRs, SharedShelf,

Olivia, arbitrary OAI-PMH• Annotations: Cornell CuLLR, Stanford DMS,

Bloglinks, DBpedia, LibGuides• Subjects & Authorities

Page 75: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Assembling the Ontology

• General: VIVO-ISF; OpenAnnotation; SKOS• Bibliographic: BIBFRAME, BIBO, FaBiO• Provenance: PROV-O, PAV• People/Organizations: FOAF, PROV,

Schema.org• Licensing: Creative Commons; Dublin Core

Terms; Software Ontology• Many vocabularies/identifiers: VIAF, Getty,

ORCID, ISNI

Page 76: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Project timeline 2014

• Jan-June 2014: Initial ontology design; identify data sources; identify external vocabularies; begin SRSIS and Hydra ActiveTriples development

• July-Dec 2014: Complete initial ontology; complete initial ActiveTriples development; pilot initial data ingests into Vitro-based SRSIS instance at Cornell

Page 77: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Workshop – January 2015

• Hold a two-day workshop for 25 attendees from 10-12 interested library, archive, and cultural memory institutions

• Demonstrate initial prototypes of SRSIS and ontology• Obtain feedback on initial ontology design• Obtain feedback on overall design and approach• Make connections to support participants in piloting this

approach at their institutions• Understand how institutions see this approach fitting in

with their own multi-institutional collaborations and existing cross-institutional efforts such as the Digital Public Library of America, VIVO, and SHARE

Page 78: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Project timeline Jan-June 2015

• Pilot SRSIS instances at Harvard and Stanford• Populate Cornell SRSIS instance from multiple

data sources including MARC catalog records, EAD finding aids, VIVO data, CuLLR, and local digital collections

• Develop a test instance of the SRSIS Search application harvesting RDF across the three partner institutions

• Integrate SRSIS with ActiveTriples

Page 79: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Project timeline July-Dec 2015

• Implement fully functional SRSIS instances at Cornell, Harvard, and Stanford

• Public release of open source SRSIS code and ontology

• Public release of open source ActiveTriples Hydra Component

• Create public demonstration of SRSIS Search-based discovery and access system across the three SRSIS instances

Page 80: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Summing Up

Page 81: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Work So Far

• Initial project meeting at Stanford Jan. 30-31• Developed initial set of use cases and data

sources• Set up initial working groups: Ontology,

Engineering, Use Cases, and Outreach and Workshop Planning

• Identified a list of potential partners• Find out more at:

https://wiki.duraspace.org/display/ld4l/

Page 82: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Project Outcomes

• Open source extensible SRSIS ontology compatible with VIVO ontology, BIBFRAME, and other existing library LOD efforts

• Open source SRSIS semantic editing, display, and discovery system

• Project Hydra compatible interface to SRSIS, using ActiveTriples to support Blacklight search across multiple SRSIS instances

Page 83: Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Questions?