
AIB CILW 2016 Conference, Rome

October 21, 2016

Because the web of data doesn’t organize itself

OCLC Research’s contributions to linked data in the library community

Titia van der Werf

Senior Program Officer

The two models of the Web

Web of Documents

• Web pages or other documents

• Human-readable text

• Independent

• Static

Web of Data

• Statements about entities, or ‘Things’

• Machine-processable data

• Integrated

• Actionable

An example: a Knowledge Card

Entities and relationships

• Albert Einstein (Person) is the author of Relativity: The Special and General Theory (Work)

• Relativity: The Special and General Theory (Work) is about Physics (Subject)

…linked for machine understanding

• Albert Einstein: Wikidata and VIAF, https://www.wikidata.org/wiki/Q937 and http://viaf.org/viaf/75121530

• Relativity: The Special and General Theory: WorldCat Works, http://experiment.worldcat.org/entity/work/data/369081611

• Physics: Library of Congress Subject Headings, http://id.loc.gov/authorities/subjects/sh85101653.html
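The knowledge card can be written down as machine-processable statements. The sketch below is a minimal illustration, not OCLC's data model: it assumes the slide's "author" and "about" links are expressed with the schema.org properties of the same name, and that owl:sameAs ties the VIAF and Wikidata identifiers for Einstein together. The URLs are copied exactly as shown on the slide; the canonical entity URIs of these services may differ. Plain Python tuples stand in for an RDF library, and the output is serialized as N-Triples.

```python
"""Toy sketch: the knowledge-card example as subject-predicate-object triples.

Assumptions (not from the slides): "author" and "about" use the schema.org
properties of the same name, and owl:sameAs links the two identifiers for
Albert Einstein. URLs are copied from the slide as shown.
"""

SCHEMA = "http://schema.org/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

EINSTEIN_VIAF = "http://viaf.org/viaf/75121530"
EINSTEIN_WIKIDATA = "https://www.wikidata.org/wiki/Q937"
RELATIVITY_WORK = "http://experiment.worldcat.org/entity/work/data/369081611"
PHYSICS_LCSH = "http://id.loc.gov/authorities/subjects/sh85101653.html"

# Each statement is a (subject, predicate, object) triple of IRIs.
TRIPLES = [
    (RELATIVITY_WORK, SCHEMA + "author", EINSTEIN_VIAF),
    (RELATIVITY_WORK, SCHEMA + "about", PHYSICS_LCSH),
    (EINSTEIN_VIAF, OWL_SAME_AS, EINSTEIN_WIKIDATA),
]


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def follow(triples, subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
    # A program can hop from the work to its author, then to another identifier.
    for author in follow(TRIPLES, RELATIVITY_WORK, SCHEMA + "author"):
        print(author, follow(TRIPLES, author, OWL_SAME_AS))
```

Because every entity is an identifier rather than a text string, a program can follow the links from the work to its author and on to the same person in another dataset without any human interpretation.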

THE OCLC RESEARCH INTERNATIONAL LINKED DATA SURVEYS FOR IMPLEMENTERS

KAREN SMITH-YOSHIMURA

Geographic breakdown of 90 responding institutions: 20 countries represented

[Bar chart of responding institutions per country: USA, Spain, UK, The Netherlands, Norway, Canada, Australia, France, Germany, Italy, Switzerland, Austria, Czech Republic, Hungary, Ireland, Japan, Malaysia, Portugal, Singapore, Sweden]

Linked Data Survey Respondents

2015 responding institutions by type

[Pie chart. Institution types: Academic library, National library, Network, Government, Scholarly, Public library, Museum, Other. Segment values shown: 31%, 20%, 14%, 10%, 8%, 7%, 4%, 6%]

What is published as linked data

[Bar chart, categories: Authority files; Bibliographic data; Data about museum objects; Datasets; Descriptive metadata; Digital collections; Encoded archival descriptions; Geographic data; Ontologies/vocabularies; Other]

Barriers to publishing linked data

• Steep learning curve for staff

• Inconsistent legacy data

• Difficulties in

– selecting appropriate ontologies to model data

– establishing links

• Little documentation or advice on how to build the systems

2015 linked data resources most consumed

[Chart: VIAF; DBpedia; GeoNames; id.loc.gov; “Resources we convert to linked data ourselves”; Getty's Art and Architecture Thesaurus; FAST (Faceted Application of Subject Terminology); WorldCat.org; data.bnf.fr; Deutsche Nationalbibliothek Linked Data Service]

[Figure: DBpedia, with the domains Libraries, publishing; Life sciences; Social networking; Government]

Barriers to consuming linked data

• Unreliable quality of published linked data

– not always reusable

– lack of authority control or URIs

– stale or obsolete datasets

• Difficulty understanding its structure and meaning

• Matching, disambiguating, and aligning locally produced data with third-party resources

• Mapping vocabularies

• Size of RDF datasets: too large or too small

Maturity

Analysis

Implementation

Reasons for publishing and consuming linked data

• Expose our data to a larger Web audience

• Demonstrate what can be done

• Heard about it and wanted to try it

• Improve SEO

• Create a richer user experience

• Enhance our own data

• Improve internal metadata management

• Achieve greater accuracy and scope in search results

• Experiment with data integration

[Chart legend: Publishing, Consuming, Both]

OCLC RESEARCH’S CONTRIBUTIONS

WorldCat growth since 1998 (millions of records, as of 27 April 2012)

Year    Millions of records
1998     39
1999     41
2000     44
2001     47
2002     50
2003     52
2004     55
2005     61
2006     67
2007     86
2008    108
2009    139
2010    197
2011    236
2012    264
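The chart's bar labels can be paired with its years to do some quick arithmetic on the aggregate's growth. The sketch below is our own illustration, not an OCLC calculation: it computes the compound annual growth rate between 1998 and 2012 and the year with the largest absolute rise. The 2012 value is a count as of 27 April 2012, not a full-year figure, so the last interval is short.

```python
"""Back-of-the-envelope growth arithmetic for the WorldCat chart.

Figures are the chart's bar labels (millions of records) paired with the years
on its x-axis. The 2012 value is a count as of 27 April 2012 (partial year).
"""

RECORDS_MILLIONS = {
    1998: 39, 1999: 41, 2000: 44, 2001: 47, 2002: 50,
    2003: 52, 2004: 55, 2005: 61, 2006: 67, 2007: 86,
    2008: 108, 2009: 139, 2010: 197, 2011: 236, 2012: 264,
}


def cagr(series):
    """Compound annual growth rate between the first and last year."""
    years = sorted(series)
    first, last = years[0], years[-1]
    return (series[last] / series[first]) ** (1 / (last - first)) - 1


def largest_increase(series):
    """Year with the biggest absolute rise over the previous year, and that rise."""
    years = sorted(series)
    jumps = {y: series[y] - series[prev] for prev, y in zip(years, years[1:])}
    year = max(jumps, key=jumps.get)
    return year, jumps[year]


if __name__ == "__main__":
    print(f"CAGR 1998-2012: {cagr(RECORDS_MILLIONS):.1%}")
    print("Largest single-year rise:", largest_increase(RECORDS_MILLIONS))
```

On these figures the average growth works out to roughly 14.6% per year, and the largest single-year rise is the 58 million records added between 2009 and 2010.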

Aggregating data

In aggregations:

• data lose their local context

• data get lost in the bigger context

Making sense of data at the aggregate level:

• FRBR

• GLIMIR

• VIAF

• FAST

• Mining for entities/names

FRBRisation of WorldCat: 2006 to now

[Diagram levels: Manifestations, Reproductions, Translations, Works]

GLIMIR: clustering records which differ in language and cataloguing rules

2014: 197 million bibliographic work descriptions available as Linked Data
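The slides name FRBR work clustering and GLIMIR clustering without describing how either is computed. As a rough illustration of two-level clustering over an aggregate, the toy sketch below first groups records into manifestation-level clusters on a key that ignores the cataloguing language (a stand-in for GLIMIR), then groups whole clusters into works on a normalized author/title key (a stand-in for work-level FRBRisation). The record fields, the normalization and both keys are hypothetical simplifications, not OCLC's algorithms.

```python
"""Toy two-level clustering over an aggregate of bibliographic records.

Hypothetical simplification, NOT OCLC's GLIMIR or FRBR work-set algorithms:
1. manifestation clusters: records sharing an identifier, whatever language
   they were catalogued in ("cat_lang" is deliberately ignored);
2. work clusters: whole manifestation clusters sharing a normalized
   author/title key, preferring the uniform (original) title when present.
"""
import re
from collections import defaultdict

# Hypothetical records; "isbn-A" and "isbn-B" are placeholders, not real ISBNs.
RECORDS = [
    {"id": 1, "isbn": "isbn-A", "cat_lang": "eng", "author": "Einstein, Albert",
     "title": "Relativity: the special and general theory",
     "uniform_title": "Über die spezielle und die allgemeine Relativitätstheorie"},
    {"id": 2, "isbn": "isbn-A", "cat_lang": "ger", "author": "Einstein, Albert",
     "title": "Relativity : the special and general theory",
     "uniform_title": "Über die spezielle und die allgemeine Relativitätstheorie"},
    {"id": 3, "isbn": "isbn-B", "cat_lang": "ger", "author": "Einstein, Albert",
     "title": "Über die spezielle und die allgemeine Relativitätstheorie"},
]


def normalize(text):
    """Lowercase and keep only word characters, so punctuation variants match."""
    return " ".join(re.findall(r"\w+", text.lower()))


def manifestation_key(record):
    return record["isbn"]


def work_key(record):
    title = record.get("uniform_title") or record["title"]
    return (normalize(record["author"]), normalize(title))


def cluster(records):
    """Return {work_key: {manifestation_key: sorted record ids}}."""
    manifestations = defaultdict(list)
    for rec in records:  # level 1: manifestation clusters
        manifestations[manifestation_key(rec)].append(rec)
    works = defaultdict(dict)
    for m_key, members in manifestations.items():  # level 2: works
        works[work_key(members[0])][m_key] = sorted(r["id"] for r in members)
    return works


if __name__ == "__main__":
    for w_key, manifs in cluster(RECORDS).items():
        print(w_key, manifs)
```

Records 1 and 2 describe the same hypothetical edition catalogued in different languages and fall into one manifestation cluster; record 3, the German original, joins them at the work level because the translation carries the original title as its uniform title.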

VIAF: Virtual International Authority File

• Merge of 24+ national-level authority files

• Cooperative program run by OCLC

• Initiated by LoC, DNB, BnF and OCLC

• 29 million authority records

• 112 million bibliographic records

• Migrated from an OCLC Research project to an OCLC service in 2012

• VIAF is available as linked data
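The slide does not describe how VIAF merges the national files. As a rough illustration of what clustering authority records from several sources involves, the toy sketch below groups records whose normalized heading and life dates agree. Real authority matching typically weighs much more evidence than this; the source codes, local IDs, field layout and match rule are all hypothetical.

```python
"""Toy merge of name authority records from several national files.

Hypothetical match rule, NOT VIAF's algorithm: two records are clustered
together when their normalized heading and their life dates are identical.
"""
import re
import unicodedata
from collections import defaultdict


def normalize_name(heading):
    """Drop accents, case and punctuation: 'Gödel, Kurt' -> 'godel kurt'."""
    decomposed = unicodedata.normalize("NFKD", heading)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(re.findall(r"\w+", no_accents.lower()))


def merge_authorities(records):
    """records: (source, local_id, heading, dates) -> sorted list of clusters."""
    clusters = defaultdict(list)
    for source, local_id, heading, dates in records:
        clusters[(normalize_name(heading), dates)].append((source, local_id))
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical input: source codes and local IDs are placeholders.
RECORDS = [
    ("LC", "n-001", "Einstein, Albert", "1879-1955"),
    ("DNB", "d-001", "Einstein, Albert", "1879-1955"),
    ("BnF", "b-001", "EINSTEIN, Albert", "1879-1955"),
    ("DNB", "d-002", "Einstein, Alfred", "1880-1952"),
]

if __name__ == "__main__":
    for cluster in merge_authorities(RECORDS):
        print(cluster)
```

The three Albert Einstein records from different files collapse into one cluster, while the Alfred Einstein record stays separate because its heading and dates differ.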

OCLC’s linked data resources

WorldCat Catalog

WorldCat Works

FAST

VIAF

ISNI

The EntityJS explorer

Show related entities

WHAT WE’VE LEARNED

Linked data in the library community:

Where the effort is focused

Data publishing

Data consumption

Application development

?

Why linked data?

• Replicate existing library functions more cheaply and efficiently

• Improve data integration

• A better user experience

• Greater Web visibility

• Develop better models of resources not well served by current standards

• Improve internal data management

Library linked data is not…

• A silver bullet

• A killer app

• A panacea

But it is...

• The result of cumulative and joint effort


Together we make breakthroughs possible.

Acknowledgements

Jean Godby


Karen Smith-Yoshimura


Comments?

Titia van der Werf


titia.vanderwerf@oclc.org
