Linked Open Data Fundamentals For Libraries, Archives & Museums Trevor Thornton Senior Applications Developer, NYPL Labs New York Public Library


Page 1: Linked Open Data Fundamentals for Libraries, Archives and Museums

Linked Open Data Fundamentals
For Libraries, Archives & Museums

Trevor Thornton
Senior Applications Developer, NYPL Labs

New York Public Library

Page 2: Linked Open Data Fundamentals for Libraries, Archives and Museums

Workshop Topics

• What Linked Open Data is
• Potential benefits of Linked Open Data for libraries, archives and museums
• Overview of technical concepts
• Licenses for open data (legal issues)
• Tour of relevant Linked Open Data sources (element sets, controlled vocabularies, published data sets)
• General considerations for implementation

Page 3: Linked Open Data Fundamentals for Libraries, Archives and Museums

Linked Open Data (LOD)

Data
For libraries, archives and museums, this includes any type of digital information that describes resources or aids in their discovery (metadata). It also includes data produced through original research (scientific/statistical data, geospatial data, etc.).

Linked Data
Data published on the Web in accordance with principles designed to facilitate linkages between resources.

Linked Open Data
Linked data that is freely usable, reusable, and redistributable, subject at most to attribution and ‘share alike’ requirements.

Page 4: Linked Open Data Fundamentals for Libraries, Archives and Museums

The value of our data

• Our data is a crucial tool in serving our missions to collect, preserve and provide access to resources

• We are dedicated to standards of quality and accuracy in the data we create

• The creation and management of data represents a significant investment on the part of cultural heritage institutions

Page 5: Linked Open Data Fundamentals for Libraries, Archives and Museums

Benefits of Linked Open Data

• Puts information on the web, where people are looking for it

• People can use your data in new ways, opening opportunities for scholarship and innovation

• Expands discoverability of your collections

• Allows for continuous improvement of your data by linking it to a growing pool of other data

Page 6: Linked Open Data Fundamentals for Libraries, Archives and Museums

The emerging data commons

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Page 7: Linked Open Data Fundamentals for Libraries, Archives and Museums

A very brief history of linked data

Starring Tim Berners-Lee

Photo: Paul Clarke

Page 8: Linked Open Data Fundamentals for Libraries, Archives and Museums

1990 (more or less)

Tim Berners-Lee invents the World Wide Web to publish hypertext documents on the Internet.

It includes 3 essential technologies:

URI (Uniform Resource Identifier)

HTTP (Hypertext Transfer Protocol)

HTML (Hypertext Markup Language)

Page 9: Linked Open Data Fundamentals for Libraries, Archives and Museums

2001

Tim Berners-Lee proposes ‘The Semantic Web’ in an article in Scientific American

“The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation…

In the near future, these developments will usher in significant new functionality as machines become much better able to process and ‘understand’ the data that they merely display at present.”

Page 10: Linked Open Data Fundamentals for Libraries, Archives and Museums

2006

In a document discussing design issues for the Semantic Web, Berners-Lee introduces linked data as a crucial component:

“The Semantic Web isn't just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data.”

He outlines 4 basic principles…

Page 11: Linked Open Data Fundamentals for Libraries, Archives and Museums

The Linked Data Principles

1. Use URIs as names for things.

2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).

4. Include links to other URIs so that they can discover more things.

Page 12: Linked Open Data Fundamentals for Libraries, Archives and Museums

THE TECHNICAL PART STARTS NOW

Page 13: Linked Open Data Fundamentals for Libraries, Archives and Museums

URI (Uniform Resource Identifier)

Globally unique identifier for a resource on a computer or a network.

HTTP URIs identify resources on the Web.

http://www.yourdomain.org/something

Page 14: Linked Open Data Fundamentals for Libraries, Archives and Museums

URI vs. URL

URLs (Uniform Resource Locators) are a subset of URIs that, in addition to identifying a resource, provide a means of locating it.

A URI does not necessarily point to a document; a URL does.

A URI can identify a real-world object.

Page 15: Linked Open Data Fundamentals for Libraries, Archives and Museums

HTTP (Hypertext Transfer Protocol)

The foundation of data communication for the Web

Client/user agent (e.g. web browser) → HTTP request → Web server

Web server → HTTP response → Client/user agent

Page 16: Linked Open Data Fundamentals for Libraries, Archives and Museums

RDF (Resource Description Framework)

A framework for describing Web resources.

A Web resource is anything that can be retrieved or identified on the WWW via a URI.

RDF descriptions are based on simple subject-predicate-object expressions called “triples”.

Page 17: Linked Open Data Fundamentals for Libraries, Archives and Museums

The RDF Triple

Subject - the resource being described
Predicate - a property of that resource
Object - the value of the property

Subject and predicate are defined using URIs. Object can be either a URI or a ‘literal’ (text, number, date, etc.)

subject → predicate → object
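
The subject-predicate-object structure can be sketched in a few lines of Python. This is only an illustration: the `Triple` and `Literal` types and the `is_link` helper are hypothetical names invented here, not part of any RDF library, and URIs are represented as plain strings.

```python
from typing import NamedTuple, Union


class Literal(NamedTuple):
    """A plain value (text, number, date) rather than a URI."""
    value: str


class Triple(NamedTuple):
    subject: str               # URI of the resource being described
    predicate: str             # URI of the property
    obj: Union[str, Literal]   # URI of another resource, or a literal


def is_link(triple: Triple) -> bool:
    """True when the object is a URI, i.e. the triple links two resources."""
    return not isinstance(triple.obj, Literal)


creator = Triple(
    "http://www.worldcat.org/oclc/746309573",
    "http://purl.org/dc/terms/creator",
    "http://viaf.org/viaf/44300643",
)
created = Triple(
    "http://www.worldcat.org/oclc/746309573",
    "http://purl.org/dc/terms/created",
    Literal("1918/1922"),
)

print(is_link(creator), is_link(created))
```

Only triples whose object is a URI connect one resource to another; literal-valued triples end the chain.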

Page 18: Linked Open Data Fundamentals for Libraries, Archives and Museums

A basic triple

James Joyce

creator

Page 19: Linked Open Data Fundamentals for Libraries, Archives and Museums

A basic triple

Subject: http://www.worldcat.org/oclc/746309573

Predicate: creator (http://purl.org/dc/terms/creator)

Object: James Joyce (http://viaf.org/viaf/44300643)

Page 20: Linked Open Data Fundamentals for Libraries, Archives and Museums

Another basic triple

Subject: http://www.worldcat.org/oclc/746309573

Predicate: subject (http://purl.org/dc/terms/subject)

Object: Dublin, Ireland (http://dbpedia.org/resource/Dublin)

Page 21: Linked Open Data Fundamentals for Libraries, Archives and Museums

One more basic triple

Subject: http://www.worldcat.org/oclc/746309573

Predicate: date created (http://purl.org/dc/terms/created)

Object: 1918/1922

Page 22: Linked Open Data Fundamentals for Libraries, Archives and Museums

RDF data as a graph

Subject: http://www.worldcat.org/oclc/746309573

creator (http://purl.org/dc/terms/creator) → James Joyce (http://viaf.org/viaf/44300643)

subject (http://purl.org/dc/terms/subject) → Dublin, Ireland (http://dbpedia.org/resource/Dublin)

date created (http://purl.org/dc/terms/created) → 1918/1922
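
As a rough sketch of why a set of triples forms a graph, the three statements about the book can be grouped by subject so that each node carries its outgoing edges. The `build_graph` helper and the constant names are assumptions made for this illustration only.

```python
from collections import defaultdict

BOOK = "http://www.worldcat.org/oclc/746309573"
DCTERMS = "http://purl.org/dc/terms/"

TRIPLES = [
    (BOOK, DCTERMS + "creator", "http://viaf.org/viaf/44300643"),
    (BOOK, DCTERMS + "subject", "http://dbpedia.org/resource/Dublin"),
    (BOOK, DCTERMS + "created", "1918/1922"),
]


def build_graph(triples):
    """Map each subject node to its outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = build_graph(TRIPLES)
for predicate, obj in graph[BOOK]:
    print(predicate, "->", obj)
```

Because the creator and subject objects are themselves URIs, triples published elsewhere about those URIs would attach to the same graph.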

Page 23: Linked Open Data Fundamentals for Libraries, Archives and Museums

RDF serialization formats

‘Serialization’ = recording one or more RDF graphs in a machine-readable file.

There are 2 basic options:

RDF in a standalone text file:
• RDF/XML
• N3 (Notation 3)
• Turtle (Terse RDF Triple Language)
• N-Triples

RDF embedded in HTML:
• RDFa (RDF in attributes)

Page 24: Linked Open Data Fundamentals for Libraries, Archives and Museums

<http://www.worldcat.org/oclc/746309573> <http://purl.org/dc/terms/creator> <http://viaf.org/viaf/44300643> .

<http://www.worldcat.org/oclc/746309573> <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Dublin> .

<http://www.worldcat.org/oclc/746309573> <http://purl.org/dc/terms/created> "1918/1922" .

Basic triples in N-Triples

N-Triples is the most basic expression of RDF.
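
The one-statement-per-line shape of N-Triples is simple enough to produce by hand. The sketch below is a simplification, not a complete serializer: `nt_term` and `to_ntriples` are hypothetical helpers, any value starting with `http://` or `https://` is assumed to be a URI, everything else is written as a plain string literal, and only backslashes and double quotes are escaped.

```python
def nt_term(value: str) -> str:
    """Write a URI in angle brackets, anything else as a quoted literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples) -> str:
    """Emit one 'subject predicate object .' statement per line."""
    return "\n".join(
        f"{nt_term(s)} {nt_term(p)} {nt_term(o)} ." for s, p, o in triples
    )


BOOK = "http://www.worldcat.org/oclc/746309573"
print(to_ntriples([
    (BOOK, "http://purl.org/dc/terms/creator", "http://viaf.org/viaf/44300643"),
    (BOOK, "http://purl.org/dc/terms/created", "1918/1922"),
]))
```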

Page 25: Linked Open Data Fundamentals for Libraries, Archives and Museums

@prefix dcterms: <http://purl.org/dc/terms/> .

<http://www.worldcat.org/oclc/746309573>
    dcterms:creator <http://viaf.org/viaf/44300643> ;
    dcterms:subject <http://dbpedia.org/resource/Dublin> ;
    dcterms:created "1918/1922" .

Basic triples in N3/Turtle

Statements about the same resource are grouped together.

Property URIs are shortened using prefixes.
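
Prefix shortening amounts to a string substitution against a table of namespaces. A minimal sketch, assuming a hypothetical `shorten` helper and a one-entry prefix table; it does not check whether the remaining local name is legal in Turtle.

```python
PREFIXES = {"dcterms": "http://purl.org/dc/terms/"}


def shorten(uri: str, prefixes=PREFIXES) -> str:
    """Return prefix:local when a namespace matches, else the full <URI>."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return f"<{uri}>"


print(shorten("http://purl.org/dc/terms/creator"))
print(shorten("http://viaf.org/viaf/44300643"))
```

URIs outside any declared namespace are left in full, which is why the object URIs in the slide's example still appear in angle brackets.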

Page 26: Linked Open Data Fundamentals for Libraries, Archives and Museums

Basic triples in RDF/XML

<?xml version="1.0" encoding="UTF-8"?>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

xmlns:dcterms="http://purl.org/dc/terms/">

<rdf:Description rdf:about="http://www.worldcat.org/oclc/746309573">

<dcterms:creator rdf:resource="http://viaf.org/viaf/44300643"/>

<dcterms:subject rdf:resource="http://dbpedia.org/resource/Dublin"/>

<dcterms:created>1918/1922</dcterms:created>

</rdf:Description>

</rdf:RDF>

Page 27: Linked Open Data Fundamentals for Libraries, Archives and Museums

RDFa (RDF in Attributes)

RDFa allows RDF data to be embedded within HTML content.

Rendered HTML:

Ulysses is a novel by the Irish author James Joyce.

HTML code:

<div about="http://www.worldcat.org/oclc/746309573" prefix="dcterms: http://purl.org/dc/terms/">Ulysses is a novel by the Irish author <span property="dcterms:creator" resource="http://viaf.org/viaf/44300643">James Joyce</span>.</div>

Page 28: Linked Open Data Fundamentals for Libraries, Archives and Museums

RDF Ontologies

Ontologies/vocabularies define categories of things and the relationships that they can have to each other.

Ontologies provide the semantics that allow data to be interpreted by machines.

Rules of inference – what can be assumed to be true based on what is asserted by a triple.

Page 29: Linked Open Data Fundamentals for Libraries, Archives and Museums

RDFS (RDF Schema)

A basic vocabulary for ontology development. RDFS defines RDF classes and properties.

Class – a category of resources; a resource in such a category is said to be an instance of the class

Property – a relation between a subject resource and an object resource in a triple.
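
Class membership is where the simplest kind of inference comes from: if a resource is an instance of a class, it is also an instance of every superclass. The sketch below applies only that one rule. The `rdf:type` and `rdfs:subClassOf` URIs are the standard ones; the `http://example.org/` classes and the `infer_types` function are made up for illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def infer_types(triples):
    """Return the new rdf:type triples implied by rdfs:subClassOf."""
    known = set(triples)
    subclass_of = {(s, o) for s, p, o in known if p == RDFS_SUBCLASS}
    changed = True
    while changed:  # repeat until no new triple can be derived
        changed = False
        for s, p, o in list(known):
            if p != RDF_TYPE:
                continue
            for sub, sup in subclass_of:
                if o == sub and (s, RDF_TYPE, sup) not in known:
                    known.add((s, RDF_TYPE, sup))
                    changed = True
    return known - set(triples)


EX = "http://example.org/"  # hypothetical vocabulary
data = [
    ("http://www.worldcat.org/oclc/746309573", RDF_TYPE, EX + "Novel"),
    (EX + "Novel", RDFS_SUBCLASS, EX + "Book"),
    (EX + "Book", RDFS_SUBCLASS, EX + "Work"),
]
for triple in sorted(infer_types(data)):
    print(triple)
```

Nothing in the data says the resource is a Book or a Work; both statements follow from the class hierarchy alone.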

Page 30: Linked Open Data Fundamentals for Libraries, Archives and Museums

OWL (Web Ontology Language)

Provides an extended set of properties used in ontology/vocabulary definitions (used in conjunction with RDFS)

• Equivalence/disjunction
• Advanced property definitions
• Restrictions and cardinality

Page 31: Linked Open Data Fundamentals for Libraries, Archives and Museums

SKOS (Simple Knowledge Organization System)

Set of vocabularies created to support the use of thesauri, classification schemes, subject heading systems and taxonomies in RDF

• Concept schemes (names, topics, geographic terms, etc.)
• Preferred/alternate labels
• Broader/narrower concepts
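
Broader/narrower relationships turn a list of terms into a hierarchy that software can walk. A minimal sketch, assuming each concept has at most one broader concept (real schemes may allow several); the `skos:broader` URI is the standard one, while the concept URIs and the `broader_chain` helper are hypothetical.

```python
SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"


def broader_chain(concept, triples):
    """Follow skos:broader links upward from a concept, nearest first."""
    broader = {s: o for s, p, o in triples if p == SKOS_BROADER}
    chain, seen = [], {concept}
    while concept in broader and broader[concept] not in seen:
        concept = broader[concept]
        chain.append(concept)
        seen.add(concept)  # guards against accidental cycles
    return chain


EX = "http://example.org/concepts/"  # hypothetical concept scheme
scheme = [
    (EX + "dublin", SKOS_BROADER, EX + "ireland"),
    (EX + "ireland", SKOS_BROADER, EX + "europe"),
]
print(broader_chain(EX + "dublin", scheme))
```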

Page 32: Linked Open Data Fundamentals for Libraries, Archives and Museums

Triplestore

A database for storing RDF data. Often a triplestore is part of a suite of applications that might include:

• Triplestore
• Inference engine – provides the ‘intelligence’ required to interpret data based on RDFS/OWL ontologies
• Query engine – supports access to data based on user-supplied queries

Page 33: Linked Open Data Fundamentals for Libraries, Archives and Museums

SPARQL (SPARQL Protocol and RDF Query Language)

• The primary query language for RDF data (analogous to SQL for relational databases)

• SPARQL endpoint – Web service that provides direct access to RDF datastores via SPARQL queries
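
At its core, a SPARQL query supplies a triple pattern in which some positions are fixed and others are variables. The Python sketch below imitates that single idea over an in-memory list; it is not SPARQL and not a query engine, and the `match` function is a hypothetical helper in which `None` plays the role of a variable.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every position that is not None."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


BOOK = "http://www.worldcat.org/oclc/746309573"
DCTERMS = "http://purl.org/dc/terms/"
store = [
    (BOOK, DCTERMS + "creator", "http://viaf.org/viaf/44300643"),
    (BOOK, DCTERMS + "subject", "http://dbpedia.org/resource/Dublin"),
    (BOOK, DCTERMS + "created", "1918/1922"),
]

# "Who is the creator of this book?"
for _, _, creator in match(store, subject=BOOK, predicate=DCTERMS + "creator"):
    print(creator)
```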

Page 34: Linked Open Data Fundamentals for Libraries, Archives and Museums

Publishing Linked Data

Establish URIs for your resources

• Within a domain that you control (yourlibrary.org)

• Consult with your IT staff on strategies for formulating URIs, for example:
  - Use a subdomain (data.yourlibrary.org/something)
  - Reserve a path within your domain (yourdomain.org/data/something)
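
Whichever pattern is chosen, URIs are usually built from a fixed base plus a stable local identifier. A small sketch under that assumption; `mint_uri`, the `item` path segment, and the identifier `b 123` are invented for illustration.

```python
from urllib.parse import quote


def mint_uri(base: str, kind: str, local_id) -> str:
    """Join a base URI, a resource type and a local identifier, percent-encoding both parts."""
    return f"{base.rstrip('/')}/{quote(kind, safe='')}/{quote(str(local_id), safe='')}"


# Subdomain strategy
print(mint_uri("http://data.yourlibrary.org/", "item", "b 123"))
# Reserved-path strategy
print(mint_uri("http://yourdomain.org/data", "item", "b 123"))
```

Building URIs from identifiers that will not change (rather than titles or file names) keeps them stable once other people start linking to them.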

Page 35: Linked Open Data Fundamentals for Libraries, Archives and Museums

Publishing Linked Data

Decide what happens when users (human or machine) try to access your URIs via the Web

1. Nothing (not recommended)

2. Something – the user is provided with information about the resource

• URI directs to an RDF file: good for machines, not for humans
• URI directs to an HTML representation of the resource: good for humans, useless for machines (not recommended)
• URI directs to an HTML representation of the resource with RDFa embedded: good for humans, OK for machines
• URI directs to either an RDF file or an HTML representation, based on what the user prefers (content negotiation)

Page 36: Linked Open Data Fundamentals for Libraries, Archives and Museums

HTTP Content Negotiation

Client/user agent (e.g. web browser) → HTTP request → Web server

Web server → HTTP response → Client/user agent

HTTP Request
• Resource URI (+ method)
• Headers (information about the requestor)
• Message body (optional)

HTTP Response
• Status code
• Headers (information about the response)
• Message body (optional)

Page 37: Linked Open Data Fundamentals for Libraries, Archives and Museums

HTTP ‘Accept’ Header

Part of the HTTP request that specifies what types of data the client can accept

• Web browsers: HTML, JPEG, GIF, text, or other formats that the browser can display. Unsupported formats are either displayed as text or prompt the user to download the file.

• Semantic web applications: RDF/XML, N3, Turtle, or another RDF serialization
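
In practice the Accept header is a comma-separated list of media types, each with an optional preference weight (`q`). The sketch below shows one simplified way a server might pick a representation from it. `parse_accept` and `choose` are hypothetical helpers, and the matching rules are deliberately reduced compared with the full HTTP specification.

```python
def parse_accept(header: str):
    """Return (media_type, q) pairs, highest preference first."""
    prefs = []
    for part in header.split(","):
        fields = part.strip().split(";")
        media = fields[0].strip().lower()
        if not media:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media, q))
    # sorted() is stable, so equal weights keep the client's order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose(header: str, available):
    """Pick the first available media type the client is willing to accept."""
    for media, q in parse_accept(header):
        if q <= 0:
            continue
        if media in available:
            return media
        if media == "*/*":
            return available[0]
        if media.endswith("/*"):
            for candidate in available:
                if candidate.startswith(media[:-1]):
                    return candidate
    return None


OFFERED = ["text/turtle", "text/html"]
print(choose("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", OFFERED))
print(choose("text/turtle, application/rdf+xml;q=0.9", OFFERED))
```

The first header is typical of a web browser and the second of an RDF-aware client, so the same URI can yield HTML for one and Turtle for the other.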

Page 38: Linked Open Data Fundamentals for Libraries, Archives and Museums

HTTP Status Codes

Part of the HTTP response that classifies the nature of the response

1xx: Informational

2xx: Success
Example: 200 OK

3xx: Redirection
Examples: 301 Moved Permanently, 303 See Other
Response will include a ‘Location’ header with the URI for the new resource

4xx: Client error
Example: 404 Not Found

Page 42: Linked Open Data Fundamentals for Libraries, Archives and Museums

HTTP Content Negotiation via 303 Redirect

Web browser ↔ Web server (running some kind of content negotiation service)

1. HTTP request
URI: http://example.org/something
Accept: HTML, JPEG, GIF, etc.

2. HTTP response
Status: 303 See Other
Location: http://example.org/something.html

3. HTTP request
URI: http://example.org/something.html
Accept: HTML, JPEG, GIF, etc.

4. HTTP response
Status: 200 OK
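
The server side of this exchange can be sketched as a function that maps a request to a status code and headers. This is an assumption-laden illustration rather than a real web service: `respond`, the suffix table, and the choice to treat `*/*` as a request for HTML are all invented here, and `q` weights in the Accept header are ignored.

```python
# Hypothetical mapping from media type to the suffix of the document URI
REPRESENTATIONS = {
    "text/html": ".html",
    "text/turtle": ".ttl",
    "application/rdf+xml": ".rdf",
}


def respond(uri: str, accept: str):
    """Return (status, headers) for a request to a resource URI."""
    requested = [part.split(";")[0].strip().lower() for part in accept.split(",")]
    for media in requested:
        if media == "*/*":
            media = "text/html"  # assumed default for clients with no preference
        suffix = REPRESENTATIONS.get(media)
        if suffix:
            # 303 See Other: a document about the resource lives at another URI
            return 303, {"Location": uri + suffix, "Vary": "Accept"}
    return 406, {}  # 406 Not Acceptable: nothing we offer matches


print(respond("http://example.org/something", "text/html, image/jpeg"))
print(respond("http://example.org/something", "text/turtle"))
```

The client then issues a second request to the `Location` URI and receives the document itself with `200 OK`.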

Page 43: Linked Open Data Fundamentals for Libraries, Archives and Museums

Trust

The rapid growth of the Web is attributable in large part to the fact that it allows anyone to say anything about anything (provable facts, subjective opinions, blatant lies and everything in between).

This is also true of the linked data web.

Libraries, archives and museums are expected to provide ‘factual’, objective data and depend on trusted sources.

Page 44: Linked Open Data Fundamentals for Libraries, Archives and Museums

Linked data attribution

A growing concern in the linked data community is the need to include attribution with data in order to determine whether or not it can/should be trusted.

• RDF reification – allows source attribution to be associated with an RDF triple

• Named graphs – Extension of RDF that allows attribution and other metadata to be associated with RDF descriptions

• Quad stores – Similar to triplestores but with an additional element that connects the triple with its source
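
The quad idea can be illustrated by adding a fourth element, the source, to each statement and filtering on it. A minimal sketch: the source URIs under `http://example.org/` and the `from_trusted` helper are hypothetical, and a real quad store would offer far more than this.

```python
def from_trusted(quads, trusted_sources):
    """Keep only the triples whose source (fourth element) is trusted."""
    return [(s, p, o) for s, p, o, source in quads if source in trusted_sources]


BOOK = "http://www.worldcat.org/oclc/746309573"
CREATOR = "http://purl.org/dc/terms/creator"

quads = [
    (BOOK, CREATOR, "http://viaf.org/viaf/44300643",
     "http://example.org/sources/library-catalog"),
    (BOOK, CREATOR, "http://example.org/people/someone-else",
     "http://example.org/sources/anonymous-upload"),
]

trusted = {"http://example.org/sources/library-catalog"}
print(from_trusted(quads, trusted))
```

Both statements can coexist in the store; the application decides which sources it is prepared to rely on.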

Page 45: Linked Open Data Fundamentals for Libraries, Archives and Museums

THE TECHNICAL PART IS NOW OVER

Page 46: Linked Open Data Fundamentals for Libraries, Archives and Museums

Linked Open Data

Data
For libraries, archives and museums, this includes any type of digital information that describes resources or aids in their discovery (metadata). It also includes data produced through original research (scientific/statistical data, geospatial data, etc.).

Linked Data
Data published on the Web in accordance with principles designed to facilitate linkages between resources.

Linked Open Data
Linked data that is freely usable, reusable, and redistributable, subject at most to attribution and ‘share alike’ requirements.

Page 47: Linked Open Data Fundamentals for Libraries, Archives and Museums

Open data licensing

Licensing your data is not the same as licensing your assets. Typically, permitted uses of data are much less restrictive.

You can often provide free, open use of your data even if use of your assets is completely restricted.

TALK TO YOUR LEGAL DEPARTMENT FIRST.

Page 48: Linked Open Data Fundamentals for Libraries, Archives and Museums

Open data licensing

Creative Commons (CC): a nonprofit organization that enables the sharing and use of creativity and knowledge through free legal tools.

CC provides an alternative to standard “all rights reserved” copyright.

Page 49: Linked Open Data Fundamentals for Libraries, Archives and Museums

Creative Commons Licenses

Three-Layer Design:

COMMONS DEED
The human-readable version of the license

LEGAL CODE
The actual license as a legal document (accessible on the Web)

MACHINE-READABLE CODE
Allows license info to be expressed in RDF

Page 50: Linked Open Data Fundamentals for Libraries, Archives and Museums

Creative Commons Licenses

CC licenses allow creators to specify a combination of 4 restrictions on use:

Attribution
Any use must give credit to the creator

Share Alike
Any use must be made available under the same terms as the original

Non-Commercial
Only non-commercial uses are permitted

No Derivative Works
The original may only be used in whole and unchanged

Licenses specify that any restrictions may be waived with permission of the rights holder.

Page 51: Linked Open Data Fundamentals for Libraries, Archives and Museums

Creative Commons Licenses

OPEN DATA (:

Attribution (CC BY)
Allows distribution and reuse in any way as long as you get credit

Attribution-ShareAlike (CC BY-SA)
Allows distribution and reuse in any way as long as you get credit and derivative works are released under the same license

NOT OPEN DATA ):

Attribution-NoDerivs (CC BY-ND)
Requires that the original is used unchanged and in whole, with credit to you

Attribution-NonCommercial (CC BY-NC)
Allows distribution and reuse in any way, for non-commercial purposes only, as long as you get credit

Attribution-NonCommercial-ShareAlike (CC BY-NC-SA)
Allows distribution and reuse in any way, for non-commercial purposes only, as long as you get credit and derivative works are released under the same license

Attribution-NonCommercial-NoDerivs (CC BY-NC-ND)
Only permits use as-is, for non-commercial purposes, and with credit to you; the most restrictive CC license available

Page 52: Linked Open Data Fundamentals for Libraries, Archives and Museums

CC0 (‘CC Zero’)

Allows creators to waive all rights to a work and to place it as completely as possible into the public domain.

• Laws vary from jurisdiction to jurisdiction as to what rights are automatically granted and how and when they expire or may be voluntarily relinquished
• Ambiguity with regard to rights can limit creative re-use
• CC0 is designed to make it as clear as is legally possible that any use of your content is allowed
• Quickly becoming the preferred license for open data

AGAIN, TALK TO YOUR LEGAL DEPARTMENT FIRST!
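
The ‘open’ test used in this workshop (at most attribution and share-alike requirements) can be expressed as a simple check on a license code. This is an illustrative sketch of that definition only, not legal guidance; `is_open` is a hypothetical helper that understands short codes such as `CC BY-SA` and `CC0`.

```python
# Requirements that still count as open under the workshop's definition
OPEN_TERMS = {"BY", "SA"}


def is_open(license_code: str) -> bool:
    """True if the license imposes at most attribution and share-alike."""
    code = license_code.upper().strip()
    if code == "CC0":
        return True  # all rights waived
    terms = [t for t in code.replace("-", " ").split() if t != "CC"]
    return bool(terms) and set(terms) <= OPEN_TERMS


for code in ["CC0", "CC BY", "CC BY-SA", "CC BY-ND", "CC BY-NC", "CC BY-NC-SA", "CC BY-NC-ND"]:
    print(code, "open" if is_open(code) else "not open")
```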

Page 53: Linked Open Data Fundamentals for Libraries, Archives and Museums

LINKED DATA SOURCES

Page 54: Linked Open Data Fundamentals for Libraries, Archives and Museums

DCMI Terms
dublincore.org/documents/dcmi-terms/

General purpose metadata terms maintained by the Dublin Core Metadata Initiative

Page 55: Linked Open Data Fundamentals for Libraries, Archives and Museums

Bibliographic Ontology
bibliontology.com

An extensive vocabulary of terms for describing bibliographic resources

Page 56: Linked Open Data Fundamentals for Libraries, Archives and Museums

FOAF (Friend of a Friend)
foaf-project.org

Provides a vocabulary for describing people and their relationships to each other and the things they create

Page 57: Linked Open Data Fundamentals for Libraries, Archives and Museums

LC Linked Data Service
id.loc.gov

Library of Congress authorities as linked data (Name Authority File, Subject Headings, Thesaurus of Graphic Materials, etc.)

Page 58: Linked Open Data Fundamentals for Libraries, Archives and Museums

Virtual International Authority File
viaf.org

Links names from multiple authority files to create cluster records representing the entities identified

Page 59: Linked Open Data Fundamentals for Libraries, Archives and Museums

GeoNames
geonames.org

Aggregates geographic data from a wide variety of sources and makes it available as LOD

Page 60: Linked Open Data Fundamentals for Libraries, Archives and Museums

New York Times
data.nytimes.com

150 years of subjects from New York Times articles; data source for Times Topics pages

Page 61: Linked Open Data Fundamentals for Libraries, Archives and Museums

Data.gov

Open access to datasets held or generated by the US Federal Government

Page 62: Linked Open Data Fundamentals for Libraries, Archives and Museums

DBpedia
dbpedia.org

Crowd-sourced community effort to extract structured information from Wikipedia and to make it available on the Web

Page 63: Linked Open Data Fundamentals for Libraries, Archives and Museums

Freebase
freebase.com

A large collaborative knowledge base consisting of metadata composed mainly by its community members (owned by Google)

Page 64: Linked Open Data Fundamentals for Libraries, Archives and Museums

Google Knowledge Graph

Google uses data from Freebase and other sources to provide related information based on search queries

Page 65: Linked Open Data Fundamentals for Libraries, Archives and Museums

Schema.org

A set of vocabularies developed by Google, Bing (Microsoft) and Yahoo! for adding semantic data to web pages

Page 66: Linked Open Data Fundamentals for Libraries, Archives and Museums

OCLC WorldCat
oclc.org/worldcat

Earlier this year, OCLC added linked data to records in WorldCat, using Schema.org vocabularies and proposed extensions for library data

Page 67: Linked Open Data Fundamentals for Libraries, Archives and Museums

SOME CONSIDERATIONS

Page 68: Linked Open Data Fundamentals for Libraries, Archives and Museums

Start small

Linked Open Data is not an ‘all or nothing’ proposition

Start by publishing data about specific collections or items of special interest

Consider incorporating Linked Open Data into online exhibitions or special projects

Page 69: Linked Open Data Fundamentals for Libraries, Archives and Museums

Engage the linked data community

Let people know what you’re up to, and ask for feedback – you will get it.

Page 70: Linked Open Data Fundamentals for Libraries, Archives and Museums

Be creative

In addition to publishing data about your own collections, think about how you can incorporate data from other sources into your projects

Consider collaborations with other institutions

Page 71: Linked Open Data Fundamentals for Libraries, Archives and Museums

Utilize your internal resources

Cataloging/Metadata

Curators/Subject Matter Experts

IT Staff

Legal Department

Page 72: Linked Open Data Fundamentals for Libraries, Archives and Museums

me:[email protected]

nypl labs: www.nypl.org/labs