The Digital Cavemen of Linked Lascaux Ruben Verborgh


Page 1: The Digital Cavemen of Linked Lascaux

The Digital Cavemen of Linked Lascaux
Ruben Verborgh

Page 2: The Digital Cavemen of Linked Lascaux
Page 3: The Digital Cavemen of Linked Lascaux
Page 4: The Digital Cavemen of Linked Lascaux

The Lascaux paintings are 17,300 years old.

How long will your records last?

Page 5: The Digital Cavemen of Linked Lascaux

by Banksy

Page 6: The Digital Cavemen of Linked Lascaux

by Moyan Brenn

Page 7: The Digital Cavemen of Linked Lascaux

SUSTAINABILITY

Page 8: The Digital Cavemen of Linked Lascaux

lack of a long-term plan for SUSTAINABILITY = a threat to the Semantic Web

Page 9: The Digital Cavemen of Linked Lascaux

SUSTAINABILITY = making promises you can keep

Page 10: The Digital Cavemen of Linked Lascaux

SUSTAINABILITY = a dialog becoming a contract

Page 11: The Digital Cavemen of Linked Lascaux

SUSTAINABILITY = remaining constant under change

Page 12: The Digital Cavemen of Linked Lascaux

How can we promise to remain constant in a changing world?

Page 13: The Digital Cavemen of Linked Lascaux

Changes

Constants

Promises

The Digital Cavemen of Linked Lascaux

Page 14: The Digital Cavemen of Linked Lascaux

Changes

Constants

Promises

The Digital Cavemen of Linked Lascaux

Page 15: The Digital Cavemen of Linked Lascaux

Changes

Data models

Technology

Interfaces

Page 16: The Digital Cavemen of Linked Lascaux

Changes

Data models

Technology

Interfaces

Page 17: The Digital Cavemen of Linked Lascaux

The oldest data model is a simple table.

Figure 1.1: Schematic comparison of the four major data models

Tabular data: Each data item is structured as a line of field values. Fields are the same for all items; a header line can indicate their name. (Diagram labels: header, row, column.)

Relational model: Data are structured as tables, each of which has its own set of attributes. Records in one table can relate to others by referencing their key column. (Diagram labels: relation, key column, attributes, table/entity.)

Meta-markup languages: XML documents have a hierarchical structure, which gives them a tree-like appearance. Each element can have one or more children; there is exactly one root element. (Diagram labels: root, parent, child, siblings.)

RDF: Each fact about a data item is expressed as a triple, which connects a subject to an object through a precise relationship. This leads to graph-structured data that can take any shape. (Diagram labels: subject, property, object.)

van Hooland, S. and Verborgh, R. “Linked Data for Libraries, Archives and Museums” (Facet, 2014)

Page 18: The Digital Cavemen of Linked Lascaux

Tables do not cope well with changes in data or schema.

Title                  Artist       Born  Died
The Thrill is Gone     B. B. King   1925  2015
Riding with the King   John Hiatt   1952
Riding with the King   B. B. King   1925
…                      …            …     …
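The weakness of the flat table can be made concrete with a minimal Python sketch. It assumes the three rows shown above, held as dictionaries; the helper name conflicting_artists is made up for illustration. Because artist details are repeated on every row, an update applied to one row (the year of death) silently disagrees with the others.

```python
# Flat table from the slide: artist details are repeated on every row.
rows = [
    {"title": "The Thrill is Gone", "artist": "B. B. King", "born": 1925, "died": 2015},
    {"title": "Riding with the King", "artist": "John Hiatt", "born": 1952, "died": None},
    {"title": "Riding with the King", "artist": "B. B. King", "born": 1925, "died": None},
]


def conflicting_artists(rows):
    """Return the artists whose repeated details disagree between rows."""
    seen = {}
    conflicts = set()
    for row in rows:
        details = (row["born"], row["died"])
        # setdefault stores the first details seen for an artist and returns them
        if seen.setdefault(row["artist"], details) != details:
            conflicts.add(row["artist"])
    return conflicts


print(conflicting_artists(rows))
```

Nothing in the table model itself prevents this kind of drift; the check has to live in the application.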

Page 19: The Digital Cavemen of Linked Lascaux

Relational databases provide a multi-dimensional table model.

Figure 1.1: Schematic comparison of the four major data models

van Hooland, S. and Verborgh, R. “Linked Data for Libraries, Archives and Museums” (Facet, 2014)

Page 20: The Digital Cavemen of Linked Lascaux

Databases cope with data changes but schema changes are harder.

Title                  Artist
The Thrill is Gone     1
Riding with the King   2
Riding with the King   1
…                      …

ID  Name        Born  Died
1   B. B. King  1925  2015
2   John Hiatt  1952
…   …           …     …
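As a rough sketch of the same data in the relational model, the following uses Python's built-in sqlite3 module with an in-memory database. The table and column names (songs, artists) are assumptions chosen to mirror the slide, not the schema of any real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT, born INTEGER, died INTEGER);
    CREATE TABLE songs (title TEXT, artist INTEGER REFERENCES artists(id));
    INSERT INTO artists VALUES (1, 'B. B. King', 1925, NULL), (2, 'John Hiatt', 1952, NULL);
    INSERT INTO songs VALUES ('The Thrill is Gone', 1),
                             ('Riding with the King', 2),
                             ('Riding with the King', 1);
""")

# A data change touches exactly one row, however many songs reference the artist.
conn.execute("UPDATE artists SET died = 2015 WHERE id = 1")


def songs_with_artist(conn):
    """Join songs to their artist through the key column."""
    query = """
        SELECT songs.title, artists.name, artists.died
        FROM songs JOIN artists ON songs.artist = artists.id
        ORDER BY songs.title, artists.name
    """
    return conn.execute(query).fetchall()


for row in songs_with_artist(conn):
    print(row)
```

The data change is a single UPDATE. A schema change, such as recording a new kind of fact about artists, would still require altering the table definition and every query that depends on it.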

Page 21: The Digital Cavemen of Linked Lascaux

There is no interoperability with other databases.

Title                  Artist
The Thrill is Gone     1
Riding with the King   2
Riding with the King   1
…                      …

Wikipedia?

Page 22: The Digital Cavemen of Linked Lascaux

XML allows reuse of schemas and identifiers.

Figure 1.1: Schematic comparison of the four major data models

van Hooland, S. and Verborgh, R. “Linked Data for Libraries, Archives and Museums” (Facet, 2014)

Page 23: The Digital Cavemen of Linked Lascaux

XML schema evolution remains a tough nut to crack.

Figure 1.1: Schematic comparison of the four major data models

?

Page 24: The Digital Cavemen of Linked Lascaux

The RDF data model is flexible for changes in data and schema.

Figure 1.1: Schematic comparison of the four major data models

van Hooland, S. and Verborgh, R. “Linked Data for Libraries, Archives and Museums” (Facet, 2014)
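To illustrate why a triple-based model absorbs schema changes, here is a minimal Python sketch that treats a graph as a set of (subject, predicate, object) tuples. It is not an RDF library. The book, author and dc:creator URIs are the ones used in this talk's examples; the printDate property URI is a hypothetical placeholder.

```python
# A graph as a plain set of (subject, predicate, object) tuples.
BOOK = "http://bib.org/books/978-1-85604-964-1/"
AUTHOR = "http://bib.org/authors/7356/"
DC_CREATOR = "http://purl.org/dc/terms/creator"
EX_PRINT_DATE = "http://example.org/vocab#printDate"  # hypothetical property URI

graph = {(BOOK, DC_CREATOR, AUTHOR)}

# A "schema change" is just another triple: no table definition has to be altered.
graph.add((BOOK, EX_PRINT_DATE, "2014-06-11"))


def objects(graph, subject, predicate):
    """Return all objects related to the subject through the predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


print(objects(graph, BOOK, DC_CREATOR))
print(objects(graph, BOOK, EX_PRINT_DATE))
```

Existing lookups keep working after the new kind of fact is added, because nothing in the structure depends on a fixed set of fields.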

Page 25: The Digital Cavemen of Linked Lascaux

RDF involves a trade-off between flexibility and reuse.

custom ontology

reuse ontologies

perfect match

perfect interoperability

Page 26: The Digital Cavemen of Linked Lascaux

So much for change within models… what about change between them?

Figure 1.1: Schematic comparison of the four major data models

Page 27: The Digital Cavemen of Linked Lascaux

There’s no ultimate model. They co-exist. Change is inherent.

Figure 1.1: Schematic comparison of the four major data models

Page 28: The Digital Cavemen of Linked Lascaux

Changes

Data models

Technology

Interfaces

Page 29: The Digital Cavemen of Linked Lascaux

Even if your data doesn’t change, technology does.

What happens to your data?

new software versions

new software manufacturers

Page 30: The Digital Cavemen of Linked Lascaux

Is your software holding your data hostage?

Is your software the owner of your data?

Intentional or unintentional vendor lock-in?

Or are you?

Can you get your data out at any moment you want?

Page 31: The Digital Cavemen of Linked Lascaux

The Cooper-Hewitt Design Museum had trouble getting their own data.

Data in The Museum System

flexible, but complex relational design

no export button

Website had more flexible demands

complex manual queries to liberate data

parallel CMS to drive website

Page 32: The Digital Cavemen of Linked Lascaux

Changes

Data models

Technology

Interfaces

Page 33: The Digital Cavemen of Linked Lascaux

The Web has been designed with change in mind.

Individual links are allowed to break so the entire Web does not.

—Tim Berners-Lee

Page 34: The Digital Cavemen of Linked Lascaux

The Web evolves rapidly but keeps on working.

What year is it? Then your users need…

1995 – HTML 2.0

2000 – XML

2008 – JSON

2012 – HTML 5

2015 – RDF ?

2017 – … ?

Page 35: The Digital Cavemen of Linked Lascaux

At least HTML seems constant, so the human Web is safe.

http://bib.org/books/978-1-85604-964-1/

around 2005: made in HTML 4

around 2015: made in HTML 5

Markup changes, the identifier does not.

Tim Berners-Lee called these “Cool URIs”.

Page 36: The Digital Cavemen of Linked Lascaux

Web APIs for machines suffer from changes on many levels.

http://api.bib.org/v2/viewBookDetails.php?id=978-1-85604-964-1&format=json&apikey=WSDGU56VP

How does this identifier cope with change?

How long does this identifier work unchanged?


Page 37: The Digital Cavemen of Linked Lascaux

Web APIs for machines suffer from changes on many levels.

http://api.bib.org/v2/viewBookDetails.php?id=978-1-85604-964-1&format=json&apikey=WSDGU56VP

dependency on server technology

dependency on API version

dependency on representation

dependency on API key
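The four dependencies can be read straight off the URL. The sketch below uses Python's standard urllib.parse to pull them apart; the function name url_dependencies and the dictionary labels are made up for this illustration.

```python
from urllib.parse import urlsplit, parse_qs

API_URL = ("http://api.bib.org/v2/viewBookDetails.php"
           "?id=978-1-85604-964-1&format=json&apikey=WSDGU56VP")


def url_dependencies(url):
    """Split the example API URL into the parts that tie it to current technology."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    segments = parts.path.strip("/").split("/")  # ["v2", "viewBookDetails.php"]
    return {
        "server technology": segments[-1].rsplit(".", 1)[-1],  # file extension
        "API version": segments[0],
        "representation": query["format"][0],
        "API key": query["apikey"][0],
        "resource": query["id"][0],  # the only part that describes the book itself
    }


for name, value in url_dependencies(API_URL).items():
    print(f"{name}: {value}")
```

Only the last entry identifies the book; every other part can change without the book changing at all.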

Page 38: The Digital Cavemen of Linked Lascaux

Plenty of excuses exist to change machine interfaces.

But our new server does it faster!

But our new API has different features!

But XML is obsolete now so we need JSON!

Page 39: The Digital Cavemen of Linked Lascaux

Even funnier are the excuses for requiring API keys.

But we need to rate limit!

But we need to track automated access!

But we need to protect our data!

Page 40: The Digital Cavemen of Linked Lascaux

Once and for all: API keys do not help with these.

But we need to rate limit!

But we need to track automated access!

But we need to protect our data!

Page 41: The Digital Cavemen of Linked Lascaux

Once and for all: API keys do not help with these.

Your HTML interface is still open!

JSON is a convenience, not a necessity.

Anybody can still do whatever they want by scraping HTML pages with the same data.

Protect your data, not just one interface.

Page 42: The Digital Cavemen of Linked Lascaux

Yet other possible changes still appear to be a concern.

Remain constant if your server changes?

Remain constant if your API changes?

Remain constant if data models change?

Page 43: The Digital Cavemen of Linked Lascaux

Changes

Constants

Promises

The Digital Cavemen of Linked Lascaux

Page 44: The Digital Cavemen of Linked Lascaux

Constants

URIs

Ontologies

Resources

Page 45: The Digital Cavemen of Linked Lascaux

Constants

URIs

Ontologies

Resources

Page 46: The Digital Cavemen of Linked Lascaux

The RDF model is driven by unique identifiers.

S

O

P

Page 47: The Digital Cavemen of Linked Lascaux

Constants allow clients to establish a shared meaning.

S: http://bib.org/books/978-1-85604-964-1/
P: http://purl.org/dc/terms/creator
O: http://bib.org/authors/7356/

Page 48: The Digital Cavemen of Linked Lascaux

Human semantics are in concepts and their meaning to the world.

S: a book
P: written by
O: a person

Page 49: The Digital Cavemen of Linked Lascaux

Machine semantics are in symbols and their structural interrelations.

S: http://digybe.wpq/dgjyj-dgu7945
P: http://yudgy.jdu/DHH8DHBtkixhj
O: http://aole.wqq/mobd1.tihz

Page 50: The Digital Cavemen of Linked Lascaux

We need to be very careful about our choice of symbols.

S: http://bib.org/books/978-1-85604-964-1/
P: http://purl.org/dc/terms/creator
O: http://bib.org/authors/7356/

Page 51: The Digital Cavemen of Linked Lascaux

We need to be very careful about our choice of symbols.

http://bib.org/books/978-1-85604-964-1/
Is this a book or a description of a book?
:printDate "2014-06-11"
:lastModified "2015-11-25"

http://bib.org/authors/7356/
Is this a person or a document?
:birthDate "1987-02-28"
:size "17kB"

Page 52: The Digital Cavemen of Linked Lascaux

Although designed for machines, the example only works for humans.

S: http://bib.org/books/978-1-85604-964-1/
P: http://purl.org/dc/terms/creator
O: http://bib.org/authors/7356/

Page 53: The Digital Cavemen of Linked Lascaux

Because, somehow, Web APIs make machine access different.

S: http://api.bib.org/v2/viewBookDetails.php?id=978-1-85604-964-1&format=json&apikey=WSDGU56VP
P: http://purl.org/dc/terms/creator
O: http://api.bib.org/v2/viewAuthorProfile.php?id=7356&format=json&apikey=WSDGU56VP

Page 54: The Digital Cavemen of Linked Lascaux

That’s why it’s a problem if machines need different identifiers.

S: http://api.bib.org/v2/viewBookDetails.php?id=978-1-85604-964-1&format=json&apikey=WSDGU56VP
P: http://purl.org/dc/terms/creator
O: http://api.bib.org/v2/viewAuthorProfile.php?id=7356&format=json&apikey=WSDGU56VP

Page 55: The Digital Cavemen of Linked Lascaux

Only this triple is a global constant. The other is volatile and local.

S: http://bib.org/books/978-1-85604-964-1/
P: http://purl.org/dc/terms/creator
O: http://bib.org/authors/7356/

Page 56: The Digital Cavemen of Linked Lascaux

Constants

URIs

Ontologies

Resources

Page 57: The Digital Cavemen of Linked Lascaux

Fortunately, we don’t have to pick all the constants ourselves.

Ontologies provide identifiers of concepts that are designed to be reused.

They are necessary to make RDF work.

They are necessary to create queries, especially over multiple data sources.

Page 58: The Digital Cavemen of Linked Lascaux

Of course, we get the benefits only if we actually reuse.

Why have our own my:writtenBy property when dc:creator already exists?

Maybe we have a more specific meaning?

We can still relate both properties with RDF.

But if we all use derivatives of the constants, what is the value of these constants?
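Relating a custom property to a shared one can be sketched as follows. This is a toy forward-chaining step over tuples in Python, not a full RDFS reasoner. The my:writtenBy namespace URI is a hypothetical placeholder; rdfs:subPropertyOf and dc:creator are the standard terms.

```python
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
DC_CREATOR = "http://purl.org/dc/terms/creator"
MY_WRITTEN_BY = "http://example.org/my#writtenBy"  # hypothetical URI for my:writtenBy
BOOK = "http://bib.org/books/978-1-85604-964-1/"
AUTHOR = "http://bib.org/authors/7356/"

graph = {
    (MY_WRITTEN_BY, RDFS_SUBPROPERTY_OF, DC_CREATOR),  # the link back to the shared constant
    (BOOK, MY_WRITTEN_BY, AUTHOR),
}


def with_superproperties(graph):
    """Add (s, super, o) for every (s, sub, o) where sub is a subproperty of super."""
    inferred = set(graph)
    changed = True
    while changed:  # repeat until no new triple appears, to follow chains of subproperties
        changed = False
        pairs = {(s, o) for s, p, o in inferred if p == RDFS_SUBPROPERTY_OF}
        for s, p, o in list(inferred):
            for sub, sup in pairs:
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred


print((BOOK, DC_CREATOR, AUTHOR) in with_superproperties(graph))
```

A client that only understands dc:creator can still use the data, but only if it performs this extra inference step. That cost is the price of not reusing the constant directly.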

Page 59: The Digital Cavemen of Linked Lascaux

Authors are not always in control: external semantic drift happens.

foaf:knows was bidirectional…

spec: “some level of reciprocity”

An foaf:knows Pete ⇒ Pete foaf:knows An

…until somebody modeled Twitter followers

Pete follows Angela Merkel ⇒ Pete knows Angela

Yet Angela doesn’t know Pete…
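The drift can be reproduced with a small Python sketch. It assumes a client that treats foaf:knows as symmetric, as in the reading above; the helper name symmetric_closure and the plain-string person names are illustrative only.

```python
KNOWS = "http://xmlns.com/foaf/0.1/knows"


def symmetric_closure(triples, predicate):
    """Add the reversed triple for every triple that uses the given predicate."""
    closed = set(triples)
    closed.update((o, p, s) for s, p, o in triples if p == predicate)
    return closed


friends = {("An", KNOWS, "Pete")}
followers = {("Pete", KNOWS, "Angela Merkel")}  # a Twitter "follows" modeled as foaf:knows

print(("Pete", KNOWS, "An") in symmetric_closure(friends, KNOWS))
print(("Angela Merkel", KNOWS, "Pete") in symmetric_closure(followers, KNOWS))
```

The same rule that gives a sensible result for the first dataset produces a false statement for the second. The client did nothing wrong; the meaning of the constant shifted underneath it.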

Page 60: The Digital Cavemen of Linked Lascaux

Getting close to Derrida… but we’re not philosophers.

There are only two hard things in Computer Science: cache invalidation and naming things.

—Phil Karlton

Page 61: The Digital Cavemen of Linked Lascaux

Constants

URIs

Ontologies

Resources

Page 62: The Digital Cavemen of Linked Lascaux

The constants you can touch are the constants you can trust.

No matter how much technology changes, the books we describe remain the same.

Any mechanism of identification should be based on domain resources, not on inevitably changing technology.

Page 63: The Digital Cavemen of Linked Lascaux

The “success” story of the Web API community.

3 Fostering reusability through a self-descriptive bottom-up approach

Lacking better measurements, the Web API community has been heading the same quantity-over-quality course that has characterized the first years of the Linked Data initiative. An often-quoted fact in Web API papers and articles is the ever increasing number of Web APIs (Figure 1), which is supposed to be an indicator of the ecosystem’s excellent health. However, as Linked Data researchers have become painfully aware, quantity only loosely correlates with quality or usefulness. Perhaps for Web APIs, the correlation between quantity and utility could even be negative. Few other communities would pride themselves on the existence of more than 12,000 different micro-protocols to achieve essentially the same thing: communicating between clients and servers over HTTP. Of course, each application has its own domain and domain-specific vocabulary, but does that also warrant an entirely different way of exposing this, especially when we have RDF as a uniform data model? Each different API currently requires a different client, given the lack of a uniform API description format to explain the API’s response structure and functionality. Clearly, this approach to Web APIs is a dead end.

Figure 1: The increasing number of Web APIs is often named an indicator of their success, while the overgrowth of such custom micro-protocols is unnecessary, and detrimental to the development of generic Web API clients. (data: programmableweb.com) [Bar chart of the number of indexed Web APIs, 2005 to 2015. Bar labels: 186; 1,263; 2,418; 5,018; 7,182; 10,302; 12,559.]

In order for machines to use information autonomously, it has to be composed out of pieces they can recognize and interpret. The RDF model achieves this by identifying each of the triple components by reusable IRIs, which have a meaning beyond the scope that mentions them. Furthermore, the Linked Data principles mandate the use of HTTP URLs, which turn these components into affordances toward relevant information. For instance, given the following RDF triple:

<http://dbpedia.org/resource/Bill_Clinton> <http://xmlns.com/foaf/0.1/knows> <http://dbpedia.org/resource/Al_Gore>.

the knowledge of the foaf:knows predicate is sufficient for a machine to determine that this relation is symmetric, and that dbpedia:Bill_Clinton and dbpedia:Al_Gore are instances of foaf:Person, even though it might have never encountered any of those IRIs before. Furthermore, should the foaf:knows property be unfamiliar, its IRI can be dereferenced to find this information expressed in ontological predicates. Knowledge of these predicates in turn allows an interpretation of foaf:knows and hence the aforementioned derivation. We herein recognize two characteristics in particular:

• The information is structured in a bottom-up way: machines interpret a larger unit of information through its pieces instead of interpreting the pieces through the whole (while humans are capable of doing both simultaneously).

• Each piece in the unit is self-descriptive: anything needed to interpret a piece is contained within itself, with its IRI acting as both an identifier and a direct handle towards additional interpretation mechanisms. No external resource is required beforehand, given the knowledge of a limited set of basic concepts.

This sharply contrasts with current practice for Web APIs. Machines are assumed to interpret each API operation in its entirety, as such smaller pieces do not exist, and API descriptions (if present) are external documents that must be collected and interpreted before consumption is possible. While this does not imply the inviability of such an approach, it raises serious doubt as to whether that is the most effective strategy towards automated Web API consumption by generic clients.

number of indexed Web APIs in ProgrammableWeb

Page 64: The Digital Cavemen of Linked Lascaux

Just imagine we had 15,000 different data models.

Figure 1: number of indexed Web APIs in ProgrammableWeb (data: programmableweb.com)

Page 65: The Digital Cavemen of Linked Lascaux

Find resources in your domain and assign them an identifier.

http://bib.org/books/978-1-85604-964-1/

http://bib.org/authors/7356/

Page 66: The Digital Cavemen of Linked Lascaux

It’s just like building a web site. When a user comes, serve HTML.

User: GET http://bib.org/books/978-1-85604-964-1/
Response: HTML

Page 67: The Digital Cavemen of Linked Lascaux

It’s just like building a web site. When a client comes, serve JSON.

Client: GET http://bib.org/books/978-1-85604-964-1/
Response: JSON

Page 68: The Digital Cavemen of Linked Lascaux

It’s just like building a web site. When a client comes, serve RDF.

Client: GET http://bib.org/books/978-1-85604-964-1/
Response: RDF

Page 69: The Digital Cavemen of Linked Lascaux

Content negotiation has existed in HTTP for a long time.

Client: GET http://bib.org/books/978-1-85604-964-1/
Response: RDF

Resource: http://bib.org/books/978-1-85604-964-1/
Representation: RDF
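The mechanism can be sketched in a few lines of Python. This is a simplified reading of the HTTP Accept header that handles exact media types, q-values and the */* wildcard only (ranges such as text/* are ignored); the function names and the list of available media types are assumptions for illustration.

```python
def parse_accept(header):
    """Return the media types of an Accept header, highest q-value first."""
    preferences = []
    for item in header.split(","):
        media_type, *params = (part.strip() for part in item.split(";"))
        quality = 1.0  # default weight when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                quality = float(value)
        preferences.append((quality, media_type))
    # sorted() is stable, so equally weighted types keep their header order
    return [m for _, m in sorted(preferences, key=lambda pair: -pair[0])]


def negotiate(accept_header, available):
    """Pick the representation to serve for one and the same resource URI."""
    for media_type in parse_accept(accept_header):
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return available[0]  # fall back to the server's default
    return None  # a real server would answer 406 Not Acceptable


AVAILABLE = ["text/html", "application/json", "text/turtle"]

print(negotiate("text/html,application/xhtml+xml;q=0.9", AVAILABLE))  # a browser
print(negotiate("text/turtle, application/json;q=0.5", AVAILABLE))    # an RDF client
```

The URI in the request never changes; only the Accept header does. Adding a future format means adding one entry to the list of available representations.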

Page 70: The Digital Cavemen of Linked Lascaux

This allows constant URIs even with future changes.

Client: GET http://bib.org/books/978-1-85604-964-1/
Response: RDF 2.0

Page 71: The Digital Cavemen of Linked Lascaux

It enables different users and machines to talk about things.

http://bib.org/books/978-1-85604-964-1/

Client, User, Client

Page 72: The Digital Cavemen of Linked Lascaux

The best API is no API. Your website is already an API.

Developers like to build complicated APIs.

API keys are especially cool to build.

Every feature and change comes with a high cost.

If you ask for an API, you’ll get one.

Ask for new representations of your resources instead.

Page 73: The Digital Cavemen of Linked Lascaux

Changes

Constants

Promises

The Digital Cavemen of Linked Lascaux

Page 74: The Digital Cavemen of Linked Lascaux

Promises

Web Data

Integration

Scalability

Page 76: The Digital Cavemen of Linked Lascaux

The Semantic Web promised data on the Web.

85,567,007,302 triples from 3,426 datasets

LODStats

38,606,408,765 from 657,896 entries

LOD Laundromat

Page 77: The Digital Cavemen of Linked Lascaux

How much of this data can we readily access?

data dumps

Linked Data documents

SPARQL endpoints

Page 78: The Digital Cavemen of Linked Lascaux

A data dump means downloading everything and querying locally.

Page 79: The Digital Cavemen of Linked Lascaux

A data dump means downloading everything and querying locally.

When was the last time you downloaded the full Wikipedia just because you had one question?

Page 80: The Digital Cavemen of Linked Lascaux

Dumps are not Web querying. It’s kind of like giving up.

Semantic Web ⇒ Semantic Basement?

What advantage do we have compared to Big Data?

Still the RDF data model…

But the major difference is Web.

Page 81: The Digital Cavemen of Linked Lascaux

Linked Data documents allow you to traverse a dataset.

Page 82: The Digital Cavemen of Linked Lascaux

Linked Data documents allow you to traverse a dataset.

That’s similar to what we also do: consume information on Wikipedia by following links.

Page 83: The Digital Cavemen of Linked Lascaux

Much Linked Data is available using the well-known principles.

Servers publish a light-weight interface.

Clients follow their nose to retrieve information.

Page 84: The Digital Cavemen of Linked Lascaux

Linked Data documents allow query evaluation on the Web.

# Other books by the same author
SELECT DISTINCT ?book WHERE {
  books:85604 dc:creator ?author.
  ?book dc:creator ?author.
}

Page 85: The Digital Cavemen of Linked Lascaux

Some queries are hard or impossible to evaluate.

# Books about Hamburg
SELECT DISTINCT ?book ?author WHERE {
  ?book dc:subject dbpedia:Hamburg.
  ?book dc:creator ?author.
}
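Why the first query works by following links while the second does not can be shown with a toy model. The sketch below assumes a hypothetical in-memory "Web" in which each URI dereferences to the triples its own server publishes about it; all identifiers other than `books:85604` and author `7356` are made up for illustration.

```python
DC_CREATOR = "dc:creator"
DC_SUBJECT = "dc:subject"

# Hypothetical toy Web: each URI maps to the document its own server publishes.
WEB = {
    "books:85604": [
        ("books:85604", DC_CREATOR, "authors:7356"),
        ("books:85604", DC_SUBJECT, "dbpedia:Hamburg"),
    ],
    "books:90210": [("books:90210", DC_CREATOR, "authors:7356")],
    # The library's author document also lists the books by that author.
    "authors:7356": [
        ("books:85604", DC_CREATOR, "authors:7356"),
        ("books:90210", DC_CREATOR, "authors:7356"),
    ],
    # DBpedia describes Hamburg but knows nothing about the library's books.
    "dbpedia:Hamburg": [("dbpedia:Hamburg", "rdfs:label", "Hamburg")],
}


def dereference(uri):
    """Stand-in for an HTTP GET on the URI; returns the triples in its document."""
    return WEB.get(uri, [])


def books_by_same_author(book):
    """Follow book -> author -> books, one document at a time."""
    authors = {o for s, p, o in dereference(book) if s == book and p == DC_CREATOR}
    books = set()
    for author in authors:
        books |= {s for s, p, o in dereference(author) if p == DC_CREATOR and o == author}
    return books  # like the query, this includes the starting book


def books_about(topic):
    """Only finds books if the topic's own document happens to link back to them."""
    return {s for s, p, o in dereference(topic) if p == DC_SUBJECT and o == topic}


same_author = books_by_same_author("books:85604")
about_hamburg = books_about("dbpedia:Hamburg")
```

In this model the Hamburg query comes back empty even though a matching book exists: the link points from the book to Hamburg, and there is no document to start from that points the other way.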

Page 86: The Digital Cavemen of Linked Lascaux

SPARQL endpoints allow you to ask any question you want.

Page 87: The Digital Cavemen of Linked Lascaux

SPARQL endpoints allow you to ask any question you want.

When was the last time you expected Wikipedia to answer specific questions automatically for you?

Page 88: The Digital Cavemen of Linked Lascaux

A public SPARQL endpoint happily answers this query.

# Other books by the same author
SELECT DISTINCT ?book WHERE {
  books:85604 dc:creator ?author.
  ?book dc:creator ?author.
}

Page 89: The Digital Cavemen of Linked Lascaux

A public SPARQL endpoint also happily answers this query.

# Books about Hamburg
SELECT DISTINCT ?book ?author WHERE {
  ?book dc:subject dbpedia:Hamburg.
  ?book dc:creator ?author.
}

Page 90: The Digital Cavemen of Linked Lascaux

A public SPARQL endpoint also happily answers this query…

SELECT DISTINCT ?drug ?drug1 ?drug2 ?drug3 ?drug4 ?d1 WHERE {
  ?drug1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/drugCategory> <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugcategory/antibiotics> .
  ?drug2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/drugCategory> <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugcategory/antiviralAgents> .
  ?drug3 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/drugCategory> <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugcategory/antihypertensiveAgents> .
  ?drug4 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/drugCategory> <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugcategory/anti-bacterialAgents> .
  ?drug1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/target> ?o1 .
  ?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/genbankIdGene> ?g1 .
  ?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/locus> ?l1 .
  ?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/molecularWeight> ?mw1 .
  ?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/hprdId> ?hp1 .
  ?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/swissprotName> ?sn1 .
  ?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/proteinSequence> ?ps1 .
  ?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/generalReference> ?gr1 .
  ?drug <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/target> ?o1 .
  ?drug2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/target> ?o2 .
  ?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/genbankIdGene> ?g2 .
  ?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/locus> ?l2 .
  ?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/molecularWeight> ?mw2 .
  ?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/hprdId> ?hp2 .

Page 91: The Digital Cavemen of Linked Lascaux

There’s a price to pay for being the most expressive HTTP interface.

The majority of public SPARQL endpoints has less than 95% uptime.

This means we cannot query them for more than 1.5 days each month.

This means we cannot rely on them to build Linked Data applications.

Buil-Aranda – Hogan – Umbrich – Vandenbussche, SPARQL Web-Querying Infrastructure: Ready for Action?

Page 92: The Digital Cavemen of Linked Lascaux

Promises

Web Data

Integration

Scalability

Page 93: The Digital Cavemen of Linked Lascaux

The main promise of Linked Data is integration, preserving semantics.

Tabular data: Each data item is structured as a line of field values. Fields are the same for all items; a header line can indicate their name.

Relational model: Data are structured as tables, each of which has its own set of attributes. Records in one table can relate to others by referencing their key column.

Meta-markup languages: XML documents have a hierarchical structure, which gives them a tree-like appearance. Each element can have one or more children; there is exactly one root element.

RDF: Each fact about a data item is expressed as a triple, which connects a subject to an object through a precise relationship. This leads to graph-structured data that can take any shape.

Figure 1.1: Schematic comparison of the four major data models

Page 94: The Digital Cavemen of Linked Lascaux

Integration is the promise. But does it work on the Web?

data dumps

Linked Data documents

SPARQL endpoints

Page 95: The Digital Cavemen of Linked Lascaux

With data dumps, we just build a bigger basement.

How far do we go?

How do we keep data up to date?

Page 96: The Digital Cavemen of Linked Lascaux

With Linked Data documents, we keep on following our nose.

There are no dataset boundaries.

Some queries will remain hard.

Page 97: The Digital Cavemen of Linked Lascaux

With public SPARQL endpoints, problems become worse.

1 endpoint has 95% availability.

1.5 days down each month

2 endpoints have 90% availability.

3 days down each month

3 endpoints have 85% availability.

4.5 days down each month
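The arithmetic behind these figures can be checked with a short sketch. It assumes that endpoints fail independently and that a month has 30 days; both are simplifying assumptions, and the function names are illustrative.

```python
def combined_availability(per_endpoint: float, endpoints: int) -> float:
    """Probability that all endpoints are up at once, assuming independent failures."""
    return per_endpoint ** endpoints


def downtime_days(availability: float, days_in_month: int = 30) -> float:
    """Expected number of days per month on which a query cannot be answered."""
    return (1 - availability) * days_in_month


for n in (1, 2, 3):
    availability = combined_availability(0.95, n)
    print(f"{n} endpoint(s): {availability:.1%} available, "
          f"{downtime_days(availability):.1f} days down per month")
```

Under these assumptions the exact values are 95%, 90.25%, and about 85.7%, which the slide rounds to 95%, 90%, and 85%. A federated query that needs every endpoint is only as available as the product of its parts.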

Page 98: The Digital Cavemen of Linked Lascaux

Promises

Web Data

Integration

Scalability

Page 99: The Digital Cavemen of Linked Lascaux

Can we think differently about Linked Data on the Web?

data dump: low server cost, high client cost, high availability, high bandwidth, out-of-date data

Linked Data documents: in between

SPARQL endpoint: high server cost, low client cost, low availability, low bandwidth, live data

Page 100: The Digital Cavemen of Linked Lascaux

Can we think differently about Linked Data on the Web?

data dump

SPARQL endpoint

Linked Data documents

? ?

Page 101: The Digital Cavemen of Linked Lascaux

Let us combine the lessons on changes, constants, and promises.

An interface that withstands change,

simple enough so it doesn’t break

complex enough to query.

Page 102: The Digital Cavemen of Linked Lascaux

Let us combine the lessons on changes, constants, and promises.

Data dumps contain too much.

SPARQL endpoint results are too specific.

Linked Data documents are unidirectional.

Page 103: The Digital Cavemen of Linked Lascaux

Each interface divides a dataset into Linked Data Fragments.

Data dumps: 1 huge fragment

SPARQL endpoints: ∞ specific fragments

Linked Data: 1 fragment per subject

Page 104: The Digital Cavemen of Linked Lascaux

Can we find a new interface with a sustainable balance?

Triple Pattern Fragments: 1 fragment per subject / predicate / object
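What "one fragment per subject / predicate / object" means for a server can be sketched as a single lookup function. This is a minimal in-memory illustration, not the code of any published implementation: the dataset, the function name, and the shape of the returned dictionary are assumptions.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Hypothetical toy dataset; the identifiers are illustrative only.
DATASET: list[Triple] = [
    ("books:85604", "dc:creator", "authors:7356"),
    ("books:85604", "dc:subject", "dbpedia:Hamburg"),
    ("books:90210", "dc:creator", "authors:7356"),
    ("books:11111", "dc:subject", "dbpedia:Hamburg"),
]


def triple_pattern_fragment(
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
    page_size: int = 100,
) -> dict:
    """Return the triples matching a pattern; None acts as a wildcard."""
    pattern = (subject, predicate, obj)
    matches = [
        triple for triple in DATASET
        if all(want is None or want == have for want, have in zip(pattern, triple))
    ]
    # The total count lets a client judge how selective a pattern is.
    return {"triples": matches[:page_size], "total_count": len(matches)}


by_author = triple_pattern_fragment(predicate="dc:creator", obj="authors:7356")
about_hamburg = triple_pattern_fragment(predicate="dc:subject", obj="dbpedia:Hamburg")
```

The server only ever answers this one kind of request, which keeps its cost per request low and predictable; anything more complex is left to the client.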

Page 105: The Digital Cavemen of Linked Lascaux

Browse a dataset by triple pattern—no less, no more.

Page 106: The Digital Cavemen of Linked Lascaux

Machines can access the exact same interface as RDF.

Page 107: The Digital Cavemen of Linked Lascaux

Triple Pattern Fragments extend Linked Data documents with forms.

That’s even more similar to what we do: consume information on Wikipedia by following links and using forms.

Page 108: The Digital Cavemen of Linked Lascaux

Machines solve complex queries by breaking them down.

# Other books by the same author
SELECT DISTINCT ?book WHERE {
  books:85604 dc:creator ?author.
  ?book dc:creator ?author.
}

Page 109: The Digital Cavemen of Linked Lascaux

Machines solve complex queries by breaking them down.

# Books about Hamburg
SELECT DISTINCT ?book ?author WHERE {
  ?book dc:subject dbpedia:Hamburg.
  ?book dc:creator ?author.
}
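Breaking a query down on the client side can be sketched as a greedy recursion over triple patterns. The code below is an illustration under stated assumptions: `fragment` stands in for an HTTP request to a triple pattern interface, the dataset is made up, and a real client would read the count from fragment metadata instead of fetching every match just to count it.

```python
# Hypothetical toy dataset standing in for a remote server.
DATASET = [
    ("books:85604", "dc:creator", "authors:7356"),
    ("books:85604", "dc:subject", "dbpedia:Hamburg"),
    ("books:90210", "dc:creator", "authors:7356"),
    ("books:11111", "dc:creator", "authors:42"),
    ("books:11111", "dc:subject", "dbpedia:Hamburg"),
    ("books:22222", "dc:creator", "authors:42"),
    ("books:22222", "dc:subject", "dbpedia:Berlin"),
]


def is_var(term):
    return term.startswith("?")


def fragment(pattern):
    """Stand-in for one request to the server; variables act as wildcards."""
    return [
        triple for triple in DATASET
        if all(is_var(want) or want == have for want, have in zip(pattern, triple))
    ]


def solve(patterns, bindings=None):
    """Yield variable bindings that satisfy all triple patterns."""
    bindings = bindings or {}
    if not patterns:
        yield dict(bindings)
        return
    # Fill in what is already known, then start with the most selective pattern.
    bound = [tuple(bindings.get(term, term) for term in pattern) for pattern in patterns]
    best = min(bound, key=lambda pattern: len(fragment(pattern)))
    rest = [pattern for pattern in bound if pattern is not best]
    for triple in fragment(best):
        extended = dict(bindings)
        consistent = True
        for term, value in zip(best, triple):
            if is_var(term):
                if extended.get(term, value) != value:
                    consistent = False  # same variable matched two different values
                    break
                extended[term] = value
        if consistent:
            yield from solve(rest, extended)


# Books about Hamburg, with their authors
query = [
    ("?book", "dc:subject", "dbpedia:Hamburg"),
    ("?book", "dc:creator", "?author"),
]
results = sorted((b["?book"], b["?author"]) for b in solve(query))
```

Each step is a simple lookup the server can answer cheaply; the join logic, and therefore most of the cost, sits with the client.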

Page 110: The Digital Cavemen of Linked Lascaux

Promises can be kept, because the interface is intelligently light.

Publishing Linked Data that can be queried on the Web is realistic because the workload is divided.

The server doesn’t even need a triplestore.

Since the client is in charge, querying multiple sources is easy.

Page 111: The Digital Cavemen of Linked Lascaux

Promises are negotiated contracts, so they always involve trade-offs.

Querying will be slower.

clients send many requests to answer a query

Query times are more consistent.

0.3 secs with a SPARQL endpoint… 95% of time

3 secs with Triple Pattern Fragments… 99.9% of time

Experiment with more complex interfaces.

Page 112: The Digital Cavemen of Linked Lascaux

Make your Linked Data queryable on the Web.

Several open-source implementations: linkeddatafragments.org/software/

Query one or multiple sources online: client.linkeddatafragments.org

Example: bit.ly/harvard-hamburg

Page 113: The Digital Cavemen of Linked Lascaux

Changes

Constants

Promises

The Digital Cavemen of Linked Lascaux

Page 114: The Digital Cavemen of Linked Lascaux

Identify the constants, separate them from changes.

Satisfy Linked Data needs with promises you can keep.

Page 115: The Digital Cavemen of Linked Lascaux

Simple enough to be usable,

complex enough to be useful.

Page 116: The Digital Cavemen of Linked Lascaux

Sustainability means promising the simplest useful complexity.

Page 117: The Digital Cavemen of Linked Lascaux

@RubenVerborgh ruben.verborgh.org