The future is federated Ruben Verborgh


The future is federated

Ruben Verborgh

I think Big Data is boring.

Big Data thrives on centralization.

Knowledge is inherently distributed.

Knowledge is inherently heterogeneous.

Knowledge on the Web is inherently linked.

Centralization skips the most interesting problems.

Where to find data you need?

How to access them?

How to integrate them?

Let’s create smart apps over VIVO and Web data.

You’ll get to see 3 things:

1. a light interface to VIVO data
2. queries over that interface
3. an app built on such queries

We can integrate multiple data sources on the live Web, but we need to set our expectations right.

The future is federated

Big Data fails at Web scale
Light interfaces rule
Engineer for serendipity

RDF: THE DATA LANGUAGE

<subject> <predicate> <object>.

triple

SPARQL: THE QUERY LANGUAGE

SPARQL: THE PROTOCOL

[Diagram: a client sends a SPARQL query to a SPARQL endpoint over the SPARQL protocol.]

SELECT ?person ?name WHERE {
  ?person a dbo:Scientist.
  ?person rdfs:label ?name.
  ?person dbo:birthPlace dbp:Denver.
}
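A minimal sketch of how a client might hand this query to an endpoint over the SPARQL protocol, where the query text travels in the `query` parameter of an HTTP GET. The endpoint URL and the function name `build_request_url` are assumptions made for illustration; nothing is sent over the network.

```python
from urllib.parse import urlencode

# Hypothetical endpoint URL, used for illustration only.
ENDPOINT = "https://example.org/sparql"

QUERY = """SELECT ?person ?name WHERE {
  ?person a dbo:Scientist.
  ?person rdfs:label ?name.
  ?person dbo:birthPlace dbp:Denver.
}"""


def build_request_url(endpoint: str, query: str) -> str:
    """Encode a query as a SPARQL-protocol GET URL (query text in `query`)."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

A real request would also need the `dbo:`, `dbp:` and `rdfs:` prefixes to be declared in the query or predefined by the endpoint.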

Hey, SPARQL endpoint…

Sure!

SELECT DISTINCT ?drug ?drug1 ?drug2 ?drug3 ?drug4 ?d1 WHERE {
  ?drug1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/drugCategory> <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugcategory/antibiotics> .
  ?drug2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/drugCategory> <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugcategory/antiviralAgents> .
  ?drug3 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/drugCategory> <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugcategory/antihypertensiveAgents> .
  ?drug4 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/drugCategory> <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugcategory/anti-bacterialAgents> .
  ?drug1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/target> ?o1 .
  ?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/genbankIdGene> ?g1 .
  ?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/locus> ?l1 .
  ?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/molecularWeight> ?mw1 .
  ?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/hprdId> ?hp1 .
  ?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/swissprotName> ?sn1 .
  ?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/proteinSequence> ?ps1 .
  ?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/generalReference> ?gr1 .
  ?drug <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/target> ?o1 .
  ?drug2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/target> ?o2 .
  ?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/genbankIdGene> ?g2 .
  ?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/locus> ?l2 .
  ?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/molecularWeight> ?mw2 .
  ?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/hprdId> ?hp2 .
  ?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/swissprotName> ?sn2 .
  ?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/proteinSequence> ?ps2 .
  ?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/generalReference> ?gr2 .
  ?drug <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/target> ?o2 .
  ?drug3 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/target> ?o3 .

Hey, SPARQL endpoint…

Sure!

SPARQL endpoints try to be the Web’s Big Data processors, for free.

Few endpoints exist.

The average endpoint is down for 1.5 days/month.

Can I SPARQL your endpoint?

Big Data fails at Web scale because Web scale is much bigger.

SEMANTIC WEB SHOULDN’T TRY TO COMPETE WITH BIG DATA

I WANT TO PUT THE WEB BACK INTO SEMANTIC WEB

IT’S OUR MAIN DIFFERENTIATOR FROM BIG DATA

IF IT’S NOT WEB, I’M NOT INTERESTED

That’s why I think Big Data is boring.

The future is federated

Big Data fails at Web scale
Light interfaces rule
Engineer for serendipity

What would the AVERAGE HUMAN do?

SELECT ?person ?name WHERE {
  ?person a dbo:Scientist.
  ?person rdfs:label ?name.
  ?person dbo:birthPlace dbp:Denver.
}

AVERAGE HUMAN

Which scientists were born in Denver?

You can use only Wikipedia.

1. visit the page about Denver
2. make a list of people born there
3. read their pages to see if they’re a scientist

WEB LINKING IS UNIDIRECTIONAL
A Denver person’s page links to Denver.
Denver doesn’t necessarily link to that person.

We need to empower the AVERAGE HUMAN, but please not with a SPARQL endpoint, because they’re so expensive to keep up.

WHAT IS THE SIMPLEST COMPLEXITY?

THE ESSENCE OF RDF

<subject> <predicate> <object>.

THE ESSENCE OF LINKED DATA

?subject <predicate> <object>.
Denver <predicate> <object>.

THE ESSENCE OF TPF

?subject ?predicate ?object.
?subject ?predicate Denver.

TRIPLE PATTERN FRAGMENTS

Clients can ask the server only for triple patterns.
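To make the restriction concrete, here is a minimal sketch of the single operation such a server would offer: matching one triple pattern. The in-memory list, the names in it, and the function `match` are illustrative assumptions, not the actual TPF server code or real DBpedia data.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Toy data set; the names are made up for illustration.
TRIPLES: List[Triple] = [
    ("Alice", "birthPlace", "Denver"),
    ("Alice", "type", "Scientist"),
    ("Bob", "birthPlace", "Denver"),
    ("Carol", "birthPlace", "Boston"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None plays the role of a variable."""
    for triple in triples:
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


# The pattern "?subject birthPlace Denver."
born_in_denver = list(match(TRIPLES, p="birthPlace", o="Denver"))
print(born_in_denver)
```

Because the server only filters on up to three fixed terms, each request stays cheap; anything more complex is left to the client.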

AVERAGE HUMAN

Which scientists were born in Denver?

You can only use a TPF interface of DBpedia.

1. “?person birthPlace Denver.”
2. “?person type Scientist.”
3. “?person fullName ?name.”

AVERAGE MACHINE

You can only use a TPF interface of DBpedia.

1. “?person birthPlace Denver.”
2. “?person type Scientist.”
3. “?person fullName ?name.”

SELECT ?person ?name WHERE {
  ?person a dbo:Scientist.
  ?person rdfs:label ?name.
  ?person dbo:birthPlace dbp:Denver.
}
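The three steps above can be sketched as a client-side join over an interface that answers single triple patterns only. This is a simplified illustration under stated assumptions: `DATA`, `ask_server` and `solve` are hypothetical names, the data is made up, and a real TPF client would add paging, pattern ordering by estimated result counts, and HTTP.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]

# Toy data standing in for the server; names are made up.
DATA: List[Triple] = [
    ("Alice", "birthPlace", "Denver"),
    ("Alice", "type", "Scientist"),
    ("Alice", "fullName", "Alice Example"),
    ("Bob", "birthPlace", "Denver"),
    ("Bob", "type", "Artist"),
    ("Carol", "birthPlace", "Boston"),
    ("Carol", "type", "Scientist"),
]


def ask_server(pattern: Triple) -> Iterator[Triple]:
    """The only operation the 'server' offers: match one triple pattern.
    Terms starting with '?' are variables."""
    for triple in DATA:
        if all(term.startswith("?") or term == value
               for term, value in zip(pattern, triple)):
            yield triple


def solve(patterns: List[Triple], bindings: Bindings) -> Iterator[Bindings]:
    """Client-side join: fill in known bindings, ask one pattern, recurse on the rest."""
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    bound = tuple(bindings.get(term, term) for term in first)
    for triple in ask_server(bound):
        extended = dict(bindings)
        extended.update({term: value for term, value in zip(bound, triple)
                         if term.startswith("?")})
        yield from solve(rest, extended)


query = [
    ("?person", "birthPlace", "Denver"),
    ("?person", "type", "Scientist"),
    ("?person", "fullName", "?name"),
]
results = list(solve(query, {}))
print(results)
```

The server never sees the full query; all joining happens on the client, which is where the cost moves to.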

The future is federated

Big Data fails at Web scale
Light interfaces rule
Engineer for serendipity

“Engineer for serendipity.” (Roy T. Fielding)

If 1 endpoint is down for 1.5 days each month, then 2 endpoints might be down for 3 days each month.
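A rough sketch of the arithmetic behind that estimate. It assumes a 30-day month and, for the second function, that outages of different endpoints are independent; both are assumptions added here, and the function names are hypothetical.

```python
DAYS_PER_MONTH = 30   # assumption: a 30-day month
DOWN_DAYS = 1.5       # per endpoint, the figure quoted above


def worst_case_down_days(n_endpoints: int) -> float:
    """Downtime if outages never overlap: the per-endpoint downtimes simply add up."""
    return min(DAYS_PER_MONTH, n_endpoints * DOWN_DAYS)


def expected_down_days(n_endpoints: int) -> float:
    """Downtime of a query that needs all endpoints up, assuming independent outages."""
    availability = 1 - DOWN_DAYS / DAYS_PER_MONTH
    return DAYS_PER_MONTH * (1 - availability ** n_endpoints)


for n in (1, 2, 5, 10):
    print(n, worst_case_down_days(n), round(expected_down_days(n), 2))
```

With two endpoints the worst case is 3 days and the independent-outage estimate is about 2.9 days, so the more endpoints a federated query depends on, the less often it can be answered at all.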

Federated queries with SPARQL endpoints pose a problem.

Federated queries are native to TPF clients.

Just ask each of the questions to different TPF servers.
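A minimal sketch of what asking each question to different servers could look like: the same triple pattern goes to every source and the client merges the answers. The two in-memory "servers", their contents, and the names `ask` and `ask_all` are assumptions for illustration only.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]

# Two toy "servers", each holding part of the data; names and contents are made up.
SOURCES: Dict[str, List[Triple]] = {
    "server-a": [("Alice", "birthPlace", "Denver"), ("Bob", "birthPlace", "Denver")],
    "server-b": [("Alice", "type", "Scientist"), ("Bob", "type", "Artist")],
}


def ask(source: str, pattern: Triple) -> Iterator[Triple]:
    """Match one triple pattern against one source; '?' terms are variables."""
    for triple in SOURCES[source]:
        if all(term.startswith("?") or term == value
               for term, value in zip(pattern, triple)):
            yield triple


def ask_all(pattern: Triple) -> Iterator[Triple]:
    """Federation at the pattern level: send the pattern to every source and merge."""
    for source in SOURCES:
        yield from ask(source, pattern)


born = {s for s, _, _ in ask_all(("?person", "birthPlace", "Denver"))}
scientists = {s for s, _, _ in ask_all(("?person", "type", "Scientist"))}
answer = born & scientists
print(answer)
```

Neither toy server can answer the question on its own; the client combines them without either server knowing about the other.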

TPF trades server cost for query performance.

But in federated scenarios, performance can be on par with SPARQL endpoints!

TPF is not the final solution (no API will ever be), but an excellent starting point.

Lightweight interfaces are easy to extend and combine with others.

The Memento protocol brings time to the Web.

Ask for representations at a certain point in the past.
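In Memento, asking for the past is done with the `Accept-Datetime` request header, whose value is an HTTP date. A small sketch of building that header follows; the function name `memento_headers` and the chosen date are illustrative assumptions, and no request is actually sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_headers(moment: datetime) -> dict:
    """Headers asking a Memento TimeGate for the representation closest to `moment`."""
    utc_moment = moment.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(utc_moment, usegmt=True)}


# Example date chosen arbitrarily for illustration.
headers = memento_headers(datetime(2015, 1, 1, tzinfo=timezone.utc))
print(headers)
```

A client would send these headers along with an ordinary GET to a TimeGate for the resource it is interested in.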

TPF and Memento are a great match.

We combined them in collaboration with Herbert Van de Sompel & team at the Los Alamos National Laboratory.

The future is federated

Big Data fails at Web scale
Light interfaces rule
Engineer for serendipity

VIVO today: SPARQL client → VIVO

VIVO tomorrow? TPF client → TPF server → VIVO

Federation is a game changer with the TPF interface.

With great power comes great responsibility.

We need to be realistic about our expectations.

Some queries will always be hard on an open Web. You might need centralization if you want answers fast.

Many more queries than you’d think are pretty fast…*

…and streaming!

*Terms and conditions apply.

OPEN SOURCE: linkeddatafragments.org

@RubenVerborgh

The future is federated, and it starts today.