Consuming Linked Data Juan F. Sequeda Department of Computer Science University of Texas at Austin SemTech 2010


DESCRIPTION

This is a one-hour talk introducing Linked Data and how it can be consumed by humans and by machines through SPARQL.


Page 1: Consuming Linked Data SemTech2010

Consuming Linked Data

Juan F. Sequeda
Department of Computer Science
University of Texas at Austin
SemTech 2010

Page 2: Consuming Linked Data SemTech2010

How many people are familiar with

• RDF
• SPARQL
• Linked Data
• Web Architecture (HTTP, etc.)

Page 3: Consuming Linked Data SemTech2010

History
• Linked Data Design Issues by TimBL July 2006
• Linked Open Data Project WWW2007
• First LOD Cloud May 2007
• 1st Linked Data on the Web Workshop WWW2008
• 1st Triplification Challenge 2008
• How to Publish Linked Data Tutorial ISWC2008
• BBC publishes Linked Data 2008
• 2nd Linked Data on the Web Workshop WWW2009
• NY Times announcement SemTech2009 - ISWC09
• 1st Linked Data-a-thon ISWC2009
• 1st How to Consume Linked Data Tutorial ISWC2009
• Data.gov.uk publishes Linked Data 2010
• 2nd How to Consume Linked Data Tutorial WWW2010
• 1st International Workshop on Consuming Linked Data COLD2010
• …

Page 4: Consuming Linked Data SemTech2010

May 2007

Page 5: Consuming Linked Data SemTech2010

Oct 2007

Page 6: Consuming Linked Data SemTech2010

Nov 2007 (1)

Page 7: Consuming Linked Data SemTech2010

Nov 2007 (2)

Page 8: Consuming Linked Data SemTech2010

Feb 2008

Page 9: Consuming Linked Data SemTech2010

Mar 2008

Page 10: Consuming Linked Data SemTech2010

Sept 2008

Page 11: Consuming Linked Data SemTech2010

Mar 2009 (1)

Page 12: Consuming Linked Data SemTech2010

Mar 2009 (2)

Page 13: Consuming Linked Data SemTech2010

July 2009

Page 14: Consuming Linked Data SemTech2010

June 2010

YOU GET THE PICTURE

IT’S BIG and getting

BIGGER and

BIGGER

Page 15: Consuming Linked Data SemTech2010

Now what can we do with this data?

Page 16: Consuming Linked Data SemTech2010

Let’s consume it!

Page 17: Consuming Linked Data SemTech2010

The Modigliani Test

• Show me all the locations of all the original paintings of Modigliani

• Daniel Koller (@dakoller) showed that you can find this with a SPARQL query on DBpedia

Thanks Richard MacManus - ReadWriteWeb

Page 18: Consuming Linked Data SemTech2010
Page 19: Consuming Linked Data SemTech2010

Results of the Modigliani Test

• Atanas Kiryakov from Ontotext
• Used LDSR – Linked Data Semantic Repository
  – DBpedia
  – Freebase
  – GeoNames
  – UMBEL
  – WordNet

Published April 26, 2010: http://www.readwriteweb.com/archives/the_modigliani_test_for_linked_data.php

Page 20: Consuming Linked Data SemTech2010

SPARQL Query

PREFIX fb: <http://rdf.freebase.com/ns/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbp-prop: <http://dbpedia.org/property/>
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX umbel-sc: <http://umbel.org/umbel/sc/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ot: <http://www.ontotext.com/>
SELECT DISTINCT ?painting_l ?owner_l ?city_fb_con ?city_db_loc ?city_db_cit
WHERE {
  ?p fb:visual_art.artwork.artist dbpedia:Amedeo_Modigliani ;
     fb:visual_art.artwork.owners [ fb:visual_art.artwork_owner_relationship.owner ?ow ] ;
     ot:preferredLabel ?painting_l .
  ?ow ot:preferredLabel ?owner_l .
  OPTIONAL { ?ow fb:location.location.containedby [ ot:preferredLabel ?city_fb_con ] } .
  OPTIONAL { ?ow dbp-prop:location ?loc .
             ?loc rdf:type umbel-sc:City ;
                  ot:preferredLabel ?city_db_loc }
  OPTIONAL { ?ow dbp-ont:city [ ot:preferredLabel ?city_db_cit ] }
}

Page 21: Consuming Linked Data SemTech2010
Page 22: Consuming Linked Data SemTech2010

Let’s start by making sure that we understand what Linked Data is…

Page 23: Consuming Linked Data SemTech2010

Do you SEARCH or do you FIND?

Page 24: Consuming Linked Data SemTech2010

Search for

Football Players who went to the University of Texas at Austin, played for the Dallas Cowboys as Cornerback

Page 25: Consuming Linked Data SemTech2010
Page 26: Consuming Linked Data SemTech2010
Page 27: Consuming Linked Data SemTech2010
Page 28: Consuming Linked Data SemTech2010

Why can’t we just FIND it…

Page 29: Consuming Linked Data SemTech2010
Page 30: Consuming Linked Data SemTech2010
Page 31: Consuming Linked Data SemTech2010

Guess how I FOUND out?

Page 32: Consuming Linked Data SemTech2010

I’ll tell you how I did NOT find it

Page 33: Consuming Linked Data SemTech2010

Current Web = internet + links + docs

Page 34: Consuming Linked Data SemTech2010

So what is the problem?

• We aren’t always interested in documents
  – We are interested in THINGS
  – These THINGS might be in documents
• We can read an HTML document rendered in a browser and find what we are searching for
  – This is hard for computers.
  – Computers have to guess (even though they are pretty good at it)

Page 35: Consuming Linked Data SemTech2010

What do we need to do?

• Make it easy for computers/software to find THINGS

Page 36: Consuming Linked Data SemTech2010

How can we do that?

• Besides publishing documents on the web
  – which computers can’t understand easily
• Let’s publish something that computers can understand

Page 37: Consuming Linked Data SemTech2010

RAW DATA!

Page 38: Consuming Linked Data SemTech2010

But wait… don’t we do that already?

Page 39: Consuming Linked Data SemTech2010

Current Data on the Web

• Relational Databases
• APIs
• XML
• CSV
• XLS
• …
• Can’t computers and applications already consume that data on the web?

Page 40: Consuming Linked Data SemTech2010

True! But it is all in different formats and data models!

Page 41: Consuming Linked Data SemTech2010

This makes it hard to integrate data

Page 42: Consuming Linked Data SemTech2010

The data in different data sources aren’t linked

Page 43: Consuming Linked Data SemTech2010

For example, how do I know that the Juan Sequeda in Facebook is the same as the Juan Sequeda in Twitter?

Page 44: Consuming Linked Data SemTech2010

Or if I create a mashup from different services, I have to learn different APIs and I get different formats of data back

Page 45: Consuming Linked Data SemTech2010

Wouldn’t it be great if we had a standard way of publishing data on the Web?

Page 46: Consuming Linked Data SemTech2010

We have a standardized way of publishing documents on the web, right?

HTML

Page 47: Consuming Linked Data SemTech2010

Then why can’t we have a standard way of publishing data on the Web?

Page 48: Consuming Linked Data SemTech2010

Good question! And the answer is YES. There is!

Page 49: Consuming Linked Data SemTech2010

Resource Description Framework (RDF)

• A data model
  – A way to model data
  – e.g. relational databases use the relational data model
• RDF is a triple data model
• Labeled Graph
• Subject, Predicate, Object
• <Juan> <was born in> <California>
• <California> <is part of> <the USA>
• <Juan> <likes> <the Semantic Web>
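As a rough illustration of the triple model (a sketch added here, not taken from the slides), the three example statements can be held as (subject, predicate, object) tuples and filtered with a wildcard pattern. The helper name `match` is hypothetical, and real RDF uses URIs rather than bare strings.

```python
# Each RDF statement is a (subject, predicate, object) triple.
triples = [
    ("Juan", "was born in", "California"),
    ("California", "is part of", "the USA"),
    ("Juan", "likes", "the Semantic Web"),
]


def match(pattern, data):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in data
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


# Everything that is said about Juan
about_juan = match(("Juan", None, None), triples)
```

Because the object of one triple can be the subject of another ("California"), the set of triples forms the labeled graph the slide refers to.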

Page 50: Consuming Linked Data SemTech2010

RDF can be serialized in different ways

• RDF/XML
• RDFa (RDF in HTML)
• N3
• Turtle
• JSON

Page 51: Consuming Linked Data SemTech2010

So does that mean that I have to publish my data in RDF now?

Page 52: Consuming Linked Data SemTech2010

You don’t have to… but we would like you to

Page 53: Consuming Linked Data SemTech2010

An example

Page 54: Consuming Linked Data SemTech2010

Document on the Web

Page 55: Consuming Linked Data SemTech2010

Databases back up documents

Isbn              | Title                        | Author       | PublisherID | ReleasedDate
978-0-596-15381-6 | Programming the Semantic Web | Toby Segaran | 1           | July 2009
…                 | …                            | …            | …           | …

PublisherID | PublisherName
1           | O’Reilly Media
…           | …

This is a THING: a book titled “Programming the Semantic Web” by Toby Segaran, …

THINGS have PROPERTIES: a Book has a Title, an Author, …

Page 56: Consuming Linked Data SemTech2010

Let’s represent the data in RDF

[Graph: the table rows drawn as nodes and labeled edges]

<book> <title> “Programming the Semantic Web”
<book> <isbn> “978-0-596-15381-6”
<book> <author> “Toby Segaran”
<book> <publisher> <Publisher>
<Publisher> <name> “O’Reilly”

Page 57: Consuming Linked Data SemTech2010

Remember that we are on the web

Everything on the web is identified by a URI

Page 58: Consuming Linked Data SemTech2010

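And now let’s link the data to other data

[Graph: the same book data, with the nodes now identified by URIs]

<http://…/isbn978> <title> “Programming the Semantic Web”
<http://…/isbn978> <isbn> “978-0-596-15381-6”
<http://…/isbn978> <author> “Toby Segaran”
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> “O’Reilly”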

Page 59: Consuming Linked Data SemTech2010

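And now consider the data from Revyu.com

[Graph: a review of the book]

<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> “Awesome Book”
<http://…/review1> <reviewer> <http://…/reviewer>
<http://…/reviewer> <name> “Juan Sequeda”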

Page 60: Consuming Linked Data SemTech2010

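Let’s start to link data

[Graph: the book data and the Revyu.com data joined by a sameAs link]

<http://…/isbn978> <title> “Programming the Semantic Web”
<http://…/isbn978> <isbn> “978-0-596-15381-6”
<http://…/isbn978> <author> “Toby Segaran”
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> “O’Reilly”
<http://…/isbn978> (book data) <sameAs> <http://…/isbn978> (Revyu.com)
<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> “Awesome Book”
<http://…/review1> <hasReviewer> <http://…/reviewer>
<http://…/reviewer> <name> “Juan Sequeda”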

Page 61: Consuming Linked Data SemTech2010

Juan Sequeda publishes data too

<http://juansequeda.com/id> <name> “Juan Sequeda”
<http://juansequeda.com/id> <livesIn> <http://dbpedia.org/Austin>

Page 62: Consuming Linked Data SemTech2010

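Let’s link more data

[Graph: the Revyu.com reviewer joined to Juan Sequeda’s own data by a sameAs link]

<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> “Awesome Book”
<http://…/review1> <hasReviewer> <http://…/reviewer>
<http://…/reviewer> <name> “Juan Sequeda”
<http://…/reviewer> <sameAs> <http://juansequeda.com/id>
<http://juansequeda.com/id> <name> “Juan Sequeda”
<http://juansequeda.com/id> <livesIn> <http://dbpedia.org/Austin>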

Page 63: Consuming Linked Data SemTech2010

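And more

[Graph: all three datasets linked together]

<http://…/isbn978> <title> “Programming the Semantic Web”
<http://…/isbn978> <isbn> “978-0-596-15381-6”
<http://…/isbn978> <author> “Toby Segaran”
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> “O’Reilly”
<http://…/isbn978> (book data) <sameAs> <http://…/isbn978> (Revyu.com)
<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> “Awesome Book”
<http://…/review1> <hasReviewer> <http://…/reviewer>
<http://…/reviewer> <name> “Juan Sequeda”
<http://…/reviewer> <sameAs> <http://juansequeda.com/id>
<http://juansequeda.com/id> <name> “Juan Sequeda”
<http://juansequeda.com/id> <livesIn> <http://dbpedia.org/Austin>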

Page 64: Consuming Linked Data SemTech2010

Data on the Web that is in RDF and is linked to other RDF data is LINKED DATA

Page 65: Consuming Linked Data SemTech2010

Linked Data Principles

1. Use URIs as names for things

2. Use HTTP URIs so that people can look up (dereference) those names.

3. When someone looks up a URI, provide useful information.

4. Include links to other URIs so that they can discover more things.
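Principles 2 and 3 amount to an ordinary HTTP lookup in which the client asks for data rather than a web page. The sketch below is an illustration, not something shown in the talk: it uses Python's standard `urllib.request` to build such a request without sending it. The helper name `build_lookup` is hypothetical, and whether a given server honours the requested media type is an assumption.

```python
from urllib.request import Request


def build_lookup(uri, media_type="application/rdf+xml"):
    """Build (but do not send) an HTTP GET that asks for RDF instead of HTML."""
    return Request(uri, headers={"Accept": media_type})


req = build_lookup("http://dbpedia.org/resource/Berlin")
# Passing req to urllib.request.urlopen() would perform the actual lookup.
```

The response to such a lookup would contain further URIs (principle 4), and each of those can be looked up in the same way.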

Page 66: Consuming Linked Data SemTech2010

Linked Data makes the web appear as ONE GIANT HUGE GLOBAL DATABASE!

Page 67: Consuming Linked Data SemTech2010

I can query a database with SQL. Is there a way to query Linked Data with a query language?

Page 68: Consuming Linked Data SemTech2010

Yes! There is actually a standardized language for that

SPARQL

Page 69: Consuming Linked Data SemTech2010

FIND all the reviews on the book “Programming the Semantic Web” by people who live in Austin

Page 70: Consuming Linked Data SemTech2010

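[Graph: the linked datasets that the query is evaluated over]

<http://…/isbn978> <title> “Programming the Semantic Web”
<http://…/isbn978> <isbn> “978-0-596-15381-6”
<http://…/isbn978> <author> “Toby Segaran”
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> “O’Reilly”
<http://…/isbn978> (book data) <sameAs> <http://…/isbn978> (Revyu.com)
<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> “Awesome Book”
<http://…/review1> <hasReviewer> <http://…/reviewer>
<http://…/reviewer> <name> “Juan Sequeda”
<http://…/reviewer> <sameAs> <http://juansequeda.com>
<http://juansequeda.com> <name> “Juan Sequeda”
<http://juansequeda.com> <livesIn> <http://dbpedia.org/Austin>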
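To make the FIND question concrete, here is a minimal sketch of how it could be answered by walking the linked graph. This is plain Python over an in-memory list, not SPARQL and not the speaker's code. The short names `shop:` and `revyu:` are hypothetical stand-ins for the elided URIs, and the function names are made up for illustration.

```python
# Triples from the book data, Revyu.com and Juan's own data.
triples = [
    ("shop:isbn978", "title", "Programming the Semantic Web"),
    ("shop:isbn978", "sameAs", "revyu:isbn978"),
    ("revyu:isbn978", "hasReview", "revyu:review1"),
    ("revyu:review1", "description", "Awesome Book"),
    ("revyu:review1", "hasReviewer", "revyu:reviewer"),
    ("revyu:reviewer", "sameAs", "http://juansequeda.com/id"),
    ("http://juansequeda.com/id", "livesIn", "http://dbpedia.org/Austin"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the data."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def aliases(node):
    """The node itself plus everything directly linked to it by sameAs."""
    same = {node}
    for s, p, o in triples:
        if p == "sameAs" and s == node:
            same.add(o)
        elif p == "sameAs" and o == node:
            same.add(s)
    return same


def reviews_by_residents(title, city):
    """Descriptions of reviews of the titled book by people living in city."""
    found = []
    books = [s for s, p, o in triples if p == "title" and o == title]
    for book in books:
        for b in aliases(book):
            for review in objects(b, "hasReview"):
                for reviewer in objects(review, "hasReviewer"):
                    lives = [c for r in aliases(reviewer)
                             for c in objects(r, "livesIn")]
                    if city in lives:
                        found.extend(objects(review, "description"))
    return found


result = reviews_by_residents("Programming the Semantic Web",
                              "http://dbpedia.org/Austin")
```

The answer only exists because of the two sameAs links: without them the book, the review and the reviewer's home town sit in three unconnected datasets.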

Page 71: Consuming Linked Data SemTech2010

This looks cool, but let’s be realistic. What is the incentive to publish Linked Data?

Page 72: Consuming Linked Data SemTech2010

What was your incentive to publish an HTML page in 1990?

Page 73: Consuming Linked Data SemTech2010

1) Share data in documents
2) Because your neighbor was doing it

Page 74: Consuming Linked Data SemTech2010

So why should we publish Linked Data in 2010?

Page 75: Consuming Linked Data SemTech2010

1) Share data as data
2) Because your neighbor is doing it

Page 76: Consuming Linked Data SemTech2010

And guess who is starting to publish Linked Data now?

Page 77: Consuming Linked Data SemTech2010

Linked Data Publishers

• UK Government
• US Government
• BBC
• Open Calais – Thomson Reuters
• Freebase
• NY Times
• Best Buy
• CNET
• DBpedia
• Are you?

Page 78: Consuming Linked Data SemTech2010

How can I publish Linked Data?

Page 79: Consuming Linked Data SemTech2010

Publishing Linked Data
• Legacy Data in Relational Databases
  – D2R Server
  – Virtuoso
  – Triplify
  – Ultrawrap
• CMS
  – Drupal 7
• Native RDF Stores
  – Databases for RDF (Triple Stores)
    • AllegroGraph, Jena, Sesame, Virtuoso
  – Talis Platform (Linked Data in the Cloud)
• In HTML with RDFa

Page 80: Consuming Linked Data SemTech2010

Consuming Linked Data by Humans

Page 81: Consuming Linked Data SemTech2010

HTML Browsers

Page 82: Consuming Linked Data SemTech2010

Links to other URIs

Page 83: Consuming Linked Data SemTech2010

<span rel="foaf:interest">
  <a href="http://dbpedia.org/resource/Database" property="dcterms:title">Database</a>,
  <a href="http://dbpedia.org/resource/Data_integration" property="dcterms:title">Data Integration</a>,
  <a href="http://dbpedia.org/resource/Semantic_Web" property="dcterms:title">Semantic Web</a>,
  <a href="http://dbpedia.org/resource/Linked_Data" property="dcterms:title">Linked Data</a>,
  etc.
</span>

Page 84: Consuming Linked Data SemTech2010

HTML Browsers

• RDF can be serialized in RDFa
• Have you heard of
  – Yahoo’s SearchMonkey
  – Google Rich Snippets?
• They are consuming RDFa
• But WHY?

Page 85: Consuming Linked Data SemTech2010

Because there is life beyond ten blue links

Page 86: Consuming Linked Data SemTech2010
Page 87: Consuming Linked Data SemTech2010

Google and Yahoo are starting to crawl RDFa!

The Semantic Web is a reality!

Page 88: Consuming Linked Data SemTech2010

The Reality

• Yahoo is crawling data that is in RDFa and Microformats under specific vocabularies
  – FOAF
  – GoodRelations
  – …
• Google is crawling RDFa and Microformats that use the Google vocabulary

Page 89: Consuming Linked Data SemTech2010

Linked Data Browsers

Page 90: Consuming Linked Data SemTech2010

Linked Data Browsers

• Not actually separate browsers. Run inside of HTML browsers

• View the data that is returned after looking up a URI in tabular form

• (IMO) UI lacks usability

Page 91: Consuming Linked Data SemTech2010
Page 92: Consuming Linked Data SemTech2010

Linked Data Browsers

• Tabulator
  – http://www.w3.org/2005/ajar/tab
• OpenLink
  – http://ode.openlinksw.com/
• Zitgist DataViewer
  – http://dataviewer.zitgist.com/
• Marbles
  – http://www5.wiwiss.fu-berlin.de/marbles/
• Explorator
  – http://www.tecweb.inf.puc-rio.br/explorator

Page 93: Consuming Linked Data SemTech2010

Faceted Browsers

Page 94: Consuming Linked Data SemTech2010

http://dbpedia.neofonie.de

Page 95: Consuming Linked Data SemTech2010

http://dev.semsol.com/2010/semtech/

Page 96: Consuming Linked Data SemTech2010

On-the-fly Mashups

Page 97: Consuming Linked Data SemTech2010

http://sig.ma

Page 98: Consuming Linked Data SemTech2010

What’s next?

Page 99: Consuming Linked Data SemTech2010

Time to create new and innovative ways to interact with Linked Data

Page 100: Consuming Linked Data SemTech2010

This may be one of the Killer Apps that we have all been waiting for

http://en.wikipedia.org/wiki/File:Mosaic_browser_plaque_ncsa.jpg

Page 101: Consuming Linked Data SemTech2010

It’s time to partner with HCI community

Semantic Web UIs don’t have to be ugly

Page 102: Consuming Linked Data SemTech2010

Consume Linked Data with SPARQL

Page 103: Consuming Linked Data SemTech2010

SPARQL Endpoints

• Linked Data sources usually provide a SPARQL endpoint for their dataset(s)

• SPARQL endpoint: SPARQL query processing service that supports the SPARQL protocol*

• Send your SPARQL query, receive the result

* http://www.w3.org/TR/rdf-sparql-protocol/

Page 104: Consuming Linked Data SemTech2010

Where can I find SPARQL Endpoints?

• DBpedia: http://dbpedia.org/sparql

• Musicbrainz: http://dbtune.org/musicbrainz/sparql

• U.S. Census: http://www.rdfabout.com/sparql

• Semantic Crunchbase: http://cb.semsol.org/sparql

• http://esw.w3.org/topic/SparqlEndpoints

Page 105: Consuming Linked Data SemTech2010

Accessing a SPARQL Endpoint

• SPARQL endpoints: RESTful Web services
• Issuing SPARQL queries to a remote SPARQL endpoint is basically an HTTP GET request to the SPARQL endpoint with parameter query

GET /sparql?query=PREFIX+rd... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1

The value of the query parameter is a URL-encoded string with the SPARQL query
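The request line above can be produced with any URL-encoding routine. As a minimal sketch (an illustration, not code from the talk), Python's standard `urllib.parse.urlencode` builds the GET URL; the helper name `sparql_get_url` and the example query are assumptions.

```python
from urllib.parse import urlencode


def sparql_get_url(endpoint, query):
    """Return the URL of an HTTP GET request that carries the SPARQL query."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(
    "http://dbpedia.org/sparql",
    "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1",
)
```

Spaces become `+` and characters such as `?`, `{` and `}` are percent-encoded, which is the form an endpoint expects in the query parameter.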

Page 106: Consuming Linked Data SemTech2010

Query Results Formats

• SPARQL endpoints usually support different result formats:
  – XML, JSON, plain text (for ASK and SELECT queries)
  – RDF/XML, NTriples, Turtle, N3 (for DESCRIBE and CONSTRUCT queries)

Page 107: Consuming Linked Data SemTech2010

Query Results Formats

PREFIX dbp: <http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT ?name ?bday WHERE {
  ?p dbp:birthplace <http://dbpedia.org/resource/Berlin> .
  ?p dbpprop:dateOfBirth ?bday .
  ?p dbpprop:name ?name .
}

Page 108: Consuming Linked Data SemTech2010
Page 109: Consuming Linked Data SemTech2010
Page 110: Consuming Linked Data SemTech2010

Query Result Formats

• Use the Accept header to request the preferred result format:

GET /sparql?query=PREFIX+rd... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1
Accept: application/sparql-results+json

Page 111: Consuming Linked Data SemTech2010

Query Result Formats

• As an alternative, some SPARQL endpoint implementations (e.g. Joseki) provide an additional parameter out

GET /sparql?out=json&query=... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1
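Once a SELECT result arrives as `application/sparql-results+json`, it can be read with an ordinary JSON parser. The sketch below is an illustration under stated assumptions: the response body is a made-up sample shaped like the standard SPARQL JSON results format (a `head` with `vars` and a `results` object with `bindings`), and the helper name `rows` is hypothetical.

```python
import json


def rows(results_json):
    """Turn a SPARQL JSON result document into a list of plain dicts."""
    doc = json.loads(results_json)
    names = doc["head"]["vars"]
    return [
        # A variable may be unbound in a solution, so skip missing names.
        {n: b[n]["value"] for n in names if n in b}
        for b in doc["results"]["bindings"]
    ]


# Hypothetical response body; the values are placeholders, not real data.
sample = """
{"head": {"vars": ["name", "bday"]},
 "results": {"bindings": [
   {"name": {"type": "literal", "value": "Example Person"},
    "bday": {"type": "literal", "value": "1900-01-01"}}
 ]}}
"""

parsed = rows(sample)
```

Each binding also carries a `type` (and possibly a datatype or language tag), which this sketch ignores for brevity.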

Page 112: Consuming Linked Data SemTech2010

Accessing a SPARQL Endpoint

• More convenient: use a library
• SPARQL JavaScript Library
  – http://www.thefigtrees.net/lee/blog/2006/04/sparql_calendar_demo_a_sparql.html
• ARC for PHP
  – http://arc.semsol.org/
• RAP – RDF API for PHP
  – http://www4.wiwiss.fu-berlin.de/bizer/rdfapi/index.html

Page 113: Consuming Linked Data SemTech2010

Accessing a SPARQL Endpoint

• Jena / ARQ (Java)
  – http://jena.sourceforge.net/
• Sesame (Java)
  – http://www.openrdf.org/
• SPARQL Wrapper (Python)
  – http://sparql-wrapper.sourceforge.net/
• PySPARQL (Python)
  – http://code.google.com/p/pysparql/

Page 114: Consuming Linked Data SemTech2010

Accessing a SPARQL Endpoint

Example with Jena/ARQ

import com.hp.hpl.jena.query.*;

String service = "..."; // address of the SPARQL endpoint
String query = "SELECT ..."; // your SPARQL query
QueryExecution e = QueryExecutionFactory.sparqlService(service, query);

ResultSet results = e.execSelect();
while ( results.hasNext() ) {
    QuerySolution s = results.nextSolution();
    // ...
}

e.close();

Page 115: Consuming Linked Data SemTech2010

• Querying a single dataset is quite boring compared to:
• Issuing SPARQL queries over multiple datasets
• How can you do this?
  1. Issue follow-up queries to different endpoints
  2. Query a central collection of datasets
  3. Build a store with copies of relevant datasets
  4. Use a query federation system

Page 116: Consuming Linked Data SemTech2010

Follow-up Queries

• Idea: issue follow-up queries over other datasets based on results from previous queries

• Substituting placeholders in query templates

Page 117: Consuming Linked Data SemTech2010

String s1 = "http://cb.semsol.org/sparql";
String s2 = "http://dbpedia.org/sparql";

String qTmpl = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
             + "SELECT ?c WHERE { <%s> rdfs:comment ?c }";
// q1: find a list of companies filtered by some criteria and return DBpedia URIs for them
String q1 = "SELECT ?s WHERE { ...";
QueryExecution e1 = QueryExecutionFactory.sparqlService(s1, q1);
ResultSet results1 = e1.execSelect();
while ( results1.hasNext() ) {
    QuerySolution sol = results1.nextSolution();
    // substitute the URI bound to ?s into the query template
    String q2 = String.format( qTmpl, sol.getResource("s").getURI() );
    QueryExecution e2 = QueryExecutionFactory.sparqlService(s2, q2);
    ResultSet results2 = e2.execSelect();
    while ( results2.hasNext() ) {
        // ...
    }
    e2.close();
}
e1.close();

Page 118: Consuming Linked Data SemTech2010

Follow-up Queries

• Advantage
  – Queried data is up-to-date
• Drawbacks
  – Requires the existence of a SPARQL endpoint for each dataset
  – Requires program logic
  – Very inefficient

Page 119: Consuming Linked Data SemTech2010

Querying a Collection of Datasets

• Idea: Use an existing SPARQL endpoint that provides access to a set of copies of relevant datasets

• Example:
  – SPARQL endpoint over a majority of datasets from the LOD cloud at:
    http://lod.openlinksw.com/sparql
    http://uberblic.org

Page 120: Consuming Linked Data SemTech2010

Querying a Collection of Datasets

• Advantage:
  – No need for specific program logic
• Drawbacks:
  – Queried data might be out of date
  – Not all relevant datasets in the collection

Page 121: Consuming Linked Data SemTech2010

Own Store of Dataset Copies

• Idea: Build your own store with copies of relevant datasets and query it

• Possible stores:
  – Jena TDB http://jena.hpl.hp.com/wiki/TDB
  – Sesame http://www.openrdf.org/
  – OpenLink Virtuoso http://virtuoso.openlinksw.com/
  – 4store http://4store.org/
  – AllegroGraph http://www.franz.com/agraph/
  – etc.

Page 122: Consuming Linked Data SemTech2010

Populating Your Store

• Get RDF dumps provided for the datasets
• (Focused) Crawling
  • ldspider http://code.google.com/p/ldspider/
    – Multithreaded API for focused crawling
    – Crawling strategies (breadth-first, load-balancing)
    – Flexible configuration with callbacks and hooks

Page 123: Consuming Linked Data SemTech2010

Own Store of Dataset Copies

• Advantages:
  – No need for specific program logic
  – Can include all datasets
  – Independent of the existence, availability, and efficiency of SPARQL endpoints
• Drawbacks:
  – Requires effort to set up and to operate the store
  – Ideally, data sources provide RDF dumps; if not?
  – How to keep the copies in sync with the originals?
  – Queried data might be out of date

Page 124: Consuming Linked Data SemTech2010

Federated Query Processing

• Idea: Querying a mediator which distributes sub-queries to relevant sources and integrates the results

Page 125: Consuming Linked Data SemTech2010

Federated Query Processing

• Instance-based federation
  – Each thing described by only one data source
  – Untypical for the Web of Data
• Triple-based federation
  – No restrictions
  – Requires more distributed joins
• Statistics about datasets required (both cases)
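The mediator idea can be sketched in a few lines. The toy below is an assumption-laden illustration of triple-based federation, not any real system's algorithm: every triple pattern is sent to every source and the partial results are joined on shared variables. The source names, data and function names are hypothetical, and a real mediator would use dataset statistics to pick sources instead of asking all of them.

```python
# Two in-memory stand-ins for remote SPARQL endpoints.
SOURCES = {
    "companies": [("ex:acme", "locatedIn", "ex:Austin")],
    "places": [("ex:Austin", "label", "Austin")],
}


def ask_source(triples, pattern, binding):
    """Evaluate one triple pattern at one source, extending a binding."""
    out = []
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            out.append(new)
    return out


def federated_query(patterns):
    """Send each pattern to every source and join the partial results."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triples in SOURCES.values()
            for extended in ask_source(triples, pattern, binding)
        ]
    return bindings


answers = federated_query([
    ("?company", "locatedIn", "?city"),
    ("?city", "label", "?name"),
])
```

The join on `?city` happens in the mediator, across two sources; that is the "distributed join" cost the slide mentions for triple-based federation.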

Page 126: Consuming Linked Data SemTech2010

Federated Query Processing

• DARQ (Distributed ARQ)
  – http://darq.sourceforge.net/
  – Query engine for federated SPARQL queries
  – Extension of ARQ (query engine for Jena)
  – Last update: June 28, 2006
• Semantic Web Integrator and Query Engine (SemWIQ)
  – http://semwiq.sourceforge.net/
  – Actively maintained

Page 127: Consuming Linked Data SemTech2010

Federated Query Processing

• Advantages:
  – No need for specific program logic
  – Queried data is up to date
• Drawbacks:
  – Requires the existence of a SPARQL endpoint for each dataset
  – Requires effort to set up and configure the mediator

Page 128: Consuming Linked Data SemTech2010

In any case:

• You have to know the relevant data sources
  – When developing the app using follow-up queries
  – When selecting an existing SPARQL endpoint over a collection of dataset copies
  – When setting up your own store with a collection of dataset copies
  – When configuring your query federation system
• You restrict yourself to the selected sources

Page 129: Consuming Linked Data SemTech2010

In any case:

• You have to know the relevant data sources
  – When developing the app using follow-up queries
  – When selecting an existing SPARQL endpoint over a collection of dataset copies
  – When setting up your own store with a collection of dataset copies
  – When configuring your query federation system
• You restrict yourself to the selected sources

There is an alternative:

Remember, URIs link to data

Page 130: Consuming Linked Data SemTech2010

Automated Link Traversal

• Idea: Discover further data by looking up relevant URIs in your application

• Can be combined with the previous approaches

Page 131: Consuming Linked Data SemTech2010

Link Traversal Based Query Execution

• Applies the idea of automated link traversal to the execution of SPARQL queries

• Idea:
  – Intertwine query evaluation with traversal of RDF links
  – Discover data that might contribute to query results during query execution
• Alternately:
  – Evaluate parts of the query
  – Look up URIs in intermediate solutions
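The alternation between evaluating part of the query and looking up URIs can be sketched as follows. This is a toy illustration under stated assumptions, not the algorithm of any particular engine: the dictionary `WEB` stands in for HTTP lookups, the query is hard-wired to two patterns, and all URIs, data and function names are hypothetical.

```python
# A toy "Web": looking up a URI returns the triples published there.
WEB = {
    "ex:alice": [("ex:alice", "knows", "ex:bob")],
    "ex:bob": [("ex:bob", "name", "Bob")],
}


def look_up(uri):
    """Stand-in for dereferencing a URI over HTTP."""
    return WEB.get(uri, [])


def names_of_friends(seed):
    """Answer the query:  <seed> knows ?f .  ?f name ?n ."""
    data = set(look_up(seed))  # start with the data behind the seed URI
    fetched = {seed}
    names = []
    # Evaluate the first pattern over the data found so far.
    friends = [o for s, p, o in data if s == seed and p == "knows"]
    for friend in friends:
        # Look up the URI in the intermediate solution to discover more data.
        if friend not in fetched:
            data.update(look_up(friend))
            fetched.add(friend)
        # Evaluate the second pattern over the enlarged data set.
        names += [o for s, p, o in data if s == friend and p == "name"]
    return names


found = names_of_friends("ex:alice")
```

Nothing about `ex:bob` was known before the query started; the data was discovered by following a link found in an intermediate solution.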

Page 132: Consuming Linked Data SemTech2010

Link Traversal Based Query Execution

Page 133: Consuming Linked Data SemTech2010

Link Traversal Based Query Execution

Page 134: Consuming Linked Data SemTech2010

Link Traversal Based Query Execution

Page 135: Consuming Linked Data SemTech2010

Link Traversal Based Query Execution

Page 136: Consuming Linked Data SemTech2010

Link Traversal Based Query Execution

Page 137: Consuming Linked Data SemTech2010

Link Traversal Based Query Execution

Page 138: Consuming Linked Data SemTech2010

Link Traversal Based Query Execution

Page 139: Consuming Linked Data SemTech2010

Link Traversal Based Query Execution

Page 140: Consuming Linked Data SemTech2010

Link Traversal Based Query Execution

Page 141: Consuming Linked Data SemTech2010

Link Traversal Based Query Execution

Page 142: Consuming Linked Data SemTech2010

Link Traversal Based Query Execution

• Advantages:
  – No need to know all data sources in advance
  – No need for specific programming logic
  – Queried data is up to date
  – Does not depend on the existence of SPARQL endpoints provided by the data sources
• Drawbacks:
  – Not as fast as a centralized collection of copies
  – Unsuitable for some queries
  – Results might be incomplete (do we care?)

Page 143: Consuming Linked Data SemTech2010

Implementations

• Semantic Web Client library (SWClLib) for Java
  http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/
• SWIC for Prolog
  http://moustaki.org/swic/

Page 144: Consuming Linked Data SemTech2010

Implementations

• SQUIN http://squin.org
  – Provides SWClLib functionality as a Web service
  – Accessible like a SPARQL endpoint
  – Install package: unzip and start
    • Less than 5 mins!
  – Convenient access with SQUIN PHP tools:

$s = 'http:// ...'; // address of the SQUIN service
$q = new SparqlQuerySock( $s, '... SELECT ...' );
$res = $q->getJsonResult(); // or getXmlResult()

Page 145: Consuming Linked Data SemTech2010

Real World Example

Page 146: Consuming Linked Data SemTech2010

Getting Started

• Finding URIs
• Finding Additional Data
• Finding SPARQL Endpoints

Page 147: Consuming Linked Data SemTech2010

What is a Linked Data application?

• Software system that makes use of data on the web from multiple datasets and that benefits from links between the datasets

Page 148: Consuming Linked Data SemTech2010

Characteristics of Linked Data Applications

• Consume data that is published on the web following the Linked Data principles: an application should be able to request, retrieve and process the accessed data

• Discover further information by following the links between different data sources: the fourth principle enables this.

• Combine the consumed Linked Data with data from other sources (not necessarily Linked Data)

• Expose the combined data back to the web following the Linked Data principles

• Offer value to end-users

Page 149: Consuming Linked Data SemTech2010

Examples

• http://data-gov.tw.rpi.edu/wiki
• http://dbrec.net/
• http://fanhu.bz/
• http://data.nytimes.com/schools/schools.html
• http://sig.ma
• http://visinav.deri.org/semtech2010/

Page 150: Consuming Linked Data SemTech2010

Hot Research Topics
• Interlinking Algorithms
• Provenance and Trust
• Dataset Dynamics
• UI
• Distributed Query
• Evaluation
  – “You want a good thesis? IR is based on precision and recall. The minute you add semantics, it is a meaningless feature. Logic is based on soundness and completeness. We don’t want soundness and completeness. We want a few good answers quickly.” – Jim Hendler at WWW2009 during the LOD gathering

Thanks Michael Hausenblas

Page 151: Consuming Linked Data SemTech2010

THANKS

Juan Sequeda
www.juansequeda.com
@juansequeda
#cold

www.consuminglinkeddata.org

Acknowledgements: Olaf Hartig, Patrick Sinclair, Jamie Taylor

Slides for Consuming Linked Data with SPARQL by Olaf Hartig