ISWC 2009 Tutorial "How to Consume Linked Data on the Web" Querying Linked Data with SPARQL

Querying Linked Data with SPARQL

DESCRIPTION

This slideset was part of our "How to consume Linked Data" tutorial at the International Semantic Web Conference (ISWC), Oct. 2009

Page 1: Querying Linked Data with SPARQL

Querying Linked Data with SPARQL

Page 2: Querying Linked Data with SPARQL

Brief Introduction to SPARQL

● SPARQL: query language for RDF data
● Main idea: pattern matching
  ● Describe subgraphs of the queried RDF graph
  ● Subgraphs that match your description yield a result
  ● Means: graph patterns (i.e., RDF graphs with variables)

Graph pattern:
?v rdf:type http://.../Volcano

Page 3: Querying Linked Data with SPARQL

Brief Introduction to SPARQL

Graph pattern:
?v rdf:type http://.../Volcano

Queried graph:
http://.../Mount_Baker rdf:type http://.../Volcano
http://.../Mount_Baker p:lastEruption "1880"
http://.../Mount_Etna rdf:type http://.../Volcano

Results:
?v
http://.../Mount_Baker
http://.../Mount_Etna
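The pattern-matching idea can be sketched in a few lines of plain Java. This is only an illustration under simplifying assumptions: the class and method names are hypothetical, triples are plain lists of three strings, and only patterns of the form "?v <predicate> <object>" are handled. A real SPARQL engine does far more.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of any SPARQL library.
class TriplePatternSketch {

    // The queried graph from the slide: each triple is (subject, predicate, object).
    static final List<List<String>> GRAPH = List.of(
        List.of("http://.../Mount_Baker", "rdf:type", "http://.../Volcano"),
        List.of("http://.../Mount_Baker", "p:lastEruption", "\"1880\""),
        List.of("http://.../Mount_Etna", "rdf:type", "http://.../Volcano"));

    // Returns the bindings of ?v for the pattern "?v <predicate> <object>":
    // every subject of a triple whose predicate and object both match.
    static List<String> matchSubjects(List<List<String>> graph, String predicate, String object) {
        List<String> bindings = new ArrayList<>();
        for (List<String> triple : graph) {
            if (triple.get(1).equals(predicate) && triple.get(2).equals(object)) {
                bindings.add(triple.get(0));
            }
        }
        return bindings;
    }

    public static void main(String[] args) {
        // Pattern: ?v rdf:type http://.../Volcano
        for (String v : matchSubjects(GRAPH, "rdf:type", "http://.../Volcano")) {
            System.out.println("?v = " + v);
        }
    }
}
```

The p:lastEruption triple does not match the pattern, so only the two subjects typed as volcanoes are bound to ?v.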

Page 4: Querying Linked Data with SPARQL

SPARQL Endpoints

● Linked data sources usually provide a SPARQL endpoint for their dataset(s)

● SPARQL endpoint: SPARQL query processing service that supports the SPARQL protocol*

● Send your SPARQL query, receive the result

* http://www.w3.org/TR/rdf-sparql-protocol/

Page 5: Querying Linked Data with SPARQL

SPARQL Endpoints

 Data Source         | Endpoint Address
---------------------+--------------------------------------
 DBpedia             | http://dbpedia.org/sparql
 Musicbrainz         | http://dbtune.org/musicbrainz/sparql
 U.S. Census         | http://www.rdfabout.com/sparql
 Semantic Crunchbase | http://cb.semsol.org/sparql

More complete list: http://esw.w3.org/topic/SparqlEndpoints

Page 6: Querying Linked Data with SPARQL

Accessing a SPARQL Endpoint

● SPARQL endpoints: RESTful Web services
● Issuing a SPARQL query to a remote SPARQL endpoint is basically an HTTP GET request to the endpoint with the parameter query

GET /sparql?query=PREFIX+rd... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1

The value of the query parameter is a URL-encoded string with the SPARQL query.
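As a rough sketch of how the request URL could be assembled, the following Java snippet URL-encodes a query and appends it as the query parameter. The class name, method name, and the sample query are assumptions made for illustration; only standard JDK classes are used and nothing is sent over the network.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
class QueryUrlSketch {

    // Builds <endpoint>?query=<URL-encoded SPARQL query>.
    static String buildQueryUrl(String endpoint, String sparql) {
        return endpoint + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample query chosen for illustration.
        String sparql = "SELECT * WHERE { ?s ?p ?o }";
        System.out.println(buildQueryUrl("http://dbpedia.org/sparql", sparql));
    }
}
```

Spaces become "+" and characters such as "{", "}", and "?" are percent-encoded, which is the form shown in the GET request line above.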

Page 7: Querying Linked Data with SPARQL

Query Results Formats

● SPARQL endpoints usually support different result formats:
  ● XML, JSON, plain text (for ASK and SELECT queries)
  ● RDF/XML, N-Triples, Turtle, N3 (for DESCRIBE and CONSTRUCT queries)

Page 8: Querying Linked Data with SPARQL

Query Results Formats

PREFIX dbp: <http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>

SELECT ?name ?bday WHERE {
  ?p dbp:birthplace <http://dbpedia.org/resource/Berlin> ;
     dbpprop:dateOfBirth ?bday ;
     dbpprop:name ?name .
}

 name                   | bday
------------------------+------------
 Alexander von Humboldt | 1769-09-14
 Ernst Lubitsch         | 1892-01-28
 ...

Page 9: Querying Linked Data with SPARQL

<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="name"/>
    <variable name="bday"/>
  </head>
  <results distinct="false" ordered="true">
    <result>
      <binding name="name">
        <literal xml:lang="en">Alexander von Humboldt</literal>
      </binding>
      <binding name="bday">
        <literal datatype="http://www.w3.org/2001/XMLSchema#date">1769-09-14</literal>
      </binding>
    </result>
    <result>
      <binding name="name">
        <literal xml:lang="en">Ernst Lubitsch</literal>
      </binding>
      <binding name="bday">
        <literal datatype="http://www.w3.org/2001/XMLSchema#date">1892-01-28</literal>
      </binding>
    </result>
    <!-- … -->
  </results>
</sparql>

http://www.w3.org/TR/rdf-sparql-XMLres/

Page 10: Querying Linked Data with SPARQL

{
  "head": { "link": [], "vars": ["name", "bday"] },
  "results": {
    "distinct": false,
    "ordered": true,
    "bindings": [
      { "name": { "type": "literal", "xml:lang": "en",
                  "value": "Alexander von Humboldt" },
        "bday": { "type": "typed-literal",
                  "datatype": "http://www.w3.org/2001/XMLSchema#date",
                  "value": "1769-09-14" } },
      { "name": { "type": "literal", "xml:lang": "en",
                  "value": "Ernst Lubitsch" },
        "bday": { "type": "typed-literal",
                  "datatype": "http://www.w3.org/2001/XMLSchema#date",
                  "value": "1892-01-28" } },
      // ...
    ]
  }
}

http://www.w3.org/TR/rdf-sparql-json-res/

Page 11: Querying Linked Data with SPARQL

Query Result Formats

● Use the Accept header to request the preferred result format:

GET /sparql?query=PREFIX+rd... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1
Accept: application/sparql-results+json
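One possible way to set the Accept header from Java is the JDK's built-in java.net.http API (Java 11 or later). The sketch below only builds the request object and does not send it; the class name, method name, and the example request URL are assumptions for illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper for illustration only.
class AcceptHeaderSketch {

    // Builds a GET request that asks for SPARQL results in JSON.
    static HttpRequest buildRequest(String requestUrl) {
        return HttpRequest.newBuilder(URI.create(requestUrl))
            .header("Accept", "application/sparql-results+json")
            .GET()
            .build();
    }

    public static void main(String[] args) {
        // Example URL with an already URL-encoded query (illustrative).
        HttpRequest request = buildRequest(
            "http://dbpedia.org/sparql?query=SELECT+*+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D");
        System.out.println(request.method() + " " + request.uri());
        System.out.println("Accept: " + request.headers().firstValue("Accept").orElse(""));
    }
}
```

Sending the request would additionally need an HttpClient; whether a given endpoint honors the header depends on the endpoint implementation.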

Page 12: Querying Linked Data with SPARQL

Query Result Formats

● As an alternative, some SPARQL endpoint implementations (e.g. Joseki) provide an additional parameter out

GET /sparql?out=json&query=... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1

Page 13: Querying Linked Data with SPARQL

Accessing a SPARQL Endpoint

● More convenient: use a library
● Libraries:
  ● SPARQL JavaScript Library
    http://www.thefigtrees.net/lee/blog/2006/04/sparql_calendar_demo_a_sparql.html
  ● ARC for PHP
    http://arc.semsol.org/
  ● RAP – RDF API for PHP
    http://www4.wiwiss.fu-berlin.de/bizer/rdfapi/index.html

Page 14: Querying Linked Data with SPARQL

Accessing a SPARQL Endpoint

● Libraries (cont.):
  ● Jena / ARQ (Java)
    http://jena.sourceforge.net/
  ● Sesame (Java)
    http://www.openrdf.org/
  ● SPARQL Wrapper (Python)
    http://sparql-wrapper.sourceforge.net/
  ● PySPARQL (Python)
    http://code.google.com/p/pysparql/

Page 15: Querying Linked Data with SPARQL

Accessing a SPARQL Endpoint

● Example with Jena / ARQ:

import com.hp.hpl.jena.query.*;

String service = "...";      // address of the SPARQL endpoint
String query = "SELECT ..."; // your SPARQL query
QueryExecution e = QueryExecutionFactory.sparqlService( service, query );
ResultSet results = e.execSelect();
while ( results.hasNext() ) {
  QuerySolution s = results.nextSolution();
  // …
}
e.close();

Page 16: Querying Linked Data with SPARQL

● Querying a single dataset is quite boring compared to:
  ● Issuing SPARQL queries over multiple datasets

● How can you do this?
  1. Issue follow-up queries to different endpoints
  2. Query a central collection of datasets
  3. Build a store with copies of relevant datasets
  4. Use a query federation system

Page 17: Querying Linked Data with SPARQL

Follow-up Queries

● Idea: issue follow-up queries over other datasets based on results from previous queries

● Substituting placeholders in query templates

Page 18: Querying Linked Data with SPARQL

String s1 = "http://cb.semsol.org/sparql";
String s2 = "http://dbpedia.org/sparql";

// Template for the follow-up query; %s is replaced by a URI from the first result
String qTmpl = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
             + "SELECT ?c WHERE { <%s> rdfs:comment ?c }";

// q1: find a list of companies filtered by some criteria
// and return their DBpedia URIs
String q1 = "SELECT ?s WHERE { ...";
QueryExecution e1 = QueryExecutionFactory.sparqlService(s1, q1);
ResultSet results1 = e1.execSelect();
while ( results1.hasNext() ) {
  QuerySolution sol = results1.nextSolution();
  String q2 = String.format( qTmpl, sol.getResource("s").getURI() );
  QueryExecution e2 = QueryExecutionFactory.sparqlService(s2, q2);
  ResultSet results2 = e2.execSelect();
  while ( results2.hasNext() ) {
    // ...
  }
  e2.close();
}
e1.close();

Page 19: Querying Linked Data with SPARQL

Follow-up Queries

● Advantage:
  ● Queried data is up-to-date
● Drawbacks:
  ● Requires the existence of a SPARQL endpoint for each dataset
  ● Requires program logic
  ● Very inefficient

Page 20: Querying Linked Data with SPARQL

Querying a Collection of Datasets

● Idea: Use an existing SPARQL endpoint that provides access to a set of copies of relevant datasets
● Example:
  ● SPARQL endpoint by OpenLink SW over a majority of datasets from the LOD cloud at: http://lod.openlinksw.com/sparql

Page 21: Querying Linked Data with SPARQL

Querying a Collection of Datasets

● Advantage:
  ● No need for specific program logic
● Drawbacks:
  ● Queried data might be out of date
  ● Not all relevant datasets may be in the collection

Page 22: Querying Linked Data with SPARQL

Own Store of Dataset Copies

● Idea: Build your own store with copies of relevant datasets and query it
● Possible stores:
  ● Jena TDB http://jena.hpl.hp.com/wiki/TDB
  ● Sesame http://www.openrdf.org/
  ● OpenLink Virtuoso http://virtuoso.openlinksw.com/
  ● 4store http://4store.org/
  ● AllegroGraph http://www.franz.com/agraph/
  ● etc.

Page 23: Querying Linked Data with SPARQL

Own Store of Dataset Copies

● Advantages:
  ● No need for specific program logic
  ● Can include all datasets
  ● Independent of the existence, availability, and efficiency of SPARQL endpoints
● Drawbacks:
  ● Requires effort to set up and to operate the store
  ● Ideally, data sources provide RDF dumps; if not?
  ● How to keep the copies in sync with the originals?
  ● Queried data might be out of date

Page 24: Querying Linked Data with SPARQL

Federated Query Processing

● Idea: Querying a mediator which distributes subqueries to relevant sources and integrates the results

Page 25: Querying Linked Data with SPARQL

Federated Query Processing

● Instance-based federation
  ● Each thing described by only one data source
  ● Untypical for the Web of Data
● Triple-based federation
  ● No restrictions
  ● Requires more distributed joins
● Statistics about datasets required (both cases)
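To make the notion of a distributed join more concrete, here is a toy mediator in plain Java. It is a sketch under strong assumptions: every source is an in-memory list of triples, each triple pattern is sent to every source (no statistics-based source selection), and the join happens at the mediator. All names, prefixes, and URIs are hypothetical and not taken from any federation system.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mediator sketch; names and data are illustrative only.
class MediatorSketch {

    // Sends the pattern "?s <predicate> ?o" to every source and unions the answers.
    static List<List<String>> askAllSources(List<List<List<String>>> sources, String predicate) {
        List<List<String>> matches = new ArrayList<>();
        for (List<List<String>> source : sources) {
            for (List<String> triple : source) {
                if (triple.get(1).equals(predicate)) {
                    matches.add(triple);
                }
            }
        }
        return matches;
    }

    // Evaluates { ?a <p1> ?b . ?b <p2> ?c } by joining the per-pattern answers on ?b.
    static List<List<String>> joinAtMediator(List<List<List<String>>> sources, String p1, String p2) {
        List<List<String>> rows = new ArrayList<>();
        for (List<String> left : askAllSources(sources, p1)) {
            for (List<String> right : askAllSources(sources, p2)) {
                if (left.get(2).equals(right.get(0))) {
                    rows.add(List.of(left.get(0), left.get(2), right.get(2)));
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Two sources, each holding one triple that the other source does not have.
        List<List<List<String>>> sources = List.of(
            List.of(List.of("http://a.example.org/movie1", "ex:filmingLocation", "http://b.example.org/Italy")),
            List.of(List.of("http://b.example.org/Italy", "ex:statistics", "http://b.example.org/stat/IT")));
        for (List<String> row : joinAtMediator(sources, "ex:filmingLocation", "ex:statistics")) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Neither source alone can answer the two-pattern query; the result row only appears once the mediator joins the partial answers. Dataset statistics would let a real mediator avoid asking sources that cannot contribute.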

Page 26: Querying Linked Data with SPARQL

Federated Query Processing

● DARQ (Distributed ARQ) http://darq.sourceforge.net/
  ● Query engine for federated SPARQL queries
  ● Extension of ARQ (query engine for Jena)
  ● Last update: June 28, 2006

Page 27: Querying Linked Data with SPARQL

Federated Query Processing

● Semantic Web Integrator and Query Engine (SemWIQ) http://semwiq.sourceforge.net/
  ● Actively maintained by Andreas Langegger

Page 28: Querying Linked Data with SPARQL

Federated Query Processing

● Advantages:
  ● No need for specific program logic
  ● Queried data is up to date
● Drawbacks:
  ● Requires the existence of a SPARQL endpoint for each dataset
  ● Requires effort to set up and configure the mediator

Page 29: Querying Linked Data with SPARQL

In any case:

● You have to know the relevant data sources
  ● When developing the app using follow-up queries
  ● When selecting an existing SPARQL endpoint over a collection of dataset copies
  ● When setting up your own store with a collection of dataset copies
  ● When configuring your query federation system
● You restrict yourself to the selected sources

Page 30: Querying Linked Data with SPARQL

There is an alternative:

Remember, URIs link to data

Page 31: Querying Linked Data with SPARQL

Automated Link Traversal

Page 32: Querying Linked Data with SPARQL

Automated Link Traversal

● Idea: Discover further data by looking up relevant URIs in your application

● Can be combined with the previous approaches

Page 33: Querying Linked Data with SPARQL

Link Traversal Based Query Execution

● Applies the idea of automated link traversal to the execution of SPARQL queries
● Idea:
  ● Intertwine query evaluation with traversal of RDF links
  ● Discover data that might contribute to query results during query execution
● Alternately:
  ● Evaluate parts of the query
  ● Look up URIs in intermediate solutions

Page 34: Querying Linked Data with SPARQL

Link Traversal Based Query Execution

● Example: Return the unemployment rate of the countries in which the movie http://mymovie.db/movie2449 was filmed.

SELECT ?c ?u WHERE {
  <http://mymovie.db/movie2449> mov:filming_location ?c .
  ?c geo:statistics ?cStats .
  ?cStats stat:unempRate ?u .
}

Page 35: Querying Linked Data with SPARQL

Link Traversal Based Query Execution

[Figure: same query as on the previous slide; the URI http://mymovie.db/movie2449 from the query is looked up]

Page 36: Querying Linked Data with SPARQL

Link Traversal Based Query Execution

[Figure: same query as before, shown next to the queried data]

Page 37: Querying Linked Data with SPARQL

Link Traversal Based Query Execution

[Figure: same query as before]

Queried data:
...
<http://mymovie.db/movie2449> mov:filming_location <http://geo.../Italy> .
...

Page 38: Querying Linked Data with SPARQL

Link Traversal Based Query Execution

[Figure: same query as before]

Queried data:
...
<http://mymovie.db/movie2449> mov:filming_location <http://geo.../Italy> .
...

Intermediate solution (the figure labels the variable ?loc; it corresponds to ?c in the query):
?loc = http://geo.../Italy

Page 39: Querying Linked Data with SPARQL

Link Traversal Based Query Execution

[Figure: same query as before; the URI http://geo.../Italy from the intermediate solution is looked up]

Intermediate solution:
?loc = http://geo.../Italy

Page 40: Querying Linked Data with SPARQL

Link Traversal Based Query Execution

[Figure: same query as before; the URI http://geo.../Italy from the intermediate solution is looked up]

Intermediate solution:
?loc = http://geo.../Italy

Page 41: Querying Linked Data with SPARQL

Link Traversal Based Query Execution

[Figure: same query as before, shown next to the queried data]

Intermediate solution:
?loc = http://geo.../Italy

Page 42: Querying Linked Data with SPARQL

Link Traversal Based Query Execution

[Figure: same query as before]

Queried data:
...
<http://geo.../Italy> geo:statistics <http://example.db/stat/IT> .
...

Intermediate solution:
?loc = http://geo.../Italy

Page 43: Querying Linked Data with SPARQL

Link Traversal Based Query Execution

[Figure: same query as before]

Queried data:
...
<http://geo.../Italy> geo:statistics <http://example.db/stat/IT> .
...

Intermediate solutions:
?loc = http://geo.../Italy
?loc = http://geo.../Italy, ?stat = http://stats.db/../it

Page 44: Querying Linked Data with SPARQL

Link Traversal Based Query Execution

[Figure: same query as before]

Intermediate solutions:
?loc = http://geo.../Italy
?loc = http://geo.../Italy, ?stat = http://stats.db/../it

● Proceed with this strategy (traverse RDF links during query execution)
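The walkthrough can be imitated with a small, self-contained Java simulation. This is only a sketch under stated assumptions: the "Web" is an in-memory map from a URI to the triples a lookup of that URI would return, the three triple patterns of the example query are hard-coded, and the two example.org URIs and the literal value are made-up placeholders. It is not how SWClLib or SQUIN are implemented.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simulation of link traversal based query execution.
class LinkTraversalSketch {
    static final String MOVIE = "http://mymovie.db/movie2449";
    static final String ITALY = "http://geo.example.org/Italy";   // placeholder URI
    static final String STATS = "http://stats.example.org/IT";    // placeholder URI

    // Stand-in for the Web: URI -> triples retrieved by looking up that URI.
    static final Map<String, List<List<String>>> WEB = Map.of(
        MOVIE, List.of(List.of(MOVIE, "mov:filming_location", ITALY)),
        ITALY, List.of(List.of(ITALY, "geo:statistics", STATS)),
        STATS, List.of(List.of(STATS, "stat:unempRate", "\"placeholder-rate\"")));

    // Looks up a URI at most once and adds the retrieved triples to the queried data.
    static void lookUp(String uri, Set<String> seen, List<List<String>> queriedData) {
        if (seen.add(uri)) {
            queriedData.addAll(WEB.getOrDefault(uri, List.of()));
        }
    }

    // Objects of all triples in the queried data with the given subject and predicate.
    static List<String> objects(List<List<String>> data, String subject, String predicate) {
        List<String> result = new ArrayList<>();
        for (List<String> t : data) {
            if (t.get(0).equals(subject) && t.get(1).equals(predicate)) {
                result.add(t.get(2));
            }
        }
        return result;
    }

    // Alternates between evaluating one triple pattern and looking up the URIs
    // bound in the intermediate solutions. Returns rows of (?c, ?u).
    static List<List<String>> execute(String movie) {
        Set<String> seen = new HashSet<>();
        List<List<String>> data = new ArrayList<>();
        List<List<String>> results = new ArrayList<>();
        lookUp(movie, seen, data);                                    // URI mentioned in the query
        for (String c : objects(data, movie, "mov:filming_location")) {
            lookUp(c, seen, data);                                    // URI bound to ?c
            for (String cStats : objects(data, c, "geo:statistics")) {
                lookUp(cStats, seen, data);                           // URI bound to ?cStats
                for (String u : objects(data, cStats, "stat:unempRate")) {
                    results.add(List.of(c, u));
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        for (List<String> row : execute(MOVIE)) {
            System.out.println("?c = " + row.get(0) + ", ?u = " + row.get(1));
        }
    }
}
```

The queried data starts empty and grows only through lookups of URIs that appear in the query or in intermediate solutions, so no data source has to be known in advance.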

Page 45: Querying Linked Data with SPARQL

Link Traversal Based Query Execution

● Advantages:
  ● No need to know all data sources in advance
  ● No need for specific programming logic
  ● Queried data is up to date
  ● Independent of the existence of SPARQL endpoints provided by the data sources
● Drawbacks:
  ● Not as fast as a centralized collection of copies
  ● Unsuitable for some queries
  ● Results might be incomplete

Page 46: Querying Linked Data with SPARQL

Implementations

● Semantic Web Client library (SWClLib) for Java
  http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/
● SWIC for Prolog
  http://moustaki.org/swic/

Page 47: Querying Linked Data with SPARQL

Implementations

● SQUIN http://squin.org
  ● Provides SWClLib functionality as a Web service
  ● Accessible like a SPARQL endpoint
  ● Public SQUIN service at: http://squin.informatik.hu-berlin.de/SQUIN/
  ● Install package: unzip and start
  ● Convenient access with SQUIN PHP tools:

$s = 'http:// …'; // address of the SQUIN service
$q = new SparqlQuerySock( $s, '… SELECT ...' );
$res = $q->getJsonResult(); // or getXmlResult()

Page 48: Querying Linked Data with SPARQL

Real-World Examples

Return phone numbers of authors of ontology engineering papers at ESWC'09.

SELECT DISTINCT ?author ?phone WHERE {
  ?pub swc:isPartOf <http://data.semanticweb.org/conference/eswc/2009/proceedings> .
  ?pub swc:hasTopic ?topic .
  ?topic rdfs:label ?topicLabel .
  FILTER regex( str(?topicLabel), "ontology engineering", "i" ) .
  ?pub swrc:author ?author .
  { ?author owl:sameAs ?authorAlt }
  UNION
  { ?authorAlt owl:sameAs ?author }
  ?authorAlt foaf:phone ?phone .
}

 # of query results    | 2
 # of retrieved graphs | 297
 # of accessed servers | 16
 avg. execution time   | 1min 30sec