[IEEE 2014 IEEE International Conference on Semantic Computing (ICSC) - Newport Beach, CA, USA (2014.6.16-2014.6.18)] 2014 IEEE International Conference on Semantic Computing - Computing

Computing Recursive SPARQL Queries

Maurizio Atzori

Dept. of Mathematics and Computer Science

University of Cagliari

Via Ospedale 72, 09124 Cagliari (Italy)

Email: [email protected]

Abstract—We present a simple approach to handle recursive SPARQL queries, that is, nested queries that may contain references to the query itself. This powerful feature is obtained by implementing a custom SPARQL function that takes a SPARQL query as a parameter and executes it over a specified endpoint. The behaviour is similar to the SPARQL 1.1 SERVICE clause, with a few fundamental differences: (1) the query passed as argument can be arbitrarily complex, (2) being a string, the query can be created at runtime in the calling (outer) query, and (3) it can reference itself, enabling recursion. These features make the SPARQL language Turing-equivalent without introducing special constructs or requiring another interpreter to be implemented on the endpoint server engine. The feature is implemented using the standard Extensible Value Testing mechanism described in the recommendations since SPARQL 1.0; therefore, our proposal is standard compliant and also compatible with older endpoints that do not support the SPARQL 1.1 specifications, where it can also serve as a replacement for the missing SERVICE clause.

I. INTRODUCTION

The Semantic Web is having an impact on a number of fields, showing the huge potential of the Web as an unbounded, decentralized and freely crowdsourced data store that everyone can access and contribute to. Each data publisher provides a part of the Semantic Web graph, and through endpoints those subgraphs can be easily queried by means of an effective pattern-based query language, the well-known SPARQL. When SPARQL was announced, Sir Tim Berners-Lee declared that “SPARQL will make a huge difference” in making the Web machine-readable. More recently, as also detailed in the Related Work, a number of researchers have worked on extending the relations between SPARQL and the Web of Data, allowing, for instance, dynamic exploration of linked data by dereferencing the URIs appearing in the query, so that SPARQL is no longer confined to local data.

Although well structured, elegant, and accessible to non-expert users, the SPARQL language lacks advanced features found in Turing-equivalent languages. Being a query language rather than a programming language, its expressivity is reduced by design, and programmers can use ad-hoc libraries to extract data with SPARQL and then further process the data in other, more expressive, programming languages such as Java or C++. Although this is sufficient in most cases, whenever the computation needs to be tightly interleaved with data exploration and extraction, the limits of its expressive power become apparent: the computation degenerates into a long sequence of alternating SPARQL calls from the external programming language, which is inefficient in terms of execution time and produces code that is difficult to read and maintain. To illustrate, consider the problem of finding paths between two nodes. This very simple and frequent problem on Semantic Web graphs could not be solved using SPARQL 1.0 alone; in fact, the newer SPARQL 1.1 introduced the so-called property paths, an extension with ad-hoc syntax that allows some degree of recursive exploration of paths.

In other words, some expressivity limitations of the original SPARQL have been recognized, and some features have been added to address these weaknesses. Unfortunately, some desirable and useful queries still cannot be posed, even on endpoints that implement property paths. For instance, we can compute whether there is a path between two nodes, but we cannot count how many such paths exist, nor compute the shortest path. Most of these interesting queries rely on a feature that SPARQL lacks: recursion.
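The following Python sketch illustrates this distinction on a small hypothetical graph. It is not SPARQL and not part of the proposal. reachable answers the yes/no question that a property path such as :p+ can express. count_paths and shortest_length, by contrast, must keep track of the paths being explored, which is the kind of computation this work expresses through recursive queries.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical toy graph: node -> successors (e.g. rdfs:subClassOf edges).
Graph = Dict[str, List[str]]


def reachable(g: Graph, src: str, dst: str) -> bool:
    """Is dst reachable from src in one or more steps? (cf. a `p+` property path)"""
    seen, stack = {src}, [src]
    while stack:
        node = stack.pop()
        for nxt in g.get(node, []):
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def count_paths(g: Graph, src: str, dst: str) -> int:
    """Number of distinct paths from src to dst (assumes the graph is acyclic)."""
    if src == dst:
        return 1
    return sum(count_paths(g, nxt, dst) for nxt in g.get(src, []))


def shortest_length(g: Graph, src: str, dst: str) -> Optional[int]:
    """Number of edges on a shortest path (breadth-first search), or None."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in g.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


if __name__ == "__main__":
    g = {"A": ["B", "C", "D"], "B": ["D"], "C": ["D"], "D": ["E"]}
    print(reachable(g, "A", "E"))        # True
    print(count_paths(g, "A", "E"))      # 3
    print(shortest_length(g, "A", "E"))  # 2
```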

In our work we simply extend the SPARQL language with a single function that enables the execution of a dynamically generated SPARQL query, thereby making the language Turing-equivalent without introducing unwanted syntax extensions that may steepen the learning curve for users.

II. INTERPRETING SPARQL QUERIES

We propose a simple solution to the problem of running recursive and potentially dynamically generated queries in SPARQL. We define the following function:

wfn:runSPARQL(?query, ?endpoint [, ...])

The function wfn:runSPARQL takes two string parameters, the query to be run and an endpoint, and executes the query at the specified SPARQL endpoint. It also accepts optional parameters that are passed to the query as ?i0, ?i1, ..., as explained later in the examples. The function returns the value of the variable ?result, which the query must bind. Besides binding the variables ?i0, ?i1, ... to the optional input arguments, wfn:runSPARQL also binds ?query and ?endpoint to the values passed to the function, in order to simplify recursion.
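The binding step can be pictured as string rewriting. The Python sketch below is an illustrative model, not the authors' implementation. It assumes the engine wraps the received query in a SELECT with a VALUES row, following the expanded factorial query shown in Use Case 1. The helpers expand_call, to_term and Raw are hypothetical. Unlike that example, the sketch writes the query string as a short literal with escaped quotes rather than a triple-quoted one, which keeps the escaping rule simple.

```python
from typing import Union


class Raw(str):
    """A term already in SPARQL syntax, e.g. an IRI '<http://...>' or a
    prefixed name ':Village' (hypothetical helper, not part of the proposal)."""


def to_term(value: Union[int, str, Raw]) -> str:
    """Render a Python value as a SPARQL term for a VALUES row."""
    if isinstance(value, Raw):
        return str(value)
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    # Plain strings become string literals; escape \, " and newlines (ECHAR).
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def expand_call(query: str, endpoint: str, *args: Union[int, str, Raw]) -> str:
    """Build the text sent to `endpoint` for wfn:runSPARQL(query, endpoint, *args):
    ?query and ?endpoint are rebound so the body can call itself again."""
    names = ["?query", "?endpoint"] + [f"?i{k}" for k in range(len(args))]
    values = [to_term(query), to_term(endpoint)] + [to_term(a) for a in args]
    return (
        "PREFIX wfn: <http://webofcode.org/wfn/>\n"
        "SELECT ?result {\n"
        f"  VALUES ({' '.join(names)}) {{ ({' '.join(values)}) }}\n"
        f"  {query}\n"
        "} LIMIT 1"
    )


if __name__ == "__main__":
    body = "BIND(IF(?i0 <= 0, 1, ?i0 * wfn:runSPARQL(?query, ?endpoint, ?i0 - 1)) AS ?result)"
    print(expand_call(body, "http://runsparql.webofcode.org/sparql", 5))
```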

We explain the semantics of the function by means of two examples that are difficult or impossible to compute without our extension: factorial and path length. The online endpoint and other examples are available at our dedicated website http://atzori.webofcode.org/projects/runSPARQL/.

A. Use Case 1: Computing the Factorial

2014 IEEE International Conference on Semantic Computing

978-1-4799-4003-5/14 $31.00 © 2014 IEEE

DOI 10.1109/ICSC.2014.54

To understand how to write a recursive SPARQL query, we show how to compute the factorial of any non-negative integer. In the following, we define the SPARQL query (the string assigned to the variable ?query) that recursively calls itself:

# the recursive query and the endpoint
VALUES (?query ?endpoint) {
  ( """BIND ( IF(?i0 <= 0, 1, ?i0 * wfn:runSPARQL(?query, ?endpoint, ?i0-1))
             AS ?result)"""
    "http://runsparql.webofcode.org/sparql" )
}

# actual call of the recursive query
BIND ( wfn:runSPARQL(?query, ?endpoint, 5) AS ?result )

The last line actually calls the wfn:runSPARQL function with one input parameter, 5 (the integer whose factorial we want to compute). This results in a query being run on the specified endpoint (http://runsparql.webofcode.org/sparql in the case at hand), where the actual query is the one specified in the first (?query) parameter. Notice that, as specified above, the final query will be expanded into something like the following:

PREFIX wfn: <http://webofcode.org/wfn/>
SELECT ?result {
  # bind variables to parameter values
  VALUES (?query ?endpoint ?i0) {
    ( """BIND ( IF(?i0 <= 0, 1, ?i0 * wfn:runSPARQL(?query, ?endpoint, ?i0-1))
               AS ?result)"""
      "http://runsparql.webofcode.org/sparql"
      5 )
  }
  # the recursive query
  BIND ( IF(?i0 <= 0, 1, ?i0 * wfn:runSPARQL(?query, ?endpoint, ?i0-1))
         AS ?result)
} LIMIT 1

The endpoint will run the above query six times (with ?i0 = 5, 4, 3, 2, 1, 0), each time decreasing the value of ?i0 by one. In the last execution, the IF guard evaluates to true, reaching the base case of the factorial function. Since SPARQL's IF evaluates only the selected branch, wfn:runSPARQL is not evaluated again, which completes the computation and returns 5! = 120.
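To check both the execution count and the result, the recursion can be simulated in Python. This sketch is a simplified model, not a SPARQL engine. The hypothetical FakeEndpoint stores each query as a Python function keyed by a name that stands in for the query text, and run_sparql plays the role of wfn:runSPARQL. With the ?i0 <= 0 guard used in the ?query string of Use Case 1, computing 5! runs the query six times, once for each ?i0 from 5 down to 0.

```python
from typing import Callable, Dict


class FakeEndpoint:
    """Hypothetical stand-in for a SPARQL endpoint. Each stored "query" is a
    Python function of its bindings (?query, ?endpoint, ?i0, ...) returning ?result."""

    def __init__(self) -> None:
        self.queries: Dict[str, Callable[..., int]] = {}
        self.executions = 0  # how many times a query was run on this endpoint


def run_sparql(query: str, endpoint: FakeEndpoint, *args: int) -> int:
    """Model of wfn:runSPARQL: bind ?query, ?endpoint, ?i0.. and evaluate the query."""
    endpoint.executions += 1
    return endpoint.queries[query](query, endpoint, *args)


def factorial_query(query: str, endpoint: FakeEndpoint, i0: int) -> int:
    # BIND(IF(?i0 <= 0, 1, ?i0 * wfn:runSPARQL(?query, ?endpoint, ?i0-1)) AS ?result)
    # As with SPARQL's IF, only the selected branch is evaluated.
    return 1 if i0 <= 0 else i0 * run_sparql(query, endpoint, i0 - 1)


if __name__ == "__main__":
    ep = FakeEndpoint()
    ep.queries["factorial"] = factorial_query  # key stands in for the query text
    print(run_sparql("factorial", ep, 5), ep.executions)  # 120 6
```

The key idea carried over from the query is that the recursive call receives its own text (query) and endpoint as arguments, so no name for the function needs to be defined anywhere else.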

B. Use Case 2: Computing the Paths

Beyond the didactic factorial example, recursive queries can be used to easily compute useful functions that are otherwise difficult or impossible to express in plain SPARQL, such as the length of a path between two entities in a graph. The following shows a recursive query that computes the length of a path from :Village to :PopulatedPlace by following rdfs:subClassOf edges:

?i0 rdfs:subClassOf ?next .
BIND ( IF(?next = :PopulatedPlace, 1,
          1 + wfn:runSPARQL(?query, ?endpoint, ?next))
       AS ?result)

It must be called by passing the starting node, e.g., wfn:runSPARQL(?query, ?endpoint, :Village), which returns the value 2 (as per the path Village → Settlement → PopulatedPlace in the DBpedia ontology).
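The following Python sketch mirrors the query above on a hypothetical, hard-coded excerpt of the class hierarchy. The dictionary SUBCLASS_OF stands in for the endpoint's data and is not queried from DBpedia. The sketch also makes two properties of the query explicit. First, only one solution per call is kept because of LIMIT 1, so with several superclasses the result depends on which one is returned first. Second, like the query, it does not guard against cycles in the hierarchy.

```python
from typing import Dict, List, Optional

# Hypothetical excerpt of rdfs:subClassOf edges (class -> superclasses).
SUBCLASS_OF: Dict[str, List[str]] = {
    "Village": ["Settlement"],
    "Settlement": ["PopulatedPlace"],
    "PopulatedPlace": ["Place"],
}


def path_length(node: str, target: str = "PopulatedPlace") -> Optional[int]:
    """Model of the recursive path query: `node` plays ?i0; returns ?result or None."""
    parents = SUBCLASS_OF.get(node, [])
    if not parents:  # ?i0 rdfs:subClassOf ?next has no solution: ?result unbound
        return None
    nxt = parents[0]  # LIMIT 1: a single solution is kept (order is arbitrary in SPARQL)
    if nxt == target:
        return 1
    inner = path_length(nxt, target)  # wfn:runSPARQL(?query, ?endpoint, ?next)
    return None if inner is None else 1 + inner  # 1 + unbound leaves ?result unbound


if __name__ == "__main__":
    print(path_length("Village"))  # 2
```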

III. RELATED WORK

The work in [1] defines an approach to make extension functions interoperable, allowing arbitrary JavaScript code, downloaded from third-party servers at query time, to be run on supporting endpoints. While improving SPARQL expressivity, it does not seem to help with interleaving powerful JavaScript code with data accessed using SPARQL only. Our work in [2] presents an approach based on Remote Procedure Calls, compatible with the SPARQL 1.1 standard, where third-party functions can be called remotely, enriching expressivity and interoperability but without addressing recursive SPARQL functions.

In [3], [4] the authors study the expressive power of SPARQL 1.0, showing its equivalence with non-recursive safe Datalog with negation and with Relational Algebra, as well as possible implementations of subqueries.

The work in [5], [6] addresses the use of regular expressions to handle paths within SPARQL, thereby enhancing the expressivity of the language and providing limited recursion focused on graph traversal.

Finally, SPIN [7] provides a vocabulary to represent SPARQL queries in the RDF model. Manipulating the query structure using SPIN notation allows the creation of recursive functions. While tightly related to SPARQL and very powerful, SPIN is not a standard and requires some effort from the end user, as it seems designed to let publishers expose data dynamically generated by complex, RDF-based SPARQL functions. In contrast, our work in this paper focuses on allowing complex user queries to be run against static RDF data. Another drawback is that implementing SPIN on existing engines is not trivial. Our approach, in contrast, requires only a few lines of code on the server engine to implement the wfn:runSPARQL custom function, and the queries are already standard compliant.

ACKNOWLEDGMENTS

This work was supported in part by the RAS Project CRP-17615 DENIS: Dataspaces Enhancing Next Internet in Sardinia and by MIUR PRIN 2010-11 project Security Horizons.

REFERENCES

[1] G. Williams, “Extensible SPARQL Functions with Embedded Javascript,” in SFSW, ser. CEUR Workshop Proceedings, S. Auer, C. Bizer, T. Heath, and G. A. Grimnes, Eds., vol. 248. CEUR-WS.org, 2007.

[2] M. Atzori, “Toward the Web of Functions: Interoperable High-Order Functions in SPARQL,” (Submitted) 2014.

[3] R. Angles and C. Gutierrez, “Subqueries in SPARQL,” in AMW, ser. CEUR Workshop Proceedings, P. Barcelo and V. Tannen, Eds., vol. 749. CEUR-WS.org, 2011.

[4] R. Angles and C. Gutierrez, “The Expressive Power of SPARQL,” in International Semantic Web Conference, ser. Lecture Notes in Computer Science, A. P. Sheth, S. Staab, M. Dean, M. Paolucci, D. Maynard, T. W. Finin, and K. Thirunarayan, Eds., vol. 5318. Springer, 2008, pp. 114–129.

[5] F. Alkhateeb, J.-F. Baget, and J. Euzenat, “Extending SPARQL with regular expression patterns (for querying RDF),” J. Web Sem., vol. 7, no. 2, pp. 57–73, 2009.

[6] F. Alkhateeb and J. Euzenat, “Answering SPARQL queries modulo RDF Schema with paths,” CoRR, vol. abs/1311.3879, 2013.

[7] H. Knublauch, J. A. Hendler, and K. Idehen, “SPIN: SPARQL Inference Notation,” W3C Member Submission, 2011.
