SPARQL Query Rewriting for Implementing Data Integration over Linked Data Gianluca Correndo, Manuel Salvadores, Ian Millard, Hugh Glaser, Nigel Shadbolt


Page 1: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

SPARQL Query Rewriting for Implementing Data Integration over Linked Data

Gianluca Correndo, Manuel Salvadores, Ian Millard, Hugh Glaser, Nigel Shadbolt

Page 2: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

Linked Data access

• Retrieving RDF content via HTTP requests

– Instance based vs. schema based access

• Accessing SPARQL endpoints

– Schema based vs. instance based access

Figure label: SPARQL + HTTP

Page 3: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

Linked Data – Schema based integration

Figure: a (SPARQL) query over a source ontology and data set, and a target ontology and data set, with co-reference links between source and target.

OA = <SO, TO, TD, EA>

SO: Source Ontologies

TO: Target Ontologies

TD: Target Dataset

EA: Entity Alignments

• Datasets can use more than one ontology for describing the data

• More than one dataset can use the same set of ontologies coherently (e.g. RKB)

• More than one ontology is used for defining a SPARQL query

• Ontologies contain many entities to be aligned

Page 4: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

Query Rewriting Architecture

Figure: a <source> SPARQL query enters the SPARQL query rewriter, which draws on voiD descriptions and alignments and emits <target> SPARQL queries (e.g. a <KISTI> SPARQL query and a <dbpedia> SPARQL query).

Page 5: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

Ontology Alignment

• DL primitives are used to describe concept alignments (i.e. Equivalent, Subsume)

– The implementation of the underlying ontological mediation is usually not provided, or relies on reasoners

• Ontological mediation is usually applied to data, not queries

– Rule systems that exploit alignments to translate data

– [Euzenat] SPARQL for integrating data:

    CONSTRUCT { ?x rdf:type vc:VCard } WHERE { ?x rdf:type foaf:Person }

How to write such queries?

Page 6: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

Anatomy of a SPARQL query

• Query type: SELECT, DESCRIBE, CONSTRUCT, ASK

• Basic Graph Pattern (or BGP): graph pattern that resulting triples must satisfy

• Filter section: additional constraints over variables present in the BGP

    PREFIX id: <http://southampton.rkbexplorer.com/id/>
    PREFIX akt: <http://www.aktors.org/ontology/portal#>
    SELECT DISTINCT ?a WHERE {
      ?paper akt:has-author id:person-02686 .
      ?paper akt:has-author ?a .
    }

Page 7: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

SPARQL BGP

    PREFIX id: <http://southampton.rkbexplorer.com/id/>
    PREFIX akt: <http://www.aktors.org/ontology/portal#>
    SELECT DISTINCT ?a WHERE {
      ?paper akt:has-author id:person-02686 , ?a .
    }

• "DISTINCT ?a" is not represented in this graph

• Constraints over nodes can be represented either in the graph or within the FILTER section

Figure: the BGP as a graph with two edges, ?paper --akt:has-author--> id:person-02686 and ?paper --akt:has-author--> ?a.

Page 8: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

Entity Alignment as Graph Rewriting

• Query rewriting based on BGP graph rewriting

• Entity Alignment EA = <LHS, RHS, FD>

– LHS : Triple to match (open variables to bind)

– RHS : Set of triples to instantiate (depending on previous bindings on open variables)

– FD : Functional dependencies (between variables)
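The EA = <LHS, RHS, FD> triple can be pictured as a small data structure. The following is a minimal Python sketch and not the authors' Java/Jena implementation: the class name `EntityAlignment`, its method names, and the encoding of terms as plain strings (with `_:` marking open variables) are all assumptions made for illustration. The example instance reuses the akt:has-author alignment that appears elsewhere in the slides.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# A triple is (subject, predicate, object); terms are plain strings.
# Assumption: terms starting with "_:" are the open variables of an alignment.
Triple = Tuple[str, str, str]


@dataclass
class EntityAlignment:
    lhs: Triple                     # LHS: the triple to match
    rhs: List[Triple]               # RHS: the triples to instantiate
    # FD: variable -> (function, names of the variables it depends on)
    fd: Dict[str, Tuple[Callable[..., str], List[str]]] = field(default_factory=dict)

    def open_variables(self) -> Set[str]:
        """Variables that get bound when the LHS is matched."""
        return {t for t in self.lhs if t.startswith("_:")}

    def free_rhs_variables(self) -> Set[str]:
        """RHS variables bound neither by the LHS nor by a functional dependency."""
        rhs_vars = {t for triple in self.rhs for t in triple if t.startswith("_:")}
        return rhs_vars - self.open_variables() - set(self.fd)


# The akt:has-author alignment from the slides, with no functional dependencies.
has_author = EntityAlignment(
    lhs=("_:1", "akt:has-author", "_:2"),
    rhs=[("_:1", "kisti:CreatorInfo", "_:3"), ("_:3", "kisti:hasCreator", "_:2")],
)
```

Here `has_author.free_rhs_variables()` contains only `_:3`, the node that a rewriter would have to replace with a fresh query variable.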

Page 9: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

Entity Alignment as Graph Rewriting

• Using the graph rewriting formalism we can rewrite queries defined for one data set (or ontology) to integrate results from other data sets

– The same formalism can also generate CONSTRUCT queries to integrate entire data sets

Page 10: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

SPARQL Rewriting

• Each triple from the BGP is matched against the LHSs (generating variable bindings in the process)

• Any functional dependencies are resolved (enriching the bindings with new associations)

• The respective RHS is instantiated with the given bindings and replaces the original triple

• Unbound variables generate new variables
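The four steps above can be sketched in a few lines of Python. This is an illustrative reading of the procedure, not the authors' Java implementation: the function names `match` and `rewrite`, the tuple encoding of alignments as (LHS, RHS, FD), the `?newN` naming of fresh variables, and the "first matching alignment wins" policy are all assumptions. The sketch does not handle a query variable sitting where the LHS has a constant.

```python
from itertools import count


def is_var(term):
    # Assumption: alignment variables are written as blank nodes, "_:n".
    return term.startswith("_:")


def match(lhs, triple):
    """Step 1: match a BGP triple against an LHS; return bindings or None."""
    bindings = {}
    for pattern, term in zip(lhs, triple):
        if is_var(pattern):
            if bindings.setdefault(pattern, term) != term:
                return None          # same variable bound to two different terms
        elif pattern != term:
            return None              # constants differ
    return bindings


def rewrite(bgp, alignments, fresh=None):
    """Rewrite every triple of `bgp` using the first alignment whose LHS matches."""
    if fresh is None:
        fresh = count(1)
    out = []
    for triple in bgp:
        for lhs, rhs, fd in alignments:
            bindings = match(lhs, triple)
            if bindings is None:
                continue
            # Step 2: resolve functional dependencies.
            for var, (fn, args) in fd.items():
                bindings[var] = fn(*(bindings[a] for a in args))
            # Step 4: variables still unbound become new query variables.
            for rhs_triple in rhs:
                for term in rhs_triple:
                    if is_var(term) and term not in bindings:
                        bindings[term] = f"?new{next(fresh)}"
            # Step 3: instantiate the RHS in place of the original triple.
            out.extend(tuple(bindings.get(t, t) for t in rhs_triple) for rhs_triple in rhs)
            break
        else:
            out.append(triple)       # no alignment applies: keep the triple
    return out


alignments = [(
    ("_:1", "akt:has-author", "_:2"),
    [("_:1", "kisti:CreatorInfo", "_:3"), ("_:3", "kisti:hasCreator", "_:2")],
    {},
)]
bgp = [("?paper", "akt:has-author", "id:person-02686"),
       ("?paper", "akt:has-author", "?a")]
rewritten = rewrite(bgp, alignments)
```

With the akt:has-author alignment, each of the two triples becomes a kisti:CreatorInfo / kisti:hasCreator pair, and each pair goes through its own fresh variable (`?new1`, `?new2`).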

Page 11: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

SPARQL Rewriting

• Example:

– LHS1 = <_:1, rdf:type, source:A>

– RHS1 = {<_:1, rdf:type, target:B>}

– FD1 = {}

• <?p, rdf:type, source:A> = LHS1[_:1/?p]

• RHS1[_:1/?p] = <?p, rdf:type, target:B>

• _:1 is the RDF syntax for a blank node; within a graph, blank nodes are treated as existentially quantified variables.

Triple(v1, rdf:type, source:A)

Triple(v1, rdf:type, target:B)

Page 12: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

Ontology Alignments – Class Eq.

LHS: <_:1, rdf:type, source:User>

RHS: <_:1, rdf:type, target:Agent>

Source query:

    SELECT * WHERE { ?s a source:User . … }

Rewritten query:

    SELECT * WHERE { ?s a target:Agent . … }

Page 13: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

Ontology Alignments – Class Partition

LHS: <_:1, rdf:type, source:WhiteWine>

RHS: <_:1, rdf:type, target:Vin>, <_:1, target:has-color, "blanc"@fr>

Source query:

    SELECT * WHERE { ?s a source:WhiteWine . … }

Rewritten query:

    SELECT * WHERE { ?s a target:Vin ; target:has-color "blanc"@fr … }

Page 14: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

Ontology Alignments – Property Eq.

LHS: <_:1, source:has-name, _:2>

RHS: <_:1, target:fullName, _:2>

Source query:

    SELECT * WHERE { ?s source:has-name ?n . … }

Rewritten query:

    SELECT * WHERE { ?s target:fullName ?n . … }

Page 15: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

Ontology Alignments – Property Eq.

LHS: <_:1, akt:has-author, _:2>

RHS: <_:1, kisti:CreatorInfo, _:3>, <_:3, kisti:hasCreator, _:2>

Source query:

    SELECT * WHERE { ?p akt:has-author ?a . … }

Rewritten query:

    SELECT * WHERE { ?p kisti:CreatorInfo ?i . ?i kisti:hasCreator ?a … }

Here _:3 is not bound by the LHS, so it becomes a new variable (?i) in the rewritten query.

Page 16: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

Ontology Alignments – Property Eq.

LHS: <_:1, source:temp, _:2>

RHS (naive): <_:1, target:farenheit, _:2>

Source query:

    SELECT * WHERE { ?p source:temp "10"^^C . … }

Rewritten query:

    SELECT * WHERE { ?p target:farenheit "50"^^F … }

Binding Celsius values directly to Fahrenheit is wrong: the two values are linked by a functional dependency. In the figure, the object of target:farenheit is a separate node _:3, obtained from _:2 through the function celsius2farenheit.
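The temperature example can be made concrete with a functional dependency that computes _:3 from _:2. The sketch below is an assumption-laden illustration: the literal syntax `"10"^^C` follows the slide rather than any real datatype IRI, `rewrite_triple` is a hypothetical helper, and the sketch only covers the case where the object is a constant (a variable object would need the conversion expressed inside the query itself).

```python
def celsius2farenheit(literal):
    """Convert a slide-style literal such as '"10"^^C' into '"50"^^F'."""
    value = float(literal.split('"')[1])     # text between the double quotes
    converted = value * 9 / 5 + 32           # standard Celsius-to-Fahrenheit formula
    return f'"{converted:g}"^^F'


# Alignment with a functional dependency: _:3 = celsius2farenheit(_:2)
lhs = ("_:1", "source:temp", "_:2")
rhs = [("_:1", "target:farenheit", "_:3")]
fd = {"_:3": (celsius2farenheit, ["_:2"])}


def rewrite_triple(triple):
    """Apply the alignment above to one triple whose object is a constant."""
    s, p, o = triple
    if p != lhs[1]:
        return [triple]                      # alignment does not apply
    bindings = {"_:1": s, "_:2": o}
    for var, (fn, args) in fd.items():
        bindings[var] = fn(*(bindings[a] for a in args))
    return [tuple(bindings.get(t, t) for t in rhs_triple) for rhs_triple in rhs]


result = rewrite_triple(("?p", "source:temp", '"10"^^C'))
```

The result holds the single triple `?p target:farenheit "50"^^F`, matching the rewritten query on the slide.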

Page 17: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

SPARQL Rewriting

    PREFIX id: <http://southampton.rkbexplorer.com/id/>
    PREFIX akt: <http://www.aktors.org/ontology/portal#>
    SELECT DISTINCT ?a WHERE {
      ?paper akt:has-author id:person-02686 .
      ?paper akt:has-author ?a .
    }

Figure: the query BGP (?paper --akt:has-author--> id:person-02686 and ?paper --akt:has-author--> ?a) shown next to the alignment that rewrites _:1 --akt:has-author--> _:2 into _:1 --kisti:CreatorInfo--> _:3 --kisti:hasCreator--> _:2.

Page 18: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

SPARQL Rewriting

Figure: step-by-step rewriting of the BGP. Each akt:has-author edge is replaced in turn by a kisti:CreatorInfo edge to a new variable (?new1, ?new2) followed by a kisti:hasCreator edge to the original object (id:person-02686 and ?a respectively).

Problem: in the KISTI dataset, <http://southampton.rkbexplorer.com/id/person-02686> is unknown.

Page 19: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

Co-reference integration

• Constants in the query (like URIs) must be translated in order to retrieve correct results

• URI equivalences are maintained by co-reference services like http://sameas.org, accessible via a REST interface

• Modeled as a functional dependency between variables

– The function returns the equivalent URI that satisfies a regex pattern

– Datasets maintain URIs that are recognizable by a common scheme (at least a prefix, e.g. http://dbpedia.org/resource/*)
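The co-reference function described above can be sketched as a lookup followed by a regex filter. To stay self-contained, this Python sketch replaces the REST call to a co-reference service with a hard-coded dictionary: `EQUIVALENTS`, the `sameas` function name and signature, and the full KISTI URI (the `kisti:` prefix is assumed to expand to http://kisti.rkbexplorer.com/id/) are illustrative assumptions, not the service's API.

```python
import re

# Hypothetical equivalence bundle standing in for the answer of a
# co-reference service; a real rewriter would fetch this over HTTP.
EQUIVALENTS = {
    "http://southampton.rkbexplorer.com/id/person-02686": [
        "http://southampton.rkbexplorer.com/id/person-02686",
        "http://kisti.rkbexplorer.com/id/PER_000000000105047",  # assumed expansion
    ],
}

# Pattern identifying URIs minted by the target dataset (as given on the slides).
KISTI_PATTERN = r"http://kisti.rkbexplorer.com/id/\S*"


def sameas(uri, pattern, lookup=EQUIVALENTS):
    """Return the first URI equivalent to `uri` that matches `pattern`, else None."""
    regex = re.compile(pattern)
    for candidate in lookup.get(uri, []):
        if regex.fullmatch(candidate):
            return candidate
    return None


translated = sameas("http://southampton.rkbexplorer.com/id/person-02686", KISTI_PATTERN)
```

Used as a functional dependency, `sameas` would replace the Southampton URI in the query with its KISTI equivalent before the RHS is instantiated; when no equivalent matches the pattern it returns `None`, and how the rewriter should react in that case is left open here.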

Page 20: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

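Co-reference integration

Figure: the akt:has-author alignment extended with co-reference. The LHS edge _:11 --akt:has-author--> _:12 is rewritten into _:21 --kisti:CreatorInfo--> _:3 --kisti:hasCreator--> _:22, where _:21 and _:22 are derived from _:11 and _:12 through sameas functional dependencies. Example: id:person-02686 sameas kisti:PER_000000000105047, selected by the pattern http://kisti.rkbexplorer.com/id/\S*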

Page 21: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

Implementation

• Java package based on the Jena API for SPARQL query rewriting

• Code not released yet (planning to integrate it with the INRIA ontology Alignment API)

Page 22: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

Progress report

• Contact with François Scharffe and Jérôme Euzenat

• Partial mapping to the EDOAL ontology alignment specification (work in progress)

• SPARQL query rewriter to be implemented in the Alignment API (partially done)

Page 23: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

EDOAL - Expressive and Declarative Ontology Alignment Language

• Construction of entities from other entities can be expressed through algebraic operators

• Restrictions can be expressed on entities in order to narrow their scope

• Transformations of property values can be specified. Property values using different encodings or units can be aligned using transformations.

Page 24: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

EDOAL - Example

    <http://oms.omwg.org/wine-vin/MappingRule_3>
        :entity1 wine:Bordeaux ;
        :entity2 [
            edoal:and ( vin:Vin
                        [ a edoal:AttributeValueRestriction ;
                          edoal:comparator xsd:equals ;
                          edoal:onAttribute [ edoal:compose ( vin:hasTerroir proton:locatedIn ) ;
                                              a edoal:Relation ] ;
                          edoal:value vin:Aquitaine ] ) ;
            a edoal:Class ] ;
        :measure "1."^^xsd:float ;
        :relation "SubsumedBy" ;
        a :Cell .

Page 25: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

Internal Representation

Figure: the mapping rule as a graph rewriting rule. LHS: _:6 --rdf:type--> wine:Bordeaux. RHS: _:6 --rdf:type--> vin:Vin, _:6 --vin:hasTerroir--> _:9, _:9 --proton:locatedIn--> vin:Aquitaine.

Page 26: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

Progress report

• Graph pattern rewriting can also be used to create CONSTRUCT queries that translate RDF graphs between different ontologies.

    CONSTRUCT {
      ?9 <http://proton.semanticweb.org/locatedIn> <http://ontology.deri.org/vin#Aquitaine> .
      ?6 <http://ontology.deri.org/vin#hasTerroir> ?9 .
      ?6 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ontology.deri.org/vin#Vin> .
    }
    WHERE {
      ?6 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/TR/2003/CR-owl-guide-20030818/wine#Bordeaux> .
    }
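A CONSTRUCT query of this shape can be derived mechanically from an alignment: the LHS becomes the WHERE pattern and the RHS becomes the template. The Python sketch below illustrates that idea under stated assumptions: `term` and `to_construct` are hypothetical helper names, prefixed names are used instead of full IRIs (so the PREFIX declarations would still have to be prepended), and functional dependencies are ignored.

```python
def term(t):
    """Render an alignment term; blank-node variables "_:n" become "?n"."""
    return "?" + t[2:] if t.startswith("_:") else t


def to_construct(lhs, rhs):
    """Build a CONSTRUCT query string: RHS as template, LHS as WHERE pattern."""
    template = " ".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in rhs)
    where = "{} {} {} .".format(*map(term, lhs))
    return f"CONSTRUCT {{ {template} }} WHERE {{ {where} }}"


# The wine:Bordeaux mapping rule in its graph rewriting form.
lhs = ("_:6", "rdf:type", "wine:Bordeaux")
rhs = [
    ("_:9", "proton:locatedIn", "vin:Aquitaine"),
    ("_:6", "vin:hasTerroir", "_:9"),
    ("_:6", "rdf:type", "vin:Vin"),
]
query = to_construct(lhs, rhs)
```

The sketch carries ?9 over as a variable, as on the slide. Since the WHERE clause never binds ?9, a standard SPARQL engine would leave out the template triples that mention it, so an implementation may prefer to emit a blank node in the template for such unbound RHS nodes.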

Page 27: SPARQL Query Rewriting for Implementing Data Integration over Linked Data
Page 28: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

Outline

• Linked Data

– Data topology

– Data access

• Query Rewriting

– Ontology Alignment

– Entity Alignment

– SPARQL rewriting

Page 29: SPARQL Query Rewriting for Implementing Data Integration over Linked Data

Linked Data topology

• Foreign URIs for referring to external entities

• Co-references for referring to instance "equivalence"