Shape Expressions: An RDF validation and transformation language

Eric Prud'hommeaux, World Wide Web Consortium, MIT, Cambridge, MA, USA, [email protected]
Harold Solbrig, Mayo Clinic College of Medicine, Rochester, MN, USA
Jose Emilio Labra Gayo, WESO Research group, University of Oviedo, Spain, [email protected]


Presentation at Semantics-2014, Leipzig, Sept. 2014 Author: Jose Emilio Labra Gayo


Page 1: Shape Expressions: An RDF validation and transformation language


Page 2: Shape Expressions: An RDF validation and transformation language

This talk in 1 slide

Motivating example: represent issues and users in RDF... and validate that data

Shape Expressions = simple language to:
- Describe the topology of RDF data
- Validate whether an RDF graph matches a given shape

Shape Expressions can be extended with actions
- Possible application: transform RDF into XML

Page 3: Shape Expressions: An RDF validation and transformation language

Motivating example

Represent an issue-tracking system in RDF
- Issues are reported by users on some date
- Issues have some status (assigned/unassigned)
- Issues can also be reproduced on some date by users

[Diagram: User and Issue entities]

Page 4: Shape Expressions: An RDF validation and transformation language

E-R Diagram

User
  foaf:name: xsd:string
  foaf:givenName: xsd:string*
  foaf:familyName: xsd:string
  foaf:mbox: IRI

Issue
  :status: (:Assigned :Unassigned)
  :reportedOn: xsd:date
  :reproducedOn: xsd:date

Relationships
  :reportedBy (Issue to User): 0..* issues, 1 user
  :reproducedBy (Issue to User): 0..* issues, 0..1 user
  :related (Issue to Issue): 0..*, 0..1

...and several constraints

A user:
- has a full name, or several given names and one family name
- can have one mbox

An issue:
- has status Assigned/Unassigned
- is reported by a user
- is reported on a date
- can be reproduced by a user on a date
- is related to other issues

Page 5: Shape Expressions: An RDF validation and transformation language

Example data in RDF

:Issue1 :status :Unassigned ;
  :reportedBy :Bob ;
  :reportedOn "2013-01-23"^^xsd:date ;
  :reproducedBy :Thompson.J ;
  :reproducedOn "2013-01-23"^^xsd:date .

:Bob foaf:name "Bob Smith" ;
  foaf:mbox <mail:[email protected]> .

:Thompson.J foaf:givenName "Joe", "Joseph" ;
  foaf:familyName "Thompson" ;
  foaf:mbox <mail:[email protected]> .

:Issue2 :status :Checked ;
  :reportedBy :Issue1 ;
  :reportedOn 2014 ;
  :reproducedBy :Tom .

:Tom foaf:name "Tom Smith", "Tam" .

:Anna foaf:givenName "Anna" ;
  foaf:mbox 23 .

Page 6: Shape Expressions: An RDF validation and transformation language

Problem statement

We want to detect possible errors in RDF, such as:
- Issues without status
- Issues with a status other than Assigned/Unassigned
- Issues reported by something other than a user
- Issues reported on a date with a non-date type
- Issues reproduced on a date before the reported date
- Users without mbox
- Users with 2 names
- Users with a name of type integer
- ...lots of other errors...

Q: How can we describe RDF data to be able to detect those errors?
A: Our proposal = Shape Expressions

Page 7: Shape Expressions: An RDF validation and transformation language

Shape Expressions - Users

A user can have either:
- one foaf:name, or
- one or more foaf:givenName and one foaf:familyName
All of them must be of type xsd:string

A user can have one foaf:mbox with value any IRI

<UserShape> {
  ( foaf:name xsd:string
  | foaf:givenName xsd:string+ ,
    foaf:familyName xsd:string
  ),
  foaf:mbox IRI ?
}

The example uses compact syntax
Shape Expressions can also be represented in RDF

Page 8: Shape Expressions: An RDF validation and transformation language

Shape Expressions - Issues

<IssueShape> {
  :status (:Assigned :Unassigned),
  :reportedBy @<UserShape>,
  :reportedOn xsd:date,
  ( :reproducedBy @<UserShape> ,
    :reproducedOn xsd:date
  )?,
  :related @<IssueShape>*
}

- An issue's :status must be either :Assigned or :Unassigned
- Issues are :reportedBy a user
- Issues are :reportedOn an xsd:date
- An issue may be :reproducedBy a user and :reproducedOn an xsd:date
- An issue can be :related to several issues

Page 9: Shape Expressions: An RDF validation and transformation language

Full example

prefix : <http://example.org/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix foaf: <http://xmlns.com/foaf/0.1/>

<UserShape> {
  ( foaf:name xsd:string
  | foaf:givenName xsd:string+ ,
    foaf:familyName xsd:string
  ),
  foaf:mbox IRI ?
}

<IssueShape> {
  :status (:Assigned :Unassigned),
  :reportedBy @<UserShape>,
  :reportedOn xsd:date,
  ( :reproducedBy @<UserShape> ,
    :reproducedOn xsd:date
  )?,
  :related @<IssueShape>*
}

Online Shape Expressions validators:
- http://www.w3.org/2013/ShEx
- http://rdfshape.weso.es

Page 10: Shape Expressions: An RDF validation and transformation language

FAQ: Why not use SPARQL?

<UserShape> { ( foaf:name xsd:string | foaf:givenName xsd:string+ , foaf:familyName xsd:string ), foaf:mbox IRI ?}

<IssueShape> { :status (:Assigned :Unassigned), :reportedBy @<UserShape>, :reportedOn xsd:date, ( :reproducedBy @<UserShape> , :reproducedOn xsd:date )?, :related @<IssueShape>*}


CONSTRUCT { ?IssueShape :hasShape <IssueShape> . ?UserShape :hasShape <UserShape> .} { { SELECT ?IssueShape { ?IssueShape :status ?o . } GROUP BY ?IssueShape HAVING (COUNT(*)=1)} { SELECT ?IssueShape { ?IssueShape :status ?o . FILTER ((?o = :Assigned || ?o = :Unassigned)) } GROUP BY ?IssueShape HAVING (COUNT(*)=1)} { SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c0) { ?IssueShape :reportedBy ?o . } GROUP BY ?IssueShape HAVING (COUNT(*)=1)} { SELECT ?IssueShape { ?IssueShape :reportedBy ?o .

FILTER ((isIRI(?o) || isBlank(?o))) } GROUP BY ?IssueShape HAVING (COUNT(*)=1)} { SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c1) { { SELECT ?IssueShape ?UserShape { ?IssueShape :reportedBy ?UserShape . FILTER (isIRI(?UserShape) || isBlank(?UserShape)) } } { SELECT ?UserShape WHERE { { { SELECT ?UserShape { ?UserShape foaf:name ?o . } GROUP BY ?UserShape HAVING (COUNT(*)=1)} { SELECT ?UserShape { ?UserShape foaf:name ?o . FILTER ((isLiteral(?o) && datatype(?o) = xsd:string))} GROUP BY ?UserShape HAVING (COUNT(*)=1)


} UNION { { SELECT ?UserShape (COUNT(*) AS ?UserShape_c0) { ?UserShape foaf:givenName ?o . } GROUP BY ?UserShape HAVING (COUNT(*)>=1)} { SELECT ?UserShape (COUNT(*) AS ?UserShape_c1) { ?UserShape foaf:givenName ?o . FILTER ((isLiteral(?o) && datatype(?o) = xsd:string))} GROUP BY ?UserShape HAVING (COUNT(*)>=1)} FILTER (?UserShape_c0 = ?UserShape_c1) { SELECT ?UserShape { ?UserShape foaf:familyName ?o . } GROUP BY ?UserShape HAVING (COUNT(*)=1)} { SELECT ?UserShape { ?UserShape foaf:familyName ?o . FILTER ((isLiteral(?o) && datatype(?o) = xsd:string))} GROUP BY ?UserShape HAVING (COUNT(*)=1)}} } GROUP BY ?UserShape HAVING (COUNT(*) = 1)} { SELECT ?UserShape (COUNT(*) AS ?UserShape_c2) { ?UserShape foaf:mbox ?o . } GROUP BY ?UserShape HAVING (COUNT(*)<=1)} { SELECT ?UserShape (COUNT(*) AS ?UserShape_c3) { ?UserShape foaf:mbox ?o .

FILTER (isIRI(?o)) } GROUP BY ?UserShape HAVING (COUNT(*)<=1)} FILTER (?UserShape_c2 = ?UserShape_c3)


FILTER (?UserShape_c2 = ?UserShape_c3) } GROUP BY ?IssueShape } FILTER (?IssueShape_c0 = ?IssueShape_c1) OPTIONAL { ?IssueShape :reportedBy ?IssueShape_UserShape_ref0 . FILTER (isIRI(?IssueShape_UserShape_ref0) || isBlank(?IssueShape_UserShape_ref0)) } { SELECT ?IssueShape { ?IssueShape :reportedOn ?o . } GROUP BY ?IssueShape HAVING (COUNT(*)=1)} { SELECT ?IssueShape { ?IssueShape :reportedOn ?o . FILTER ((isLiteral(?o) && datatype(?o) = xsd:date))} GROUP BY ?IssueShape HAVING (COUNT(*)=1)} { { SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c2) { ?IssueShape :reproducedBy ?o . } GROUP BY ?IssueShape} { SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c3) { ?IssueShape :reproducedBy ?o . FILTER ((isIRI(?o) || isBlank(?o))) } GROUP BY ?IssueShape} FILTER (?IssueShape_c2 = ?IssueShape_c3) { SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c5) { ?IssueShape :reproducedOn ?o . } GROUP BY ?IssueShape} { SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c6) { ?IssueShape :reproducedOn ?o . FILTER ((isLiteral(?o) && datatype(?o) = xsd:date))} GROUP BY ?IssueShape} FILTER (?IssueShape_c5 = ?IssueShape_c6)


FILTER (?IssueShape_c2=0 && ?IssueShape_c5=0 || ?IssueShape_c2>=1&&?IssueShape_c2<=1 && ?IssueShape_c5>=1&&?IssueShape_c5<=1) } { SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c7) { ?IssueShape :related ?o . } GROUP BY ?IssueShape} { SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c8) { ?IssueShape :related ?o . } GROUP BY ?IssueShape}FILTER (?IssueShape_c7 = ?IssueShape_c8) { SELECT ?UserShape WHERE { { { SELECT ?UserShape { ?UserShape foaf:name ?o . } GROUP BY ?UserShape HAVING (COUNT(*)=1)} { SELECT ?UserShape { ?UserShape foaf:name ?o . FILTER ((isLiteral(?o) && datatype(?o) = xsd:string)) } GROUP BY ?UserShape HAVING (COUNT(*)=1)} } UNION { { SELECT ?UserShape (COUNT(*) AS ?UserShape_c0) { ?UserShape foaf:givenName ?o . } GROUP BY ?UserShape HAVING (COUNT(*)>=1)} { SELECT ?UserShape (COUNT(*) AS ?UserShape_c1) { ?UserShape foaf:givenName ?o . FILTER ((isLiteral(?o) && datatype(?o) = xsd:string))} GROUP BY ?UserShape HAVING (COUNT(*)>=1)} FILTER (?UserShape_c0 = ?UserShape_c1) { SELECT ?UserShape { ?UserShape foaf:familyName ?o .


} GROUP BY ?UserShape HAVING (COUNT(*)=1)} { SELECT ?UserShape { ?UserShape foaf:familyName ?o . FILTER ((isLiteral(?o) && datatype(?o) = xsd:string)) } GROUP BY ?UserShape HAVING (COUNT(*)=1)}} } GROUP BY ?UserShape HAVING (COUNT(*) = 1)} { SELECT ?UserShape (COUNT(*) AS ?UserShape_c2) { ?UserShape foaf:mbox ?o . } GROUP BY ?UserShape HAVING (COUNT(*)<=1)} { SELECT ?UserShape (COUNT(*) AS ?UserShape_c3) { ?UserShape foaf:mbox ?o . FILTER (isIRI(?o)) } GROUP BY ?UserShape HAVING (COUNT(*)<=1)} FILTER (?UserShape_c2 = ?UserShape_c3)}


...

Shape Expression

Shape Expressions can be converted to SPARQL
But Shape Expressions are simpler and more readable for this problem

Page 11: Shape Expressions: An RDF validation and transformation language

Shape Expressions Language

Schema = set of Shape Expressions
Shape Expression = labeled pattern

<label> { ...pattern... }

Typical pattern = conjunction of several expressions
Conjunction represented by ,

<IssueShape> {
  :status (:Assigned :Unassigned),
  :reportedBy @<UserShape>,
  :reportedOn xsd:date
  ...
}

Here <IssueShape> is the label and each , marks a conjunction.

Page 12: Shape Expressions: An RDF validation and transformation language

Arcs

Basic expression: an Arc
Arc = name definition followed by value definition

<IssueShape> {
  :status (:Assigned :Unassigned),
  :reportedBy @<UserShape>,
  :reportedOn xsd:date
  ...
}

[Diagram: node :issue1 with arcs :reportedBy to :bob, :status to :Unassigned, and :reportedOn to 23-01-2013. On each arc, the predicate is the name definition and the object is the value definition.]

Page 13: Shape Expressions: An RDF validation and transformation language

Value definition

Value definitions can be:

Value type   xsd:date                   Matches a value of type xsd:date
Value set    ( :Assigned :Unassigned )  The object is an element of the given set
Reference    @<UserShape>               The object has shape <UserShape>
Stem         foaf:~                     Starts with the IRI associated with foaf
Any          - :Checked                 Any value except :Checked

<IssueShape> {
  :status (:Assigned :Unassigned),
  :reportedBy @<UserShape>,
  :reportedOn xsd:date
  ...
}

In this example, :status uses a value set, :reportedBy a value reference, and :reportedOn a value type.

Page 14: Shape Expressions: An RDF validation and transformation language

Name definition

Name definitions can be:

Name term   foaf:name     Matches the given IRI
Name stem   foaf:~        Any predicate that starts with the IRI associated with foaf
Name any    - foaf:name   Any predicate except foaf:name

<IssueShape> {
  :status (:Assigned :Unassigned),
  :reportedBy @<UserShape>,
  :reportedOn xsd:date
  ...
}

In this example, :status, :reportedBy and :reportedOn are name terms.
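
Name definitions can be pictured as predicates over the arc's predicate IRI. The following is a small sketch with hypothetical helper names (`name_term`, `name_stem`, `name_any`, `select_arcs`) and made-up example arcs; it only illustrates the three cases listed above.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

def name_term(iri):
    """Name term: the predicate is exactly this IRI."""
    return lambda predicate: predicate == iri

def name_stem(prefix):
    """Name stem: the predicate starts with this prefix IRI."""
    return lambda predicate: predicate.startswith(prefix)

def name_any(*excluded):
    """Name any: every predicate except the excluded ones."""
    return lambda predicate: predicate not in excluded

def select_arcs(arcs, name_matcher):
    """Keep the (predicate, object) pairs whose predicate matches."""
    return [(p, o) for (p, o) in arcs if name_matcher(p)]

# Made-up arcs of a single node, for illustration only.
arcs = [
    (FOAF + "name", "Bob Smith"),
    (FOAF + "mbox", "mailto:bob@example.org"),
    ("http://example.org/age", 42),
]

print(len(select_arcs(arcs, name_term(FOAF + "name"))))
print(len(select_arcs(arcs, name_stem(FOAF))))
print(len(select_arcs(arcs, name_any(FOAF + "name"))))
```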

Page 15: Shape Expressions: An RDF validation and transformation language

Alternatives

Alternatives (disjunctions) are marked by |

Example 1: An agent has either foaf:name or rdfs:label

<Agent> {
  ( foaf:name xsd:string
  | rdfs:label xsd:string
  )
  ...
}

Example 2: A list of integers

<listOfInt> {
  rdf:first xsd:integer ,
  ( rdf:rest ( rdf:nil )
  | rdf:rest @<listOfInt>
  )
}
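
The recursive <listOfInt> shape can be traced with a short Python sketch. The dict-based graph, the use of Python ints for xsd:integer values, the blank-node names, and the decision to reject cyclic lists are assumptions made for this illustration only.

```python
RDF_NIL = "rdf:nil"

def is_list_of_int(graph, node, seen=None):
    """graph maps a node name to {predicate: [objects]}."""
    seen = set() if seen is None else seen
    if node in seen:
        return False  # assumption: a cyclic list is rejected
    seen.add(node)
    props = graph.get(node, {})
    first = props.get("rdf:first", [])
    rest = props.get("rdf:rest", [])
    # rdf:first xsd:integer : exactly one integer value.
    if len(first) != 1 or not isinstance(first[0], int) or isinstance(first[0], bool):
        return False
    # Exactly one rdf:rest arc, matched by one of the two alternatives.
    if len(rest) != 1:
        return False
    return rest[0] == RDF_NIL or is_list_of_int(graph, rest[0], seen)

good = {
    "_:a": {"rdf:first": [1], "rdf:rest": ["_:b"]},
    "_:b": {"rdf:first": [2], "rdf:rest": [RDF_NIL]},
}
bad = {
    "_:a": {"rdf:first": [1], "rdf:rest": ["_:b"]},
    "_:b": {"rdf:first": ["two"], "rdf:rest": [RDF_NIL]},
}

print(is_list_of_int(good, "_:a"), is_list_of_int(bad, "_:a"))
```

Note that rdf:nil itself does not match in this sketch, because the shape requires an rdf:first arc.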

Page 16: Shape Expressions: An RDF validation and transformation language

Cardinalities

The same as in common regular expressions:

*       0 or more
+       1 or more
?       0 or 1
{m}     m repetitions
{m,n}   Between m and n repetitions

<IssueShape> { ... ( :reproducedBy @<UserShape>, :reproducedOn xsd:date)? , :related @<IssueShape>*}
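
A cardinality marker boils down to a pair of bounds on how many times an expression may match. The helper names below (`bounds`, `within`) are hypothetical, and the sketch assumes that an expression with no marker must match exactly once.

```python
import math

def bounds(marker):
    """Translate a cardinality marker into (min, max) repetitions."""
    if marker == "":
        return (1, 1)          # assumption: no marker means exactly once
    if marker == "*":
        return (0, math.inf)
    if marker == "+":
        return (1, math.inf)
    if marker == "?":
        return (0, 1)
    if marker.startswith("{") and marker.endswith("}"):
        parts = marker[1:-1].split(",")
        if len(parts) == 1:
            m = int(parts[0])
            return (m, m)
        if len(parts) == 2:
            return (int(parts[0]), int(parts[1]))
    raise ValueError(f"unknown cardinality marker: {marker!r}")

def within(count, marker):
    """True if `count` matches satisfy the marker."""
    low, high = bounds(marker)
    return low <= count <= high

print(within(0, "*"), within(0, "+"), within(1, "?"), within(2, "?"))
print(within(3, "{3}"), within(5, "{2,4}"))
```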

Page 17: Shape Expressions: An RDF validation and transformation language

Semantic actions

Define actions to be executed during validation

%lang{ ...actions... %}

Calls lang processor passing it the given actions

Example: Check that :reportedOn must be before :reproducedOn

<Issue> {
  ...
  :reportedOn xsd:date
    %js{ report = _.o; return true; %} ,
  ( :reproducedBy @<UserShape> ,
    :reproducedOn xsd:date
      %js{ return _.o.lex > report.lex; %}
  ) ?
}
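
The two js actions can be mirrored in Python to show the control flow: the first action stores the reported date, the second compares against it. This is a sketch, not the action processor itself: the function names and the shared `state` dict are hypothetical, and it compares parsed dates where the slide's JavaScript compares lexical forms.

```python
from datetime import date

def check_issue_dates(reported_on, reproduced_on=None):
    """Run the two actions in order; dates are ISO strings (YYYY-MM-DD)."""
    state = {}

    def on_reported(obj):
        # Mirrors: report = _.o; return true;
        state["report"] = date.fromisoformat(obj)
        return True

    def on_reproduced(obj):
        # Mirrors: return _.o.lex > report.lex;
        return date.fromisoformat(obj) > state["report"]

    if not on_reported(reported_on):
        return False
    if reproduced_on is None:
        return True  # the optional group is absent, so its action never runs
    return on_reproduced(reproduced_on)

print(check_issue_dates("2013-01-23", "2013-01-24"))
print(check_issue_dates("2013-01-23", "2013-01-22"))
```

Because the comparison is strict, two equal dates do not pass this check.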

Page 18: Shape Expressions: An RDF validation and transformation language

Semantics of Shape Expressions

- Operational semantics using inference rules
- Inspired by the semantics of RelaxNG
- Formalism used to define type inference systems
- Matching infers shape typings
- Axioms and rules of the form:

Page 19: Shape Expressions: An RDF validation and transformation language

Example: matching rules

[Inference rule figure. Annotations: the graph can be decomposed into g1 and g2; the typings t1 and t2 are combined; each judgment relates a context, a graph, and a type assignment.]

More details in the paper
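
The annotations on this slide (decompose the graph into g1 and g2, then combine the typings t1 and t2) suggest how a conjunction could be matched. The sketch below is a naive reading of that idea, not the rules from the paper: all names are hypothetical, a typing is modelled as a dict from node to a set of shape labels, and a reference arc only records that its object needs the given shape without checking it.

```python
from itertools import combinations

def combine(t1, t2):
    """Merge two typings (node -> set of shape labels)."""
    merged = {node: set(labels) for node, labels in t1.items()}
    for node, labels in t2.items():
        merged.setdefault(node, set()).update(labels)
    return merged

def decompositions(triples):
    """Yield every split of the triples into two disjoint graphs (g1, g2)."""
    items = list(triples)
    for k in range(len(items) + 1):
        for chosen in combinations(items, k):
            g1 = frozenset(chosen)
            yield g1, frozenset(items) - g1

def arc(predicate, value_ok, label=None):
    """Matcher for a single arc; returns a typing, or None on failure."""
    def match(graph):
        if len(graph) != 1:
            return None
        (_, p, o), = graph
        if p != predicate or not value_ok(o):
            return None
        # Assumption: a reference only records the typing it requires.
        return {o: {label}} if label else {}
    return match

def match_and(triples, match1, match2):
    """Conjunction: some decomposition must satisfy both sub-expressions."""
    for g1, g2 in decompositions(triples):
        t1 = match1(g1)
        if t1 is None:
            continue
        t2 = match2(g2)
        if t2 is not None:
            return combine(t1, t2)
    return None

status = arc(":status", lambda o: o in {":Assigned", ":Unassigned"})
reporter = arc(":reportedBy", lambda o: True, label="UserShape")
triples = {(":Issue1", ":status", ":Unassigned"), (":Issue1", ":reportedBy", ":Bob")}

print(match_and(triples, status, reporter))
```

Enumerating every decomposition is exponential in the number of triples; it is used here only because it follows the annotation literally.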

Page 20: Shape Expressions: An RDF validation and transformation language

Transforming RDF using ShEx

Semantic actions can be combined with specialized languages
- Possible languages: sparql, js
- Other examples:
  - GenX = very simple language to generate XML
    Goal: semantic lowering (map RDF clinical records to XML)
  - GenJ generates JSON

Page 21: Shape Expressions: An RDF validation and transformation language

Example

RDF (Turtle):

:Issue1 :status :Unassigned ;
  :reportedBy :Bob ;
  :reportedOn "2013-01-23"^^xsd:date ;
  :reproducedBy :Thompson.J ;
  :reproducedOn "2013-01-23"^^xsd:date .

:Bob foaf:name "Bob Smith" ;
  foaf:mbox <mail:[email protected]> .

:Thompson.J foaf:givenName "Joe", "Joseph" ;
  foaf:familyName "Thompson" ;
  foaf:mbox <mail:[email protected]> .

Transformed with Shape Expressions + GenX into XML:

<issue xmlns="http://ex.example/xml" id="Issue1" status="Unassigned">
  <reported date="2013-01-23">
    <given-name>Bob</given-name>
    <family-name>Smith</family-name>
    <email>mail:[email protected]</email>
  </reported>
  <reproduced date="2013-01-23">
    <given-name>Joe</given-name>
    <given-name>Joseph</given-name>
    <family-name>Thompson</family-name>
    <email>mail:[email protected]</email>
  </reproduced>
</issue>

Page 22: Shape Expressions: An RDF validation and transformation language

GenX

GenX syntax:

$IRI      Generates elements in that namespace
<name>    Add element <name>
@<name>   Add attribute <name>
=<expr>   XPath function applied to the value
=         Don't emit the value
[-n]      Place the value up n values in the hierarchy

Page 23: Shape Expressions: An RDF validation and transformation language

Example transforming RDF to XML

%GenX{ issue $http://ex.example/xml %}
<IssueShape> {
  ex:status (ex:unassigned ex:assigned) %GenX{@status =substr(19)%},
  ex:reportedBy @<UserShape> %GenX{ reported = %},
  ex:reportedOn xsd:date %GenX{ [-1]@date %},
  ( ex:reproducedBy @<UserShape>,
    ex:reproducedOn xsd:date %GenX{ @date %}
  )? %GenX{ reproduced = %},
  ex:related @<IssueShape>*
} %GenX{ @id %}
<UserShape> {
  ( foaf:name xsd:string %GenX{ full-name %}
  | foaf:givenName xsd:string+ %GenX{ given-name %} ,
    foaf:familyName xsd:string %GenX{ family-name %}
  ) ,
  foaf:mbox shex:IRI ? %GenX{ email %}
}
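
To make the effect of these annotations concrete, the sketch below builds a similar XML document by hand with Python's standard `xml.etree.ElementTree`. It is not GenX and does not interpret the annotations: the `issue_to_xml` function and the input dict layout are hypothetical, the XML namespace is omitted for brevity, and the mapping from arcs to elements and attributes is hard-coded to follow the schema above.

```python
import xml.etree.ElementTree as ET

def issue_to_xml(issue):
    """Serialize one already-validated issue (a plain dict) to XML text."""
    # @id and @status become attributes of the <issue> element.
    root = ET.Element("issue", {"id": issue["id"], "status": issue["status"]})
    for tag in ("reported", "reproduced"):
        part = issue.get(tag)
        if part is None:
            continue  # the optional reproduced group may be absent
        # "reported =" / "reproduced =": emit the element but not the user IRI;
        # the date is written as an attribute of that element.
        elem = ET.SubElement(root, tag, {"date": part["date"]})
        for child_tag, text in part["user"]:
            ET.SubElement(elem, child_tag).text = text
    return ET.tostring(root, encoding="unicode")

issue1 = {
    "id": "Issue1",
    "status": "Unassigned",
    "reported": {"date": "2013-01-23",
                 "user": [("full-name", "Bob Smith")]},
    "reproduced": {"date": "2013-01-23",
                   "user": [("given-name", "Joe"),
                            ("given-name", "Joseph"),
                            ("family-name", "Thompson")]},
}

print(issue_to_xml(issue1))
```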

Page 24: Shape Expressions: An RDF validation and transformation language

RDF (Turtle) + Shape Expressions with GenX = XML

This slide puts the pieces together: the Turtle data and the XML output from Page 21, connected by the GenX-annotated schema from Page 23.

Page 25: Shape Expressions: An RDF validation and transformation language

Current Implementations

FancyDemo
  Main developer: Eric Prud'hommeaux
  Language: Javascript
  Features: first implementation; Semantic Actions (GenX, GenJ); conversion to SPARQL
  http://www.w3.org/2013/ShEx/

JsShExTest
  Main developer: Jesse van Dam
  Language: Javascript
  Features: supports RDF and compact syntax
  https://github.com/jessevdam/shextest

ShExcala
  Main developer: Jose E. Labra
  Language: Scala
  Features: several extensions (negations, reverse arcs, relations, ...); efficient implementation using derivatives
  http://labra.github.io/ShExcala/

Haws
  Main developer: Jose E. Labra
  Language: Haskell
  Features: prototype to check inference semantics
  http://labra.github.io/haws/

Page 26: Shape Expressions: An RDF validation and transformation language

Applications to linked data portals

2 data portals: WebIndex and LandPortal

Data portal documentation:
- http://weso.github.io/wiDoc/
- http://weso.github.io/landportalDoc/data

WebIndex:

<Observation> {
  cex:md5-checksum xsd:string ,
  cex:computation @<Computation> ,
  dcterms:issued xsd:integer ,
  dcterms:publisher ( wi-org:WebFoundation ),
  qb:dataSet @<Dataset> ,
  rdfs:label (@en) ,
  sdmx-concept:obsStatus @<ObsStatus> ,
  wi-onto:ref-area @<Area>,
  wi-onto:ref-indicator @<Indicator> ,
  wi-onto:ref-year xsd:int ,
  cex:value xsd:double,
  a ( qb:Observation )
}

LandPortal:

<Observation> {
  cex:ref-area @<Area>,
  cex:ref-indicator @<Indicator>,
  cex:ref-time @<Time>,
  cex:value xsd:double? ,
  cex:computation @<Computation>,
  dcterms:issued xsd:dateTime,
  qb:dataSet @<DataSet>,
  qb:slice @<Slice>,
  rdfs:label xsd:string,
  lb:source @<Upload> ,
  a ( qb:Observation )
}

Same type: qb:Observation ...but different shapes

More info: paper at the Linked Data Quality Workshop

Page 27: Shape Expressions: An RDF validation and transformation language

Conclusions

Shape Expressions = simple language
- One goal: describe and validate RDF graphs

Semantics of Shape Expressions
- Described using inference rules
- ...but Shape Expressions can be converted to SPARQL
- Compatible with other Semantic technologies

Semantic actions = extensibility mechanism
- Can be applied to transform RDF

Page 28: Shape Expressions: An RDF validation and transformation language

Future Work

Improve implementations and language
- Debugging and error messages
- Expressiveness and usability of language
- Performance evaluation

Shape Expressions = role similar to Schema for XML

Future applications:
- Online validators
- Interface generators
- Binding: generate parsers/tools from shapes
- Performance of RDF triplestores?

Page 29: Shape Expressions: An RDF validation and transformation language

Future work at W3C

RDF Data Shapes WG chartered
Mailing list: [email protected]

"The discussion on [email protected] is the best entertainment since years; Game of Thrones colors pale." Paul Hermans (@PaulZH)

Page 30: Shape Expressions: An RDF validation and transformation language

End of presentation

Slides available at: http://www.slideshare.net/jelabra/semantics-2014