Shape Expressions: An RDF validation and transformation language


Presentation at Semantics-2014, Leipzig, Sept. 2014
Author: Jose Emilio Labra Gayo


Eric Prud'hommeaux
World Wide Web Consortium
MIT, Cambridge, MA, USA
eric@w3.org

Harold Solbrig
Mayo Clinic College of Medicine, Rochester, MN, USA

Jose Emilio Labra Gayo
WESO Research group
University of Oviedo, Spain
labra@uniovi.es

This talk in 1 slide

Motivating example: represent issues and users in RDF... and validate that data

Shape Expressions = simple language to:
- Describe the topology of RDF data
- Validate if an RDF graph matches a given shape

Shape Expressions can be extended with actions
- Possible application: transform RDF into XML

Motivating example

Represent an issue tracking system in RDF:
- Issues are reported by users on some date
- Issues have some status (assigned/unassigned)
- Issues can also be reproduced on some date by users

E-R diagram with two entities, User and Issue:

User
- foaf:name: xsd:string
- foaf:givenName: xsd:string*
- foaf:familyName: xsd:string
- foaf:mbox: IRI

Issue
- :status: (:Assigned :Unassigned)
- :reportedOn: xsd:date
- :reproducedOn: xsd:date

Relationships
- :reportedBy: Issue (0..*) to User (1)
- :reproducedBy: Issue (0..*) to User (0..1)
- :related: Issue to Issue (0..* and 0..1)

...and several constraints

A user:
- has a full name, or several given names and one family name
- can have one mbox

An issue:
- has status Assigned/Unassigned
- is reported by a user
- is reported on a date
- can be reproduced by a user on a date
- is related to other issues

Example data in RDF

    :Issue1 :status :Unassigned ;
        :reportedBy :Bob ;
        :reportedOn "2013-01-23"^^xsd:date ;
        :reproducedBy :Thompson.J ;
        :reproducedOn "2013-01-23"^^xsd:date .

    :Bob foaf:name "Bob Smith" ;
        foaf:mbox <mail:bob@example.org> .

    :Thompson.J foaf:givenName "Joe", "Joseph" ;
        foaf:familyName "Thompson" ;
        foaf:mbox <mail:joe@example.org> .

    :Issue2 :status :Checked ;
        :reportedBy :Issue1 ;
        :reportedOn 2014 ;
        :reproducedBy :Tom .

    :Tom foaf:name "Tom Smith", "Tam" .

    :Anna foaf:givenName "Anna" ;
        foaf:mbox 23 .

Problem statement

We want to detect possible errors in RDF like:
- Issues without status
- Issues with a status other than Assigned/Unassigned
- Issues reported by something other than a user
- Issues reported on a date with a non-date type
- Issues reproduced on a date before the reported date
- Users without mbox
- Users with 2 names
- Users with a name of type integer
- ...lots of other errors...

Q: How can we describe RDF data to be able to detect those errors?
A: Our proposal = Shape Expressions

Shape Expressions - Users

A user can have either:
- one foaf:name, or
- one or more foaf:givenName and one foaf:familyName
All of them must be of type xsd:string.

A user can have one foaf:mbox with value any IRI.

    <UserShape> {
      ( foaf:name xsd:string
      | foaf:givenName xsd:string+ ,
        foaf:familyName xsd:string
      ),
      foaf:mbox IRI ?
    }

The example uses compact syntax.
Shape Expressions can also be represented in RDF.
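To see what the UserShape accepts, its constraints can be approximated in ordinary code. The following Python sketch is only an illustration and not how any Shape Expressions implementation works: triples are plain tuples, literals are hypothetical (lexical form, datatype) pairs, IRIs are plain strings, and the helper names `lit` and `matches_user_shape` are invented here. It also assumes that predicates not mentioned in the shape are simply ignored.

```python
from collections import defaultdict

XSD = "http://www.w3.org/2001/XMLSchema#"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"


def lit(lexical, datatype=XSD + "string"):
    """Hypothetical literal representation: (lexical form, datatype IRI)."""
    return (lexical, datatype)


def is_string(value):
    return isinstance(value, tuple) and value[1] == XSD + "string"


def is_iri(value):
    return isinstance(value, str)


def matches_user_shape(node, triples):
    """Check the arcs leaving `node` against the UserShape constraints."""
    objs = defaultdict(list)
    for s, p, o in triples:
        if s == node:
            objs[p].append(o)
    names = objs[FOAF + "name"]
    given = objs[FOAF + "givenName"]
    family = objs[FOAF + "familyName"]
    mbox = objs[FOAF + "mbox"]

    # Alternative 1: exactly one foaf:name of type xsd:string.
    full_name = (len(names) == 1 and not given and not family
                 and is_string(names[0]))
    # Alternative 2: one or more given names and exactly one family name.
    split_name = (not names and len(given) >= 1 and len(family) == 1
                  and all(map(is_string, given)) and is_string(family[0]))
    # Optional mbox: at most one value, and it must be an IRI.
    mbox_ok = len(mbox) <= 1 and all(map(is_iri, mbox))
    return (full_name or split_name) and mbox_ok


data = [
    (EX + "Bob", FOAF + "name", lit("Bob Smith")),
    (EX + "Bob", FOAF + "mbox", "mail:bob@example.org"),
    (EX + "Tom", FOAF + "name", lit("Tom Smith")),
    (EX + "Tom", FOAF + "name", lit("Tam")),
    (EX + "Anna", FOAF + "givenName", lit("Anna")),
    (EX + "Anna", FOAF + "mbox", lit("23", XSD + "integer")),
]

for user in ("Bob", "Tom", "Anna"):
    print(user, matches_user_shape(EX + user, data))
```

In this sketch :Bob passes, while :Tom (two names) and :Anna (no family name, non-IRI mbox) fail, which corresponds to the kind of errors listed in the problem statement.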

Shape Expressions - Issues

    <IssueShape> {
      :status (:Assigned :Unassigned),
      :reportedBy @<UserShape>,
      :reportedOn xsd:date,
      ( :reproducedBy @<UserShape> ,
        :reproducedOn xsd:date
      )?,
      :related @<IssueShape>*
    }

- An issue's :status must be either :Assigned or :Unassigned
- Issues are :reportedBy a user
- Issues are :reportedOn an xsd:date
- An issue may be :reproducedBy a user and :reproducedOn an xsd:date
- An issue can be :related to several issues

Full example

    prefix : <http://example.org/>
    prefix xsd: <http://www.w3.org/2001/XMLSchema#>
    prefix foaf: <http://xmlns.com/foaf/0.1/>

    <UserShape> {
      ( foaf:name xsd:string
      | foaf:givenName xsd:string+ ,
        foaf:familyName xsd:string
      ),
      foaf:mbox IRI ?
    }

    <IssueShape> {
      :status (:Assigned :Unassigned),
      :reportedBy @<UserShape>,
      :reportedOn xsd:date,
      ( :reproducedBy @<UserShape> ,
        :reproducedOn xsd:date
      )?,
      :related @<IssueShape>*
    }

Online Shape Expressions validators:
- http://www.w3.org/2013/ShEx
- http://rdfshape.weso.es

FAQ: Why not use SPARQL?

    CONSTRUCT {
      ?IssueShape :hasShape <IssueShape> .
      ?UserShape :hasShape <UserShape> .
    } {
      { SELECT ?IssueShape { ?IssueShape :status ?o . } GROUP BY ?IssueShape HAVING (COUNT(*)=1)}
      { SELECT ?IssueShape { ?IssueShape :status ?o . FILTER ((?o = :Assigned || ?o = :Unassigned)) } GROUP BY ?IssueShape HAVING (COUNT(*)=1)}
      { SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c0) { ?IssueShape :reportedBy ?o . } GROUP BY ?IssueShape HAVING (COUNT(*)=1)}
      { SELECT ?IssueShape { ?IssueShape :reportedBy ?o . FILTER ((isIRI(?o) || isBlank(?o))) } GROUP BY ?IssueShape HAVING (COUNT(*)=1)}
      { SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c1) {
        { SELECT ?IssueShape ?UserShape { ?IssueShape :reportedBy ?UserShape . FILTER (isIRI(?UserShape) || isBlank(?UserShape)) } }
        { SELECT ?UserShape WHERE {
          { { SELECT ?UserShape { ?UserShape foaf:name ?o . } GROUP BY ?UserShape HAVING (COUNT(*)=1)}
            { SELECT ?UserShape { ?UserShape foaf:name ?o . FILTER ((isLiteral(?o) && datatype(?o) = xsd:string))} GROUP BY ?UserShape HAVING (COUNT(*)=1)


          } UNION {
            { SELECT ?UserShape (COUNT(*) AS ?UserShape_c0) { ?UserShape foaf:givenName ?o . } GROUP BY ?UserShape HAVING (COUNT(*)>=1)}
            { SELECT ?UserShape (COUNT(*) AS ?UserShape_c1) { ?UserShape foaf:givenName ?o . FILTER ((isLiteral(?o) && datatype(?o) = xsd:string))} GROUP BY ?UserShape HAVING (COUNT(*)>=1)}
            FILTER (?UserShape_c0 = ?UserShape_c1)
            { SELECT ?UserShape { ?UserShape foaf:familyName ?o . } GROUP BY ?UserShape HAVING (COUNT(*)=1)}
            { SELECT ?UserShape { ?UserShape foaf:familyName ?o . FILTER ((isLiteral(?o) && datatype(?o) = xsd:string))} GROUP BY ?UserShape HAVING (COUNT(*)=1)}}
          } GROUP BY ?UserShape HAVING (COUNT(*) = 1)}
        { SELECT ?UserShape (COUNT(*) AS ?UserShape_c2) { ?UserShape foaf:mbox ?o . } GROUP BY ?UserShape HAVING (COUNT(*)<=1)}
        { SELECT ?UserShape (COUNT(*) AS ?UserShape_c3) { ?UserShape foaf:mbox ?o . FILTER (isIRI(?o)) } GROUP BY ?UserShape HAVING (COUNT(*)<=1)}
        FILTER (?UserShape_c2 = ?UserShape_c3)


        FILTER (?UserShape_c2 = ?UserShape_c3)
      } GROUP BY ?IssueShape }
      FILTER (?IssueShape_c0 = ?IssueShape_c1)
      OPTIONAL { ?IssueShape :reportedBy ?IssueShape_UserShape_ref0 . FILTER (isIRI(?IssueShape_UserShape_ref0) || isBlank(?IssueShape_UserShape_ref0)) }
      { SELECT ?IssueShape { ?IssueShape :reportedOn ?o . } GROUP BY ?IssueShape HAVING (COUNT(*)=1)}
      { SELECT ?IssueShape { ?IssueShape :reportedOn ?o . FILTER ((isLiteral(?o) && datatype(?o) = xsd:date))} GROUP BY ?IssueShape HAVING (COUNT(*)=1)}
      {
        { SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c2) { ?IssueShape :reproducedBy ?o . } GROUP BY ?IssueShape}
        { SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c3) { ?IssueShape :reproducedBy ?o . FILTER ((isIRI(?o) || isBlank(?o))) } GROUP BY ?IssueShape}
        FILTER (?IssueShape_c2 = ?IssueShape_c3)
        { SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c5) { ?IssueShape :reproducedOn ?o . } GROUP BY ?IssueShape}
        { SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c6) { ?IssueShape :reproducedOn ?o . FILTER ((isLiteral(?o) && datatype(?o) = xsd:date))} GROUP BY ?IssueShape}
        FILTER (?IssueShape_c5 = ?IssueShape_c6)


        FILTER (?IssueShape_c2=0 && ?IssueShape_c5=0 || ?IssueShape_c2>=1&&?IssueShape_c2<=1 && ?IssueShape_c5>=1&&?IssueShape_c5<=1)
      }
      { SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c7) { ?IssueShape :related ?o . } GROUP BY ?IssueShape}
      { SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c8) { ?IssueShape :related ?o . } GROUP BY ?IssueShape}
      FILTER (?IssueShape_c7 = ?IssueShape_c8)
      { SELECT ?UserShape WHERE {
        { { SELECT ?UserShape { ?UserShape foaf:name ?o . } GROUP BY ?UserShape HAVING (COUNT(*)=1)}
          { SELECT ?UserShape { ?UserShape foaf:name ?o . FILTER ((isLiteral(?o) && datatype(?o) = xsd:string)) } GROUP BY ?UserShape HAVING (COUNT(*)=1)}
        } UNION {
          { SELECT ?UserShape (COUNT(*) AS ?UserShape_c0) { ?UserShape foaf:givenName ?o . } GROUP BY ?UserShape HAVING (COUNT(*)>=1)}
          { SELECT ?UserShape (COUNT(*) AS ?UserShape_c1) { ?UserShape foaf:givenName ?o . FILTER ((isLiteral(?o) && datatype(?o) = xsd:string))} GROUP BY ?UserShape HAVING (COUNT(*)>=1)}
          FILTER (?UserShape_c0 = ?UserShape_c1)
          { SELECT ?UserShape { ?UserShape foaf:familyName ?o .


          } GROUP BY ?UserShape HAVING (COUNT(*)=1)}
          { SELECT ?UserShape { ?UserShape foaf:familyName ?o . FILTER ((isLiteral(?o) && datatype(?o) = xsd:string)) } GROUP BY ?UserShape HAVING (COUNT(*)=1)}}
        } GROUP BY ?UserShape HAVING (COUNT(*) = 1)}
      { SELECT ?UserShape (COUNT(*) AS ?UserShape_c2) { ?UserShape foaf:mbox ?o . } GROUP BY ?UserShape HAVING (COUNT(*)<=1)}
      { SELECT ?UserShape (COUNT(*) AS ?UserShape_c3) { ?UserShape foaf:mbox ?o . FILTER (isIRI(?o)) } GROUP BY ?UserShape HAVING (COUNT(*)<=1)}
      FILTER (?UserShape_c2 = ?UserShape_c3)}


Shape Expressions can be converted to SPARQL.
But Shape Expressions are simpler and more readable for this problem.
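Much of the length of the SPARQL version comes from repeating one counting pattern per arc. As a rough illustration, the Python sketch below rebuilds the two subqueries that the query above uses for the :status arc: one counts all objects of the predicate, the other counts only the objects that pass the value filter. The function name `arc_to_sparql` and its parameters are invented for this sketch; it is not the conversion algorithm of any existing tool, and it covers only this simplest pattern (other arcs in the query compare two counters instead).

```python
def arc_to_sparql(subject_var, predicate, value_filter, having):
    """Build the two counting subqueries for one arc.

    The first subquery counts every object of `predicate`; the second
    counts only the objects that pass `value_filter`. Both carry the
    same HAVING condition on the count.
    """
    head = f"SELECT ?{subject_var} {{ ?{subject_var} {predicate} ?o ."
    tail = f"}} GROUP BY ?{subject_var} HAVING (COUNT(*){having})"
    return [
        f"{{ {head} {tail}}}",
        f"{{ {head} FILTER ({value_filter}) {tail}}}",
    ]


status_queries = arc_to_sparql(
    "IssueShape",
    ":status",
    "(?o = :Assigned || ?o = :Unassigned)",
    "=1",
)
for subquery in status_queries:
    print(subquery)
```

A single arc such as `:status (:Assigned :Unassigned)` therefore expands into two grouped subqueries, which gives a feeling for why the generated query grows so quickly.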

Shape Expressions Language

Schema = set of Shape Expressions
Shape Expression = labeled pattern

    <label> { ...pattern... }

Typical pattern = conjunction of several expressions
Conjunction represented by ,

    <IssueShape> {
      :status (:Assigned :Unassigned),
      :reportedBy @<UserShape>,
      :reportedOn xsd:date
      ...
    }

In this example <IssueShape> is the label and the commas mark the conjunction.

Arcs

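Basic expression: an Arc
Arc = name definition followed by value definition

    <IssueShape> {
      :status (:Assigned :Unassigned),
      :reportedBy @<UserShape>,
      :reportedOn xsd:date
      ...
    }

Figure: node :issue1 with three outgoing arcs, :reportedBy (to :bob), :status (to :Unassigned) and :reportedOn (to 23-01-2013). In each arc the predicate is matched by the name definition and the object by the value definition.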

Value definition

Value definitions can be:

Kind        Example                     Meaning
Value type  xsd:date                    Matches a value of type xsd:date
Value set   ( :Assigned :Unassigned )   The object is an element of the given set
Reference   @<UserShape>                The object has shape <UserShape>
Stem        foaf:~                      Starts with the IRI associated with foaf
Any         - :Checked                  Any value except :Checked

    <IssueShape> {
      :status (:Assigned :Unassigned),
      :reportedBy @<UserShape>,
      :reportedOn xsd:date
      ...
    }

In this example (:Assigned :Unassigned) is a value set, @<UserShape> is a value reference and xsd:date is a value type.
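One way to think about value definitions is as predicates over the object of a triple. The Python sketch below models four of the five kinds as small functions that return a checker; references are left out because they need the whole schema. The representation is an assumption made for illustration only: IRIs are plain strings, literals are (lexical form, datatype) pairs, and the names `value_type`, `value_set`, `stem` and `any_except` are invented here.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"


def value_type(datatype):
    """Value type: the object is a literal with the given datatype."""
    return lambda o: isinstance(o, tuple) and o[1] == datatype


def value_set(*members):
    """Value set: the object is one of the listed terms."""
    return lambda o: o in members


def stem(prefix):
    """Stem: the object is an IRI that starts with the given prefix."""
    return lambda o: isinstance(o, str) and o.startswith(prefix)


def any_except(*excluded):
    """Any: every value except the listed ones."""
    return lambda o: o not in excluded


is_date = value_type(XSD + "date")
is_status = value_set(EX + "Assigned", EX + "Unassigned")
in_foaf = stem(FOAF)
not_checked = any_except(EX + "Checked")

print(is_date(("2013-01-23", XSD + "date")))
print(is_status(EX + "Checked"))
```

With this representation the :status of :Issue2 in the example data (:Checked) is rejected by the value set, and its :reportedOn value (an integer) is rejected by the value type.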

Name definition

Name definitions can be:

Kind       Example      Meaning
Name term  foaf:name    Matches the given IRI
Name stem  foaf:~       Any predicate that starts with foaf
Name any   - foaf:name  Any predicate except foaf:name

    <IssueShape> {
      :status (:Assigned :Unassigned),
      :reportedBy @<UserShape>,
      :reportedOn xsd:date
      ...
    }

In this example :status, :reportedBy and :reportedOn are name terms.

Alternatives

Alternatives (disjunctions) are marked by |

Example 1: An agent has either foaf:name or rdfs:label

    <Agent> {
      ( foaf:name xsd:string
      | rdfs:label xsd:string
      )
      ...
    }

Example 2: A list of integers

    <listOfInt> {
      rdf:first xsd:integer ,
      ( rdf:rest ( rdf:nil )
      | rdf:rest @<listOfInt>
      )
    }
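The list example combines an alternative with a recursive reference, which can be read as a recursive function. The Python sketch below is a hand-written approximation of <listOfInt> and makes several assumptions: triples are plain tuples, literals are (lexical form, datatype) pairs, the function names are invented here, and cyclic lists are rejected simply so that the recursion is guaranteed to terminate.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def objects(node, predicate, triples):
    """All objects of (node, predicate, ?o) in the graph."""
    return [o for s, p, o in triples if s == node and p == predicate]


def matches_list_of_int(node, triples, seen=None):
    """Approximate <listOfInt>: one integer rdf:first and one rdf:rest."""
    seen = set() if seen is None else seen
    if node in seen:          # assumption of this sketch: reject cycles
        return False
    seen.add(node)
    first = objects(node, RDF + "first", triples)
    rest = objects(node, RDF + "rest", triples)
    if len(first) != 1 or len(rest) != 1:
        return False
    value = first[0]
    if not (isinstance(value, tuple) and value[1] == XSD_INTEGER):
        return False
    # Alternative: the rest is rdf:nil, or the rest is again a <listOfInt>.
    return rest[0] == RDF + "nil" or matches_list_of_int(rest[0], triples, seen)


good = [
    ("_:b1", RDF + "first", ("1", XSD_INTEGER)),
    ("_:b1", RDF + "rest", "_:b2"),
    ("_:b2", RDF + "first", ("2", XSD_INTEGER)),
    ("_:b2", RDF + "rest", RDF + "nil"),
]
bad = [
    ("_:c1", RDF + "first", ("one", XSD_STRING)),
    ("_:c1", RDF + "rest", RDF + "nil"),
]

print(matches_list_of_int("_:b1", good))
print(matches_list_of_int("_:c1", bad))
```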

Cardinalities

The same as in common regular expressions:

Marker  Meaning
*       0 or more
+       1 or more
?       0 or 1
{m}     m repetitions
{m,n}   Between m and n repetitions

    <IssueShape> {
      ...
      ( :reproducedBy @<UserShape>,
        :reproducedOn xsd:date
      )? ,
      :related @<IssueShape>*
    }
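Each cardinality marker boils down to a lower and an upper bound on how many times an expression may match. The Python sketch below makes that mapping explicit; the function names `bounds` and `within` are invented for illustration, and an absent marker is treated as exactly one occurrence, in line with the COUNT(*)=1 conditions in the SPARQL translation shown earlier.

```python
import math

# Fixed markers and their (minimum, maximum) bounds.
CARDINALITIES = {"*": (0, math.inf), "+": (1, math.inf), "?": (0, 1)}


def bounds(marker):
    """Translate a cardinality marker into a (minimum, maximum) pair."""
    if marker == "":
        return (1, 1)                      # no marker: exactly one
    if marker in CARDINALITIES:
        return CARDINALITIES[marker]
    parts = marker.strip("{}").split(",")  # "{m}" or "{m,n}"
    return (int(parts[0]), int(parts[-1]))


def within(count, marker):
    """True if `count` occurrences satisfy the cardinality marker."""
    low, high = bounds(marker)
    return low <= count <= high


print(within(0, "*"), within(0, "+"), within(2, "?"), within(3, "{2,4}"))
```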

Semantic actions

Define actions to be executed during validation

    %lang{ ...actions... %}

Calls lang processor passing it the given actions

Example: Check that :reportedOn must be before :reproducedOn

    <Issue> {
      ...
      :reportedOn xsd:date
        %js{ report = _.o; return true; %} ,
      ( :reproducedBy @<UserShape> ,
        :reproducedOn xsd:date
          %js{ return _.o.lex > report.lex; %}
      ) ?
    }
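The idea behind the two js actions can be sketched outside of ShEx as callbacks that run as each arc is matched and share some state. The Python version below is only an analogy: the function name `check_issue_dates`, the `state` dictionary and the assumption that arcs arrive in the order they appear in the shape are all choices made for this sketch, not part of any Shape Expressions implementation.

```python
from datetime import date


def check_issue_dates(arcs):
    """Run the date actions over (predicate, object) arcs in shape order."""
    state = {}

    def on_reported(value):
        state["report"] = value          # like `report = _.o; return true;`
        return True

    def on_reproduced(value):
        return value > state["report"]   # like `return _.o.lex > report.lex;`

    actions = {":reportedOn": on_reported, ":reproducedOn": on_reproduced}
    for predicate, value in arcs:
        action = actions.get(predicate)
        # An action returning False makes the whole validation fail.
        if action is not None and not action(value):
            return False
    return True


later = [(":reportedOn", date(2013, 1, 23)), (":reproducedOn", date(2013, 1, 24))]
same_day = [(":reportedOn", date(2013, 1, 23)), (":reproducedOn", date(2013, 1, 23))]

print(check_issue_dates(later))
print(check_issue_dates(same_day))
```

Because the comparison is strict, as in the js action, an issue reproduced on the same day it was reported does not pass this check.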

Semantics of Shape Expressions

- Operational semantics using inference rules
- Inspired by the semantics of RelaxNG
- Formalism used to define type inference systems
- Matching infers shape typings
- Axioms and rules of the form: (formula shown as an image on the slide; its parts are labelled Context, Graph and Type Assignment)

Example: matching rules (rule shown as an image on the slide, with these annotations)
- The graph can be decomposed into g1 and g2
- Combine typings t1 and t2

More details in the paper
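The annotation about decomposing the graph into g1 and g2 can be illustrated with a brute-force matcher. The Python sketch below is an assumption-laden toy, not the inference system from the paper: expressions are nested tuples invented for this example, typings and shape references are left out entirely, every triple must be consumed by the expression, and all decompositions are tried, so the running time is exponential.

```python
from itertools import combinations


def splits(triples):
    """Yield every decomposition of a list of triples into (g1, g2)."""
    items = list(triples)
    for size in range(len(items) + 1):
        for chosen in combinations(range(len(items)), size):
            g1 = [items[i] for i in chosen]
            g2 = [items[i] for i in range(len(items)) if i not in chosen]
            yield g1, g2


def matches(expr, triples):
    """Match a toy expression against all the given triples."""
    kind = expr[0]
    if kind == "arc":        # ("arc", predicate, value_check)
        _, predicate, check = expr
        return (len(triples) == 1 and triples[0][1] == predicate
                and check(triples[0][2]))
    if kind == "and":        # the graph is decomposed into g1 and g2
        _, left, right = expr
        return any(matches(left, g1) and matches(right, g2)
                   for g1, g2 in splits(triples))
    if kind == "or":
        _, left, right = expr
        return matches(left, triples) or matches(right, triples)
    if kind == "star":       # zero or more repetitions of the inner expression
        _, inner = expr
        if not triples:
            return True
        return any(g1 and matches(inner, g1) and matches(expr, g2)
                   for g1, g2 in splits(triples))
    raise ValueError(f"unknown expression kind: {kind}")


status = ("arc", ":status", lambda o: o in (":Assigned", ":Unassigned"))
related = ("star", ("arc", ":related", lambda o: o.startswith(":Issue")))
shape = ("and", status, related)

graph = [
    (":Issue1", ":status", ":Unassigned"),
    (":Issue1", ":related", ":Issue2"),
    (":Issue1", ":related", ":Issue3"),
]
print(matches(shape, graph))
```

Here the conjunction succeeds with g1 holding the :status triple and g2 holding the two :related triples. An efficient implementation would avoid enumerating all decompositions.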

Transforming RDF using ShEx

Semantic actions can be combined with specialized languages

Possible languages: sparql, js

Other examples:
- GenX = very simple language to generate XML
  Goal: Semantic lowering. Map RDF clinical records to XML
- GenJ generates JSON

Example

RDF (Turtle):

    :Issue1 :status :Unassigned ;
        :reportedBy :Bob ;
        :reportedOn "2013-01-23"^^xsd:date ;
        :reproducedBy :Thompson.J ;
        :reproducedOn "2013-01-23"^^xsd:date .

    :Bob foaf:name "Bob Smith" ;
        foaf:mbox <mail:bob@example.org> .

    :Thompson.J foaf:givenName "Joe", "Joseph" ;
        foaf:familyName "Thompson" ;
        foaf:mbox <mail:joe@example.org> .

XML, generated with Shape Expressions + GenX:

    <issue xmlns="http://ex.example/xml" id="Issue1" status="Unassigned">
      <reported date="2013-01-23">
        <given-name>Bob</given-name>
        <family-name>Smith</family-name>
        <email>mail:bob@example.org</email>
      </reported>
      <reproduced date="2013-01-23">
        <given-name>Joe</given-name>
        <given-name>Joseph</given-name>
        <family-name>Thompson</family-name>
        <email>mail:joe@example.org</email>
      </reproduced>
    </issue>

GenX

GenX syntax

Directive  Meaning
$IRI       Generates elements in that namespace
<name>     Add element <name>
@<name>    Add attribute <name>
=<expr>    XPath function applied to the value
=          Don't emit the value
[-n]       Place the value up n values in the hierarchy

Example transforming RDF to XML

    %GenX{ issue $http://ex.example/xml %}
    <IssueShape> {
      ex:status (ex:unassigned ex:assigned) %GenX{@status =substr(19)%},
      ex:reportedBy @<UserShape> %GenX{ reported = %},
      ex:reportedOn xsd:date %GenX{ [-1]@date %},
      ( ex:reproducedBy @<UserShape>,
        ex:reproducedOn xsd:date %GenX{ @date %}
      )? %GenX{ reproduced = %},
      ex:related @<IssueShape>*
    } %GenX{ @id %}
    <UserShape> {
      ( foaf:name xsd:string %GenX{ full-name %}
      | foaf:givenName xsd:string+ %GenX{ given-name %} ,
        foaf:familyName xsd:string %GenX{ family-name %}
      ) ,
      foaf:mbox shex:IRI ? %GenX{ email %}
    }
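For comparison, the same kind of lowering can be written by hand. The Python sketch below is not GenX and does not interpret GenX directives; it is a hypothetical hand-coded equivalent that assumes the issue has already been validated and loaded into plain dictionaries (the key names and the functions `add_user` and `issue_to_xml` are invented here), and it leaves out the XML namespace for brevity.

```python
import xml.etree.ElementTree as ET


def add_user(parent, tag, on_date, user):
    """Emit one user under `parent`, with the date as an attribute."""
    element = ET.SubElement(parent, tag, {"date": on_date})
    if "name" in user:
        ET.SubElement(element, "full-name").text = user["name"]
    else:
        for given in user["givenName"]:
            ET.SubElement(element, "given-name").text = given
        ET.SubElement(element, "family-name").text = user["familyName"]
    if "mbox" in user:
        ET.SubElement(element, "email").text = user["mbox"]
    return element


def issue_to_xml(issue):
    """Build an <issue> element from a dictionary describing the issue."""
    root = ET.Element("issue", {"id": issue["id"], "status": issue["status"]})
    add_user(root, "reported", issue["reportedOn"], issue["reportedBy"])
    if "reproducedBy" in issue:   # the reproduced group is optional
        add_user(root, "reproduced", issue["reproducedOn"], issue["reproducedBy"])
    return root


issue1 = {
    "id": "Issue1",
    "status": "Unassigned",
    "reportedOn": "2013-01-23",
    "reportedBy": {"name": "Bob Smith", "mbox": "mail:bob@example.org"},
    "reproducedOn": "2013-01-23",
    "reproducedBy": {
        "givenName": ["Joe", "Joseph"],
        "familyName": "Thompson",
        "mbox": "mail:joe@example.org",
    },
}

root = issue_to_xml(issue1)
print(ET.tostring(root, encoding="unicode"))
```

The hand-written version hard-codes the traversal that the shape already describes, which is the duplication that attaching GenX directives to the shape is meant to avoid.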


Current Implementations

Name        Main developer      Language    Features
FancyDemo   Eric Prud'hommeaux  Javascript  First implementation
                                            Semantic Actions - GenX, GenJ
                                            Conversion to SPARQL
                                            http://www.w3.org/2013/ShEx/
JsShExTest  Jesse van Dam       Javascript  Supports RDF and Compact syntax
                                            https://github.com/jessevdam/shextest
ShExcala    Jose E. Labra       Scala       Several extensions: negations, reverse arcs, relations,...
                                            Efficient implementation using Derivatives
                                            http://labra.github.io/ShExcala/
Haws        Jose E. Labra       Haskell     Prototype to check inference semantics
                                            http://labra.github.io/haws/

Applications to linked data portals

2 data portals: WebIndex and LandPortal

Data portal documentation:
- http://weso.github.io/wiDoc/
- http://weso.github.io/landportalDoc/data

WebIndex:

    <Observation> {
      cex:md5-checksum xsd:string ,
      cex:computation @<Computation> ,
      dcterms:issued xsd:integer ,
      dcterms:publisher ( wi-org:WebFoundation ),
      qb:dataSet @<Dataset> ,
      rdfs:label (@en) ,
      sdmx-concept:obsStatus @<ObsStatus> ,
      wi-onto:ref-area @<Area>,
      wi-onto:ref-indicator @<Indicator> ,
      wi-onto:ref-year xsd:int ,
      cex:value xsd:double,
      a ( qb:Observation )
    }

LandPortal:

    <Observation> {
      cex:ref-area @<Area>,
      cex:ref-indicator @<Indicator>,
      cex:ref-time @<Time>,
      cex:value xsd:double? ,
      cex:computation @<Computation>,
      dcterms:issued xsd:dateTime,
      qb:dataSet @<DataSet>,
      qb:slice @<Slice>,
      rdfs:label xsd:string,
      lb:source @<Upload> ,
      a ( qb:Observation )
    }

Same type: qb:Observation ...but different shapes

More info: Paper on Linked Data Quality Workshop

Conclusions

Shape Expressions = simple language
- One goal: Describe and validate RDF graphs

Semantics of Shape Expressions
- Described using inference rules
- ...but Shape Expressions can be converted to SPARQL
- Compatible with other Semantic technologies

Semantic actions = Extensibility mechanism
- Can be applied to transform RDF

Future Work

Improve implementations and language:
- Debugging and error messages
- Expressiveness and usability of language
- Performance evaluation

Shape Expressions = role similar to Schema for XML

Future applications:
- Online validators
- Interface generators
- Binding: generate parsers/tools from shapes
- Performance of RDF triplestores?

Future work at W3C

RDF Data Shapes WG chartered
Mailing list: public-rdf-shapes@w3.org

"The discussion on public-rdf-shapes@w3.org is the best entertainment since years; Game of Thrones colors pale." Paul Hermans (@PaulZH)

End of presentation

Slides available at: http://www.slideshare.net/jelabra/semantics-2014
