Publishing RDF SKOS with Java microservices
Fedict, Brussels, January 2017
Linked Data
Resource Description Framework
Triple stores
Jena and RDF4j
Dropwizard
Agenda
(Image: drawing of a hat)
Dropwizard, RDF4j
Overview
Jetty, Lucene, RDF store, Jersey, Freemarker, Slf4j
Linked Data
Making the web machine-readable
Distributed / web
  Challenging for queries
  Data not guaranteed to be available / persistent
Add meaning to relations / links
Semantic web
Using URI as identifier
Dereferenceable URI
Identifier
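"Dereferenceable" means that a client can issue an HTTP GET on the identifier itself and get a description back. A minimal sketch using the JDK HTTP client (Java 11+); the class name, the example URI and the choice of Turtle are assumptions for illustration only:

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: prepares the request a client would use to dereference an identifier.
final class DereferenceSketch {

    /** Builds (but does not send) a GET request asking for a Turtle representation. */
    static HttpRequest turtleRequest(String identifier) {
        return HttpRequest.newBuilder(URI.create(identifier))
                .header("Accept", "text/turtle")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = turtleRequest("http://example.com/concept/1"); // example URI
        System.out.println(req.method() + " " + req.uri());
        System.out.println("Accept: " + req.headers().firstValue("Accept").orElse(""));
    }
}
```

Sending the request would go through `HttpClient.newHttpClient().send(...)`; that step is left out here so the sketch runs without network access.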
Resource Description Framework
Triple
S (subject) and P (predicate) are resource identifiers (IRI): http://example.com, mailto:[email protected], urn:example:1234-56789, ...
O (object) can be:
  Identifier (link to something else)
  Literal
    String value with optional language tag
    OR typed value (e.g. XSD date, integer, ...)
RDF Basics
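To make the triple structure concrete, here is a minimal sketch that formats single statements as N-Triples lines using only the JDK. The class and method names are hypothetical, and the example IRIs are placeholders; a real application would use Jena or RDF4j instead of hand-written string handling.

```java
import java.net.URI;

// Hypothetical helper, not part of Jena or RDF4j: formats one triple as an N-Triples line.
final class TripleSketch {

    /** Wraps an IRI in angle brackets, as N-Triples requires. */
    static String iri(URI uri) {
        return "<" + uri + ">";
    }

    /** Escapes backslash, double quote and line breaks inside a literal value. */
    static String escape(String value) {
        return value.replace("\\", "\\\\")
                    .replace("\"", "\\\"")
                    .replace("\n", "\\n")
                    .replace("\r", "\\r");
    }

    /** String literal with an optional language tag (pass null for no tag). */
    static String langLiteral(String value, String lang) {
        return "\"" + escape(value) + "\"" + (lang == null ? "" : "@" + lang);
    }

    /** Typed literal, e.g. an XSD date or integer. */
    static String typedLiteral(String value, URI datatype) {
        return "\"" + escape(value) + "\"^^" + iri(datatype);
    }

    /** One statement: S and P are IRIs, O is an already formatted IRI or literal. */
    static String triple(URI s, URI p, String o) {
        return iri(s) + " " + iri(p) + " " + o + " .";
    }

    public static void main(String[] args) {
        URI concept = URI.create("http://example.com/concept/1"); // placeholder subject
        URI prefLabel = URI.create("http://www.w3.org/2004/02/skos/core#prefLabel");
        URI broader = URI.create("http://www.w3.org/2004/02/skos/core#broader");

        // Object as literal with language tag
        System.out.println(triple(concept, prefLabel, langLiteral("Brussels", "en")));
        // Object as identifier (link to something else)
        System.out.println(triple(concept, broader,
                iri(URI.create("http://example.com/concept/0"))));
    }
}
```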
RDF is not a file format
  Although the .rdf extension is often used for RDF/XML
Popular serializations
  N-Triples (.nt): fast and easy
  Turtle (.ttl): human-friendly
  RDF/XML (.rdf): for XML-based workflows
  JSON-LD (.jsonld, often .json): for web developers
RDF serializations
Based upon RDF Schema
  Somewhat similar to XML Schema
  Classes and properties
Can (and should!) be mixed and reused
Popular vocabularies
  Dublin Core: generic title, description
  SKOS: broader / narrower term
  ROV: registered organizations
http://lov.okfn.org/dataset/lov/
Vocabularies
RDF can be generated without a triple store
Less suitable for:
  Very large tabular sets (e.g. RDBMS dumps)
  Tiny sensor data
Notes
Jena and RDF4j
Both are great open-source Java frameworks
  Reading / writing / converting RDF, triple stores, ...
Apache Jena: https://jena.apache.org/
  Better performance / more scalable?
Eclipse RDF4j (formerly Sesame): http://rdf4j.org/
  Better architecture (Sails)?
Jena vs RDF4j
Embedded store / standalone server
  100 - 150 million triples
No out-of-the-box HA / replication
  Probably not needed for publishing smaller sets
  Running multiple shared-nothing instances?
Bonus: Sail abstraction
  Switch to GraphDB or Blazegraph with minor changes
Why (not) RDF4j as data store
Triple stores
Triple stores (TS) are optimized for storing triples
TS often lack fine-grained checks
  Few checks for data types, non-null
  Commercial stores like Stardog offer more options
  Work in progress: https://www.w3.org/TR/shacl/
Full-text search often handled by Lucene
  Often a product-specific extension
Queries and updates with SPARQL (SQL-like)
  And / or custom API, faster but less portable
Triple store vs RDBMS
Small / medium sets
  Apache Jena store (part of framework)
  Eclipse RDF4j store (part of framework)
Larger sets
  Blazegraph (GPU acceleration in commercial version)
  Ontotext GraphDB (free demo)
  Oracle Spatial and Graph
  Virtuoso (hybrid XML / RDBMS / TS)
Popular stores
SPARQL endpoints
  Advanced queries
  Heavy load on server side
Linked Data Fragments
  Very basic queries
  Shifting workload to client
  More network traffic
  http://linkeddatafragments.org/concept/
Distributed queries
Dropwizard
Mixing REST / SOA / Unix philosophy
  Do 1 thing and do it well
Back-end
Also in Java
  Traditional Java EE too complex for small apps
  Pippo, RH WildFly Swarm, Jooby, Ninja, ...
  Using annotations, default config
Microservices
HTTP methods
  GET, PUT, POST, DELETE, PATCH, HEAD, ...
Content negotiation
  Driven by HTTP request headers (e.g. Accept)
  Automatically serve different formats using the same URL
REST
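Serving several formats from one URL comes down to reading the Accept request header and picking a representation. In a JAX-RS setup such as Dropwizard's Jersey this is normally handled by the framework; the sketch below only illustrates the selection logic with the JDK. The class name, the list of supported types and their order are assumptions, and partial wildcards such as `text/*` are deliberately ignored.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of server-side content negotiation on the Accept header.
final class ConnegSketch {

    // Formats this imaginary service can produce, in order of server preference.
    static final List<String> SUPPORTED = List.of(
            "text/turtle", "application/ld+json", "application/rdf+xml",
            "application/n-triples", "text/html");

    /** Returns the supported media type with the highest q-value, or null if none match. */
    static String choose(String acceptHeader) {
        if (acceptHeader == null || acceptHeader.isBlank()) {
            return SUPPORTED.get(0); // no preference stated: use the server default
        }
        String best = null;
        double bestQ = 0.0;
        for (String part : acceptHeader.split(",")) {
            String[] fields = part.trim().split(";");
            String type = fields[0].trim().toLowerCase(Locale.ROOT);
            double q = 1.0; // default weight when no q parameter is given
            for (int i = 1; i < fields.length; i++) {
                String param = fields[i].trim();
                if (param.startsWith("q=")) {
                    try {
                        q = Double.parseDouble(param.substring(2));
                    } catch (NumberFormatException e) {
                        q = 0.0; // malformed weight: treat as not acceptable
                    }
                }
            }
            String match = type.equals("*/*") ? SUPPORTED.get(0)
                    : (SUPPORTED.contains(type) ? type : null);
            if (match != null && q > bestQ) { // on equal weights the first entry wins
                best = match;
                bestQ = q;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(choose("text/html;q=0.8, application/ld+json"));
    }
}
```

A `null` result would typically translate into a 406 Not Acceptable response.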
Initially developed by Yammer: http://www.dropwizard.io
Modular but opinionated
  Jetty server, Jersey JAX-RS, Jackson JSON, Metrics
Very good for REST
  Less suitable for front-end apps
Easy deployment
  1 uberjar (no need for Docker?)
Dropwizard
Notes
Small hack for file type / language negotiation
  For human-friendly HTML view
  Use Jersey UriConnegFilter
Not intended for multiple vhosts, heavy caching
  Proxy / web server in front
Authentication
  Maybe Pac4j (3rd party): http://www.pac4j.org/
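The file type negotiation hack noted above lets a browser user request a specific format by appending a suffix to the URL instead of setting an Accept header. The sketch below shows the idea only and is not the UriConnegFilter implementation; the class name and the suffix table are assumptions. A language suffix could be mapped to Accept-Language in the same way.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of suffix-based negotiation: /concept/1.ttl -> /concept/1 + text/turtle
final class SuffixConnegSketch {

    // Assumed suffix table; a real deployment would configure its own mappings.
    static final Map<String, String> MEDIA_TYPES = Map.of(
            "nt", "application/n-triples",
            "ttl", "text/turtle",
            "rdf", "application/rdf+xml",
            "jsonld", "application/ld+json",
            "html", "text/html");

    /** Returns {path without suffix, media type}, or {path, null} when no known suffix is present. */
    static String[] resolve(String path) {
        int dot = path.lastIndexOf('.');
        int slash = path.lastIndexOf('/');
        if (dot > slash) { // only look at a suffix in the last path segment
            String type = MEDIA_TYPES.get(path.substring(dot + 1).toLowerCase(Locale.ROOT));
            if (type != null) {
                return new String[] { path.substring(0, dot), type };
            }
        }
        return new String[] { path, null };
    }

    public static void main(String[] args) {
        String[] result = resolve("/concept/1.ttl");
        System.out.println(result[0] + " as " + result[1]);
    }
}
```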
Thanks!
Bart Hanssens / Fedict
WTC III, Simon Bolivarlaan 30, 1000 Brussels
[email protected] [at] fedict.be | www.fedict.belgium.be
Fedict 2014. All rights reserved