GRLC MAKES GITHUB TASTE LIKE LINKED DATA APIS
Chefs: Albert Meroño-Peñuela, Rinke Hoekstra
Services and Applications over Linked APIs and Data (SALAD), ESWC, 29-05-2016
Vrije Universiteit Amsterdam
VU University Amsterdam – Computer Science (Knowledge Representation & Reasoning group)
International Institute of Social History (IISG), Amsterdam
CLARIAH – National Infrastructure for Digital Humanities
> DataLegend: Structured Data Hub
Previously incubated by CEDAR – Dutch historical censuses as 5-star LOD
INSTITUTIONAL SLIDE
DISCLAIMER
Frustration-driven research
1. LD-CONSUMING APPLICATIONS
Publishing Dutch historical censuses as 5-star LD
> Intensive use of RDF Data Cube
> Harmonization rules
> Provenance
1st historical census data as Linked Data (1795-1971)
8 million observations (sex, marital status, occupation position, housing type, residence status)
External links
> Geographical: 2.7M
> Occupations: 350K
> Belief: 250K
High value for social historians
THE CEDAR STORY
Historians can’t really write SPARQL
Variety of access interfaces needed
CENSUS DATA QUERYING INTERFACES
CLARIAH-WP4: Structured data hub for social historians
IPUMS, NAPP, CEDAR, etc.
> Macro-, micro-, meso-data
> Civil registries, occupation, religion, country-level economic indicators
> National (Netherlands) and international
Mostly CSV tables turned into RDF Data Cube and CSVW
More than 1B triples already
A higher variety of humanities scholars means a higher variety of data access requirements
SCALING VARIETY
FRUSTRATION 1
This is SPARQL mess!!!1one
One .rq file per SPARQL query
Good support of query curation processes
> Versioning
> Branching
> Clone-pull-push
Web-friendly features!
> One URI per query
> Uniquely identifiable
> De-referenceable (raw.githubusercontent.com)
GITHUB AS A HUB OF SPARQL QUERIES
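To make the "one URI per query" point concrete, here is a minimal sketch of how a de-referenceable URL for a single .rq file could be built. The function name, the default `master` branch, and the example owner, repo, and path are illustrative assumptions, not part of any library.

```python
from urllib.parse import quote

RAW_HOST = "https://raw.githubusercontent.com"


def raw_query_url(owner: str, repo: str, path: str, branch: str = "master") -> str:
    """Build the raw.githubusercontent.com URL of one .rq file.

    Hypothetical helper: owner, repo, and path are placeholders chosen
    by the caller; the branch defaults to "master" as an assumption.
    """
    parts = [owner, repo, branch] + path.split("/")
    # Percent-encode each path segment so spaces and similar characters stay valid.
    return RAW_HOST + "/" + "/".join(quote(p, safe="") for p in parts)


# Example with made-up repository coordinates.
print(raw_query_url("some-owner", "some-repo", "queries/houses.rq"))
```

Because the URL is derived only from the repository coordinates, every committed query gets a stable, shareable address without any extra registration step.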
LESSON 1
Query centralization helps maintain distributed applications
2. THE NEED FOR APIS
Linked Data APIs emerge
> RESTful entry point to Linked Data hubs for Web applications
> OpenPHACTS
…but the Linked Data API (e.g. Swagger spec, code itself) still needs to be coded and maintained
MEANWHILE IN THE SEMANTIC WEB…
Love story – thanks KMi!
Automatically builds Swagger specs and API code
Takes SPARQL queries as input (1 API operation = 1 SPARQL query)
> API call functionality limited to SPARQL expressivity
Makes SPARQL queries uniquely referenceable by using their equivalent LDA operation
> Stores SPARQL internally
> But we already have uniquely referenceable SPARQL…
BASIL
FRUSTRATION 2
Copy-pasting 200 queries!!! & Organization problem
Cousin of BASIL in a SALAD
Same basic principle: 1 SPARQL query = 1 API operation
Automatically builds Swagger spec and UI from SPARQL
But:
External query management
Organization of SPARQL queries in the GitHub repo matches organization of the API
Thin layer – nothing stored server-side
Maps
> GitHub API
> Swagger spec
MAPPING GITHUB AND SWAGGER
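The "1 SPARQL query = 1 API operation" mapping can be sketched as a function that turns a repository file listing into the `paths` object of a Swagger spec. This is an illustrative sketch only, not grlc's actual code: the function name, the summary text, and the response description are assumptions.

```python
def swagger_paths(filenames):
    """Map each .rq file name to one GET operation in a Swagger 'paths' object.

    Hypothetical helper: the real service derives far richer metadata;
    this only shows the one-file-to-one-operation idea.
    """
    paths = {}
    for name in sorted(filenames):
        if not name.endswith(".rq"):
            continue  # only SPARQL query files become API operations
        operation = name[: -len(".rq")]
        paths["/" + operation] = {
            "get": {
                "summary": f"Runs the SPARQL query stored in {name}",
                "responses": {"200": {"description": "Query results"}},
            }
        }
    return paths


# Example with a made-up file listing.
print(list(swagger_paths(["houses.rq", "README.md", "occupations.rq"])))
```

Renaming, adding, or deleting a query file in the repository therefore changes the API surface directly, which is what keeps the API organization in step with the repository.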
SPARQL DECORATOR SYNTAX
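The slide's own example is not reproduced in this transcript. As a rough sketch, assuming decorators are written as SPARQL comment lines with a `#+` prefix followed by `key: value` (an assumption based on grlc's documented convention), a simple reader could look like this. The function name, the example query, and the endpoint URL are made up, and nested values such as lists are deliberately not handled.

```python
def parse_decorators(query_text, prefix="#+"):
    """Collect 'key: value' pairs from comment lines that start with the prefix.

    Simplified, hypothetical reader: single-line values only.
    """
    decorators = {}
    for line in query_text.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue  # ordinary SPARQL line or plain comment
        key, sep, value = line[len(prefix):].partition(":")
        if sep:
            decorators[key.strip()] = value.strip()
    return decorators


# Made-up query file content for illustration.
example = """#+ summary: Houses per municipality
#+ endpoint: http://example.org/sparql
SELECT ?s WHERE { ?s ?p ?o } LIMIT 10
"""
print(parse_decorators(example))
```

Since the decorators are ordinary comments, the file stays a valid SPARQL query that any endpoint can run unchanged.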
THE GRLC SERVICE
Assuming your repo is at https://github.com/:owner/:repo and your grlc instance at :host,
> http://:host/:owner/:repo/spec returns the JSON Swagger spec
> http://:host/:owner/:repo/api-docs returns the Swagger UI
> http://:host/:owner/:repo/:operation?p_1=v_1...p_n=v_n calls operation with specified parameter values
> Uses BASIL’s SPARQL variable name convention for query parameters
Sends requests to
> https://api.github.com/repos/:owner/:repo to look for SPARQL queries and their decorators
> https://raw.githubusercontent.com/:owner/:repo/master/file.rq to dereference queries, get the SPARQL, and parse it
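The routes and the parameter convention above can be sketched together. The snippet assumes a simplified reading of BASIL's convention, in which `?_name` is a required parameter, `?__name` an optional one, and an extra `_suffix` hints at the value type. The function names, the regular expression, and the example host, owner, and repo are all hypothetical.

```python
import re
from urllib.parse import urlencode

# Simplified pattern for BASIL-style parameter variables (an assumption):
# ?_name, ?__name, and ?_name_suffix such as ?_year_integer.
PARAM = re.compile(r"[?$](_{1,2})([A-Za-z0-9]+)(?:_([A-Za-z0-9]+))?")


def query_parameters(sparql):
    """Return the API parameters implied by the variables of a SPARQL query."""
    params = {}
    for underscores, name, kind in PARAM.findall(sparql):
        params[name] = {"required": underscores == "_", "type": kind or "literal"}
    return params


def operation_url(host, owner, repo, operation, **values):
    """Build http://:host/:owner/:repo/:operation?p_1=v_1...p_n=v_n."""
    base = f"http://{host}/{owner}/{repo}/{operation}"
    return base + ("?" + urlencode(values) if values else "")


# Made-up query and repository coordinates for illustration.
query = "SELECT ?s WHERE { ?s ex:year ?_year_integer ; ex:place ?__place . }"
print(query_parameters(query))
print(operation_url("localhost:8088", "some-owner", "some-repo", "houses", year="1899"))
```

Ordinary variables such as `?s` are ignored, so only the variables that follow the naming convention surface as query-string parameters of the operation.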
SPICED-UP SWAGGER UI
EVALUATION – USE CASES
CEDAR: Access to census data for historians
> Hides SPARQL
> Allows them to fill query parameters through forms
> Co-existence of SPARQL and non-SPARQL clients
CLARIAH - Born Under a Bad Sign: Do prenatal and early-life conditions have an impact on socioeconomic and health outcomes later in life? (uses 1891 Canada and Sweden Linked Census Data)
> Reduction of coupling between SPARQL libs and R
> Shorter R code – input stream as CSV
The spectrum of Linked Data clients: SPARQL-intensive applications vs RESTful API applications
grlc decouples SPARQL from all client applications (including LDA), which is a powerful practice
Separates query curation workflows from everything else
Allows at the same time
> Web-friendly SPARQL queries
> Web-friendly RESTful APIs
Helps you to easily organise your LDA – just organise your SPARQL repository and you’re set
Try it out!
> http://grlc.clariah-sdh.eculture.labs.vu.nl
> https://github.com/CLARIAH/grlc
CONCLUSIONS
THANK YOU!
@ALBERTMERONYO
DATALEGEND.NET
CLARIAH.NL