grlc Makes GitHub Taste Like Linked Data APIs


It starts with an idea

GRLC MAKES GITHUB TASTE LIKE LINKED DATA APIS

Chefs: Albert Meroño-Peñuela, Rinke Hoekstra

Services and Applications over Linked APIs and Data (SALAD), ESWC, 29-05-2016

Vrije Universiteit Amsterdam

VU University Amsterdam – Computer Science (Knowledge Representation & Reasoning group)

International Institute of Social History (IISG), Amsterdam

CLARIAH – National Infrastructure for Digital Humanities
> DataLegend: Structured Data Hub

Previously incubated by CEDAR – Dutch historical censuses as 5-star LOD


INSTITUTIONAL SLIDE


DISCLAIMER


Frustration-driven research


1. LD-CONSUMING APPLICATIONS


Publishing Dutch historical censuses as 5-star LD
> Intensive use of RDF Data Cube
> Harmonization rules
> Provenance

1st historical census data as Linked Data (1795-1971)

8 million observations (sex, marital status, occupation position, housing type, residence status)

External links
> Geographical: 2.7M
> Occupations: 350K
> Belief: 250K

High value for social historians

THE CEDAR STORY


Historians can’t really write SPARQL, so a variety of access interfaces is needed


CENSUS DATA QUERYING INTERFACES


CLARIAH-WP4: Structured data hub for social historians

IPUMS, NAPP, CEDAR, etc.
> Macro-, micro-, meso-data
> Civil registries, occupation, religion, country-level economic indicators
> National (Netherlands) and international
Mostly CSV tables turned into RDF Data Cube and CSVW
More than 1B triples already
Higher variety of humanities scholars, hence a higher variety of data access requirements


SCALING VARIETY


FRUSTRATION 1


This is a SPARQL mess!!!1one


One .rq file per SPARQL query
Good support of query curation processes
> Versioning
> Branching
> Clone-pull-push
Web-friendly features!
> One URI per query
> Uniquely identifiable
> Dereferenceable (raw.githubusercontent.com)
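Because each query lives in its own `.rq` file, its raw URL already works as a stable, dereferenceable identifier. A minimal Python sketch of how such a URI is formed, assuming the `master` branch that appears in the raw URL pattern used by grlc; the owner, repository, and file path in the example are hypothetical.

```python
from urllib.parse import quote


def raw_query_url(owner: str, repo: str, path: str, branch: str = "master") -> str:
    """Return the raw.githubusercontent.com URL that dereferences one .rq file.

    owner, repo, and path are placeholders for a GitHub repository of SPARQL
    queries; branch defaults to "master" (an assumption of this sketch).
    """
    if not path.endswith(".rq"):
        raise ValueError(f"expected a .rq query file, got {path!r}")
    # Percent-encode unsafe characters but keep "/" so sub-folders survive.
    safe_path = quote(path.lstrip("/"), safe="/")
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{safe_path}"


# Hypothetical example: one query file, one URI.
# raw_query_url("CLARIAH", "census-queries", "population/by_year.rq")
```

Versioning and branching then come for free: pointing `branch` at another branch or tag yields a different, equally dereferenceable URI for the same query.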


GITHUB AS A HUB OF SPARQL QUERIES


LESSON 1


Query centralization helps maintain distributed applications


2. THE NEED FOR APIS


Linked Data APIs emerge
> RESTful entry point to Linked Data hubs for Web applications
> OpenPHACTS

…but the Linked Data API (e.g. Swagger spec, code itself) still needs to be coded and maintained


MEANWHILE IN THE SEMANTIC WEB…


Love story – thanks KMi!
Automatically builds Swagger specs and API code
Takes SPARQL queries as input (1 API operation = 1 SPARQL query)
> API call functionality limited to SPARQL expressivity
Makes SPARQL queries uniquely referenceable by using their equivalent LDA operation
> Stores SPARQL internally
> But we already have uniquely referenceable SPARQL…


BASIL


FRUSTRATION 2


Copy-pasting 200 queries!!! & Organization problem


Cousin of BASIL in a SALAD
Same basic principle: 1 SPARQL query = 1 API operation
Automatically builds Swagger spec and UI from SPARQL
But:
> External query management
> Organization of SPARQL queries in the GitHub repo matches organization of the API
> Thin layer – nothing stored server-side
Maps the GitHub API onto a Swagger spec
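The "repository layout mirrors API layout" principle can be sketched as a function that turns a list of repository file paths into the `paths` object of a Swagger 2.0 spec, one GET operation per `.rq` file. This is a simplified illustration, not grlc's own code: the rule that an operation is named after the file path minus its `.rq` extension, the summary text, and the example file names are all assumptions.

```python
from typing import Dict, Iterable
from pathlib import PurePosixPath


def swagger_paths(repo_files: Iterable[str]) -> Dict[str, dict]:
    """Build a Swagger 2.0 'paths' object with one GET operation per .rq file.

    Assumption for this sketch: the operation path is the file path without
    its .rq suffix, so folders in the repository become path segments.
    Non-query files (README.md, LICENSE, ...) are ignored.
    """
    paths: Dict[str, dict] = {}
    for name in sorted(repo_files):
        p = PurePosixPath(name)
        if p.suffix != ".rq":
            continue
        operation = "/" + str(p.with_suffix(""))
        paths[operation] = {
            "get": {
                "summary": f"Runs the SPARQL query in {name}",
                # Swagger 2.0 requires at least one response per operation.
                "responses": {"200": {"description": "SPARQL query results"}},
            }
        }
    return paths


# Hypothetical repository listing:
# swagger_paths(["README.md", "population.rq", "occupations/by_year.rq"])
# yields the operations "/occupations/by_year" and "/population".
```

Since the function keeps no state, it also illustrates the "thin layer" point: reorganizing the `.rq` files is the only way to reorganize the API.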


MAPPING GITHUB AND SWAGGER


SPARQL DECORATOR SYNTAX


THE GRLC SERVICE

Assuming your repo is at https://github.com/:owner/:repo and your grlc instance at :host,
> http://:host/:owner/:repo/spec returns the JSON Swagger spec
> http://:host/:owner/:repo/api-docs returns the Swagger UI
> http://:host/:owner/:repo/:operation?p_1=v_1&...&p_n=v_n calls the operation with the specified parameter values
> Uses BASIL’s SPARQL variable name convention for query parameters
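The URL patterns above can be generated mechanically. The sketch below uses hypothetical host, owner, repository, operation, and parameter names; it assumes a parameter such as `year` corresponds to a BASIL-style variable `?_year` in the query, and relies on standard-library URL encoding so pairs are joined with `&` and values are percent-encoded.

```python
from typing import Dict, Mapping
from urllib.parse import urlencode


def grlc_urls(host: str, owner: str, repo: str, operation: str,
              params: Mapping[str, str]) -> Dict[str, str]:
    """Return the spec, UI, and operation-call URLs for a grlc instance at host.

    The three patterns follow the slide: /spec, /api-docs, /:operation?p=v...
    """
    base = f"http://{host}/{owner}/{repo}"
    call = f"{base}/{operation}"
    if params:
        # urlencode joins pairs with "&" and percent-encodes the values.
        call += "?" + urlencode(params)
    return {
        "spec": f"{base}/spec",          # JSON Swagger spec
        "api_docs": f"{base}/api-docs",  # Swagger UI
        "call": call,                    # runs the SPARQL query behind :operation
    }


# Hypothetical example:
# grlc_urls("grlc.example.org", "CLARIAH", "census-queries", "population",
#           {"year": "1899"})["call"]
```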

Sends requests to
> https://api.github.com/repos/:owner/:repo to look for SPARQL queries and their decorators
> https://raw.githubusercontent.com/:owner/:repo/master/file.rq to dereference queries, get the SPARQL, and parse it
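Once the raw `.rq` text has been fetched, parsing it means separating decorator comments from the SPARQL and spotting parameter variables. A minimal regex-based sketch, assuming decorators are single comment lines of the form `#+ key: value` and BASIL-style variables where `?_name` marks a mandatory parameter and `?__name` an optional one. Multi-line decorator values and typed parameters are not attempted here, and the example query and endpoint are hypothetical.

```python
import re
from typing import Dict, Tuple

# "#+ key: value" decorator lines (single-line values only, in this sketch).
DECORATOR = re.compile(r"^#\+\s*(\w+)\s*:\s*(.*)$")
# BASIL-style parameters: ?_name (mandatory) or ?__name (optional).
PARAMETER = re.compile(r"\?(_{1,2})(\w+)")


def parse_query(text: str) -> Tuple[Dict[str, str], Dict[str, bool]]:
    """Split a .rq file into (decorators, parameters).

    parameters maps each name to True if mandatory, False if optional.
    """
    decorators: Dict[str, str] = {}
    body_lines = []
    for line in text.splitlines():
        match = DECORATOR.match(line.strip())
        if match:
            decorators[match.group(1)] = match.group(2).strip()
        elif not line.lstrip().startswith("#"):
            body_lines.append(line)  # keep only SPARQL, not comments
    parameters: Dict[str, bool] = {}
    for underscores, name in PARAMETER.findall("\n".join(body_lines)):
        parameters.setdefault(name, underscores == "_")
    return decorators, parameters
```

For a query that declares `#+ summary: ...` and uses `?_year` and `?__sex`, this returns the summary as a decorator and `{"year": True, "sex": False}` as parameters, enough to fill in one Swagger operation.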


SPICED-UP SWAGGER UI


EVALUATION – USE CASES

CEDAR: Access to census data for historians
> Hides SPARQL
> Allows them to fill query parameters through forms
> Co-existence of SPARQL and non-SPARQL clients
CLARIAH – Born Under a Bad Sign: Do prenatal and early-life conditions have an impact on socioeconomic and health outcomes later in life? (uses 1891 Canada and Sweden Linked Census Data)
> Reduction of coupling between SPARQL libs and R
> Shorter R code – input stream as CSV
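The R gain comes from reading API results as a plain CSV stream instead of going through a SPARQL client library. The same idea in Python is sketched below: build a request for an operation URL asking for CSV, then read the rows with the `csv` module. That the service returns CSV when sent an `Accept: text/csv` header is an assumption of this sketch, and the URL is hypothetical; the network call itself is left to the caller.

```python
import csv
import io
from typing import Dict, List
from urllib.request import Request


def csv_request(call_url: str) -> Request:
    """Prepare a GET request asking a grlc operation for CSV results.

    Assumption: the service honours an Accept: text/csv header.
    """
    return Request(call_url, headers={"Accept": "text/csv"})


def rows_from_csv(payload: str) -> List[Dict[str, str]]:
    """Turn a CSV response body into one dict per result row."""
    return list(csv.DictReader(io.StringIO(payload)))


# Usage, with network access:
#   from urllib.request import urlopen
#   with urlopen(csv_request(url)) as resp:
#       rows = rows_from_csv(resp.read().decode("utf-8"))
```

No SPARQL library appears on the client side; the query stays in the repository and the client only sees tabular data.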


The spectrum of Linked Data clients: SPARQL-intensive applications vs. RESTful API applications

grlc relies on decoupling SPARQL from all client applications (including LDAs), a powerful practice

Separates query curation workflows from everything else
Allows, at the same time:
> Web-friendly SPARQL queries
> Web-friendly RESTful APIs

Helps you to easily organise your LDA – just organise your SPARQL repository and you’re set

Try it out!
> http://grlc.clariah-sdh.eculture.labs.vu.nl
> https://github.com/CLARIAH/grlc

CONCLUSIONS


THANK YOU!

@ALBERTMERONYO

DATALEGEND.NET
CLARIAH.NL
