The COMSODE project has received funding from the Seventh Framework Programme of the European Union under grant agreement number 611358.

Open Data Node

Platform and Methodology

Peter Hanečák <[email protected]>, EEA s.r.o.

May, 2015

Who am I

● Peter Hanečák <[email protected]>

● member of COMSODE project

– leader of WP2 (architecture and design of ODN)

– leader of WP4 (implementation of ODN)

● enthusiast in many things “Open”, active in NGOs and other communities

– member of OpenData.sk and SOIT

– Fedora Linux packager

https://www.facebook.com/hany.sk

https://www.linkedin.com/in/peterhanecak

https://twitter.com/PHanecak

Agenda

● What is COMSODE

● What is COMSODE Methodology

● What is Open Data Node (ODN)

● Integration with ODN

● HW and SW requirements

● Future of ODN

COMSODE

● Components Supporting the Open Data Exploitation

● main target: publication platform for Open Data

– software tool

● supplemental goal: methodology for publication of Open Data

– mainly for those with little or no experience with Open Data

– because software by itself is of little use to such people and organizations

● validation: pilots

– pilots by 3rd parties

– pilot by COMSODE itself: 150 datasets + 3rd party-like Search app by Spinque

COMSODE Methodology

● publication plan

● preparation of publication

● realization of publication

● archiving

reference:

● http://www.comsode.eu/index.php/deliverables/

● Deliverable D5.1 + ANNEX 1 and 2

Open Data Node

● helps with many of the publication steps outlined in the Methodology

● handles the complexities present in data sources

● makes it easy to publish high-quality (Linked) Open Data from those sources in an automated fashion

most common use cases: 2* -> 3*+ (moving data up the 5-star Open Data scale)

● input: XLS, SQL DB, ...

● transformations: XLS, SQL -> CSV, „bad CSV“ -> CSV, CSV -> Linked Data

● output:

– tabular/relational data: CSV, REST API

– Linked Data: RDF, SPARQL endpoint
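
The "CSV -> Linked Data" step listed above can be illustrated with a minimal sketch. It is not how ODN implements the transformation; it only shows the idea of turning each table row into RDF triples (N-Triples syntax). The `http://example.org/` namespace, the function names and the sample values are assumptions made up for this illustration.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace, chosen only for this illustration.
BASE = "http://example.org/"


def escape_literal(value: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text: str, id_column: str) -> list[str]:
    """Emit one triple per non-empty, non-id cell of every CSV row."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The id column becomes the subject IRI of the row.
        subject = f"<{BASE}resource/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            if column is None or column == id_column or not value:
                continue  # skip surplus cells, the id itself and empty cells
            predicate = f"<{BASE}property/{quote(column, safe='')}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Made-up sample data; the second row has an empty "population" cell.
SAMPLE = "id,name,population\n1,Town A,1000\n2,Town B,\n"
for line in csv_to_ntriples(SAMPLE, "id"):
    print(line)
```

Every value is emitted as a plain string literal here; a real pipeline would also map columns to an agreed vocabulary and add datatypes.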

Open Data Node

ODN can be used by:

● data publishers

● data users

Many publishers are also users, thus the data ecosystem is quite complex.

ODN can be used in many roles within that ecosystem.

Open Data Node

● platform supporting the whole OD publishing process

● modular design

● allows creating a distributed network of nodes

● can be integrated into existing infrastructure

Open Data Node

● extraction, transformation and enrichment of internal data

● storage of resulting Open Data

● publishing of stored Open Data on the Web

● cataloging functionality

● management functions

Open Data Node

● publication plan

● preparation of publication

● realization of publication

● archiving

Integration with Open Data Node

● data harvesting side

● data publication side

● special cases

Integration with Open Data Node

data publication side: as implied by most common use-cases

● files: CSV, RDF

● API: REST API, SPARQL endpoint
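
For a data user, the SPARQL endpoint mentioned above is reached over plain HTTP. The sketch below builds a "query via GET" request URL as defined by the SPARQL 1.1 Protocol. The endpoint address is a placeholder assumed for this illustration; a real ODN deployment defines its own address.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint address, used only for this illustration.
ENDPOINT = "http://odn.example.org/sparql"


def build_sparql_url(endpoint: str, query: str) -> str:
    """Build a SPARQL 1.1 Protocol 'query via GET' URL."""
    # The protocol passes the query text in the 'query' URL parameter.
    return f"{endpoint}?{urlencode({'query': query})}"


QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
url = build_sparql_url(ENDPOINT, QUERY)
print(url)

# Against a live endpoint the results could then be fetched with, e.g.:
#   import json, urllib.request
#   req = urllib.request.Request(
#       url, headers={"Accept": "application/sparql-results+json"})
#   with urllib.request.urlopen(req) as resp:
#       data = json.load(resp)
```

The network call is left as a comment because the address above does not point to a real server.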

Integration with Open Data Node

data harvesting side: as implied by most common use-cases

● files: XLS, „bad CSV“, ... - almost anything(*)

● API: SQL, SOAP, ... - almost anything(*)

● plus all the „Open Data files and APIs“

(*) depending on the prominence of a format/technology or on the particular interest of a „customer“
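
What counts as „bad CSV“ is not defined in the slides. As a minimal sketch, assume it means a semicolon-delimited export with a byte-order mark, stray whitespace and empty rows. The function name and the sample input are made up for this illustration and do not come from ODN.

```python
import csv
import io


def clean_csv(raw: str, delimiter: str = ";") -> str:
    """Normalize messy delimited text into comma-separated CSV."""
    raw = raw.lstrip("\ufeff")  # drop a leading byte-order mark, if any
    reader = csv.reader(io.StringIO(raw), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        cells = [cell.strip() for cell in row]
        if any(cells):  # skip rows that are empty after trimming
            writer.writerow(cells)
    return out.getvalue()


# Made-up messy input: BOM, padded cells, a blank line and an empty row.
MESSY = "\ufeffid ; name\n\n1 ; Town A \n ; \n2;\"Town B, East\"\n"
print(clean_csv(MESSY), end="")
```

Reading and writing through the `csv` module, rather than splitting on the delimiter by hand, keeps quoted cells that contain commas intact.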

Integration with Open Data Node

special cases:

● ODN/Management: integration of SSO with your existing infrastructure

● ODN/Storage: direct access to SPARQL endpoint

● ODN/InternalCatalog: direct access to management API

● etc.

HW and SW requirements

HW:

● CPU: common x86_64 compatible (dual/quad core is recommended)

● memory: minimum 4 GB (recommended 8 GB) (*)

● storage: minimum 40 GB (*)

SW:

● OS: Debian 7.6 „Wheezy“

● OpenJDK 7

(*) Depends on the size of the transformed data and the requirements of the transformation operations.

Future of ODN

Key point: Open Source

Future depends on many factors:

● strength of communities

– around ODN itself

– around individual components: UnifiedViews, CKAN, PostgreSQL, etc.

● how well business goes for the commercial partners that use and maintain ODN (EEA, etc.)

Existing achievements strengthening the future:

● consortium around UnifiedViews: three companies and other organizations

● Slovak government as customer for ODN

● around 10 COMSODE pilots in various EU countries (so far, at various stages)

ODN implementation in Slovakia

In the eDemokracia project, ODN is used as:

● centralized component

● de-centralized component

ODN implementation in Slovakia

ODN as part of centralized component:

● heavily customized

– only some modules used, commercial version of triplestore, clustered RDBMS, etc.

● decomposed to multiple servers

● integrated with other components

– centralized SSO, OCR and content classification services, etc.

● an “upgrade” for the existing data portal data.gov.sk

● incorporated as an extension into the top-level GOV portal slovensko.sk

ODN implementation in Slovakia

ODN as de-centralized component:

● ODN with few customizations

– central catalog and storage preconfigured

– etc.

● distributed as „live DVD“

● for gov. organizations and municipalities
