The COMSODE project has received funding from the Seventh Framework Programme of the European Union under grant agreement number 611358.
Open Data Node
Platform and Methodology
Peter Hanečák <[email protected]>, EEA s.r.o.
May, 2015
Who am I
● Peter Hanečák <[email protected]>
● member of COMSODE project
– leader of WP2 (architecture and design of ODN)
– leader of WP4 (implementation of ODN)
● enthusiast in many things “Open”,
active in NGOs and other communities
– member of OpenData.sk and SOIT
– Fedora Linux packager
https://www.facebook.com/hany.sk
https://www.linkedin.com/in/peterhanecak
https://twitter.com/PHanecak
Agenda
● What is COMSODE
● What is COMSODE Methodology
● What is Open Data Node (ODN)
● Integration with ODN
● HW and SW requirements
● Future of ODN
COMSODE
● Components Supporting the Open Data Exploitation
● main target: publication platform for Open Data
– software tool
● supplemental goal: methodology for publication of Open Data
– mainly for those with little or no experience with Open Data
– because software in and of itself is useless for such people and organizations
● validation: pilots
– pilots by 3rd parties
– pilot by COMSODE itself: 150 datasets + 3rd party-like Search app by Spinque
COMSODE Methodology
● publication plan
● preparation of publication
● realization of publication
● archiving
reference:
● http://www.comsode.eu/index.php/deliverables/
● Deliverable D5.1 + ANNEX 1 and 2
Open Data Node
● helps with many of the publication steps outlined in the Methodology
● handles the complexities present in data sources
● makes it easy to publish high-quality (Linked) Open Data from those sources in an automated fashion
most common use-cases: 2* -> 3*+
● input: XLS, SQL DB, ...
● transformations: XLS, SQL -> CSV, „bad CSV“ -> CSV, CSV -> Linked Data
● output:
– tabular/relational data: CSV, REST API
– Linked Data: RDF, SPARQL endpoint
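The transformation chain above („bad CSV“ -> CSV -> Linked Data) can be illustrated with a minimal Python sketch. It is not how ODN implements these steps; it only shows the idea with the standard library. The semicolon delimiter, the `http://example.org/` base URI, the `resource/` and `property/` URI patterns, and both function names are assumptions made for this example.

```python
import csv
import io
from urllib.parse import quote


def clean_csv(raw_text, delimiter=";"):
    """Normalize a 'bad CSV': custom delimiter, stray whitespace, blank lines."""
    reader = csv.reader(io.StringIO(raw_text), delimiter=delimiter)
    rows = [[cell.strip() for cell in row] for row in reader]
    # Drop rows that are empty or contain only empty cells.
    rows = [row for row in rows if any(row)]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


def csv_to_ntriples(csv_text, base_uri, id_column):
    """Emit one N-Triples line per non-empty cell; header names become predicates."""
    lines = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        # Hypothetical URI pattern: <base>/resource/<id> for each row.
        subject = f"<{base_uri}resource/{quote(record[id_column])}>"
        for column, value in record.items():
            if column == id_column or not value:
                continue
            predicate = f"<{base_uri}property/{quote(column)}>"
            # Escape characters that are special inside N-Triples literals.
            literal = (
                value.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r")
            )
            lines.append(f'{subject} {predicate} "{literal}" .')
    return "\n".join(lines)


# Example input: semicolon-delimited, padded cells, a blank line, a missing value.
raw = " id ; name ;city\n\n1; Town Hall ;Bratislava\n2;Library;\n"
cleaned = clean_csv(raw)
triples = csv_to_ntriples(cleaned, "http://example.org/", "id")
print(cleaned)
print(triples)
```

Plain string literals are used for all values here; a real pipeline would typically map columns to existing vocabularies and typed literals.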
[Figure: diagram labelled "not Open Data", "Open Data" and "Open Data Node"]
Open Data Node
ODN can be used by:
● data publishers
● data users
Many publishers are also users, thus
the data ecosystem is quite
complex.
ODN can be used in many roles
within that ecosystem.
Open Data Node
● platform supporting whole
OD publishing process
● modular design
● allows creating a distributed
network of nodes
● can be integrated into
existing infrastructure
Open Data Node
● extraction, transformation and
enrichment of internal data
● storage of resulting Open Data
● publishing of stored Open Data
on the Web
● cataloging functionality
● management functions
Open Data Node
● publication plan
● preparation of publication
● realization of publication
● archiving
Integration with Open Data Node
data publication side: as implied by most common use-cases
● files: CSV, RDF
● API: REST API, SPARQL endpoint
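On the consuming side, a client could talk to a published SPARQL endpoint using the standard SPARQL 1.1 Protocol ("query via GET"). The sketch below only builds the request and does not send it. The endpoint URL is hypothetical, and the actual address and supported result formats depend on the particular ODN deployment.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol 'query via GET' request (not sent here)."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask for the standard JSON serialization of SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


ENDPOINT = "http://odn.example.org/sparql"  # hypothetical endpoint address
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
# To run it against a real endpoint: urllib.request.urlopen(request)
```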
Integration with Open Data Node
data harvesting side: as implied by most common use-cases
● files: XLS, „bad CSV“, ... - almost anything(*)
● API: SQL, SOAP, ... - almost anything(*)
● plus all the „Open Data files and APIs“
(*) given sufficient prominence of the format/technology or particular interest from a „customer“
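As an illustration of harvesting from an SQL source, the following sketch exports the result of a query to CSV. It uses an in-memory SQLite database as a stand-in for the publisher's internal database; the `contracts` table, its columns, and the function name are invented for the example and do not come from ODN.

```python
import csv
import io
import sqlite3


def export_query_to_csv(connection, sql):
    """Run a query and return its result as CSV text with a header row."""
    cursor = connection.execute(sql)
    header = [column[0] for column in cursor.description]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(cursor)  # the cursor yields one tuple per row
    return out.getvalue()


# Stand-in for an internal database (hypothetical table and data).
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE contracts (id INTEGER PRIMARY KEY, supplier TEXT, amount REAL)"
)
connection.executemany(
    "INSERT INTO contracts VALUES (?, ?, ?)",
    [(1, "ACME, Inc.", 1200.5), (2, "Example s.r.o.", 300.0)],
)

csv_text = export_query_to_csv(
    connection, "SELECT id, supplier, amount FROM contracts ORDER BY id"
)
print(csv_text)
```

The `csv` module takes care of quoting, so a value containing a comma (as in the first supplier name) does not produce a „bad CSV“.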
Integration with Open Data Node
special cases:
● ODN/Management: integration of SSO with your existing infrastructure
● ODN/Storage: direct access to SPARQL endpoint
● ODN/InternalCatalog: direct access to management API
● etc.
HW and SW requirements
HW:
● CPU: common x86_64 compatible (dual/quad core is recommended)
● memory: minimum 4 GB (recommended 8 GB) (*)
● storage: minimum 40 GB (*)
SW:
● OS: Debian 7.6 „Wheezy“
● OpenJDK 7
(*) Subject to size of transformed data and requirements on transformation operations.
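The hardware figures above can be turned into a simple pre-installation check. This is only a sketch: the thresholds are taken from this slide, the function name is made up, and the host values are passed in explicitly (on a real machine they could come from `os.cpu_count()` and `shutil.disk_usage()`). Dual core is treated as the recommended minimum, which is one reading of "dual/quad core is recommended".

```python
MIN_MEMORY_GB = 4          # minimum from the slide
RECOMMENDED_MEMORY_GB = 8  # recommendation from the slide
MIN_STORAGE_GB = 40        # minimum from the slide
RECOMMENDED_CORES = 2      # assumption: dual core as the recommended minimum


def check_requirements(cores, memory_gb, storage_gb):
    """Return a list of human-readable findings; an empty list means all good."""
    findings = []
    if cores < RECOMMENDED_CORES:
        findings.append(f"warning: {cores} core(s), dual/quad core recommended")
    if memory_gb < MIN_MEMORY_GB:
        findings.append(f"error: {memory_gb} GB memory, minimum is {MIN_MEMORY_GB} GB")
    elif memory_gb < RECOMMENDED_MEMORY_GB:
        findings.append(
            f"warning: {memory_gb} GB memory, {RECOMMENDED_MEMORY_GB} GB recommended"
        )
    if storage_gb < MIN_STORAGE_GB:
        findings.append(f"error: {storage_gb} GB storage, minimum is {MIN_STORAGE_GB} GB")
    return findings


# Example: a small virtual machine that meets only the minimum memory.
for finding in check_requirements(cores=2, memory_gb=4, storage_gb=30):
    print(finding)
```

As the footnote says, real needs depend on the size of the transformed data, so passing this check does not guarantee sufficient capacity.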
Future of ODN
Key point: Open Source
Future depends on many factors:
● strength of communities
– around ODN itself
– around individual components: UnifiedViews, CKAN, PostgreSQL, etc.
● how well the business goes for commercial partners which use and
maintain ODN (EEA, etc.)
Existing achievements strengthening the future:
● consortium around UnifiedViews: three companies and other organizations
● Slovak government as customer for ODN
● around 10 COMSODE Pilots in various EU countries
(so far, at various stages)
ODN implementation in Slovakia
In the eDemokracia project, ODN is used as:
● centralized component
● de-centralized component
ODN implementation in Slovakia
ODN as part of centralized component:
● heavily customized
– only some modules used, commercial version of triplestore,
clustered RDBMS, etc.
● decomposed to multiple servers
● integrated with other components
– centralized SSO, OCR and content classification services, etc.
● an “upgrade” for existing data portal
data.gov.sk
● incorporated as extension into top-level GOV portal
slovensko.sk
ODN implementation in Slovakia
ODN as de-centralized component:
● ODN with little customization
– central catalog and storage preconfigured
– etc.
● distributed as „live DVD“
● for gov. organizations and
municipalities