The COMSODE project has received funding from the Seventh Framework Programme of the European Union under grant agreement number 611358.
Open Data Node
Platform and Methodology
Peter Hanečák <peter.hanecak@eea.sk>, EEA s.r.o.
May, 2015
Who am I
● Peter Hanečák <peter.hanecak@eea.sk>
● member of COMSODE project
– leader of WP2 (architecture and design of ODN)
– leader of WP4 (implementation of ODN)
● enthusiast in many things “Open”, active in NGOs and other communities
– member of OpenData.sk and SOIT
– Fedora Linux packager
https://www.facebook.com/hany.sk
https://www.linkedin.com/in/peterhanecak
https://twitter.com/PHanecak
Agenda
● What is COMSODE
● What is COMSODE Methodology
● What is Open Data Node (ODN)
● Integration with ODN
● HW and SW requirements
● Future of ODN
COMSODE
● Components Supporting the Open Data Exploitation
● main target: publication platform for Open Data
– software tool
● supplemental goal: methodology for publication of Open Data
– mainly for those with little or no experience with Open Data
– because software by itself is of little use to such people and organizations
● validation: pilots
– pilots by 3rd parties
– pilot by COMSODE itself: 150 datasets + 3rd party-like Search app by Spinque
COMSODE Methodology
● publication plan
● preparation of publication
● realization of publication
● archiving
reference:
● http://www.comsode.eu/index.php/deliverables/
● Deliverable D5.1 + ANNEX 1 and 2
Open Data Node
● helps with many publication steps outlined in the Methodology
● handles the complexities present in data sources
● makes it easy to publish high-quality (Linked) Open Data from those sources in an automated fashion
most common use cases: 2* -> 3*+ (on the 5-star Open Data scale)
● input: XLS, SQL DB, ...
● transformations: XLS, SQL -> CSV, „bad CSV“ -> CSV, CSV -> Linked Data
● output:
– tabular/relational data: CSV, REST API
– Linked Data: RDF, SPARQL endpoint
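To illustrate the CSV -> Linked Data step listed above, here is a minimal sketch. It is not ODN's actual implementation (ODN performs such transformations through its own components); it only shows the idea using the Python standard library and N-Triples output. The base URI `http://example.org/`, the sample columns and the `csv_to_ntriples` helper are assumptions made for illustration.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical base URI for illustration only; a real publisher would use its own namespace.
BASE = "http://example.org/"


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text, id_column):
    """Turn each CSV row into one triple per non-empty, non-ID column."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = "<%sresource/%s>" % (BASE, quote(row[id_column], safe=""))
        for column, value in row.items():
            # Skip the ID column, empty cells and surplus unnamed fields.
            if column is None or column == id_column or not value:
                continue
            predicate = "<%sproperty/%s>" % (BASE, quote(column, safe=""))
            triples.append('%s %s "%s" .' % (subject, predicate, escape_literal(value)))
    return triples


# Hypothetical sample data: the second row has an empty "city" cell.
sample = "id,name,city\n1,Alpha,Bratislava\n2,Beta,\n"
for line in csv_to_ntriples(sample, "id"):
    print(line)
```

Every value is emitted as a plain string literal here; a real transformation would typically map columns to an existing vocabulary and link values to other resources, which is what moves a dataset toward the higher star ratings.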
[Figure: diagram labelled „not Open Data“, „Open Data Node“ and „Open Data“]
Open Data Node
ODN can be used by:
● data publishers
● data users
Many publishers are also users, thus the data ecosystem is quite complex.
ODN can be used in many roles within that ecosystem.
Open Data Node
● platform supporting the whole OD publishing process
● modular design
● allows creating a distributed network of nodes
● can be integrated into existing infrastructure
Open Data Node
● extraction, transformation and enrichment of internal data
● storage of resulting Open Data
● publishing of stored Open Data on the Web
● cataloging functionality
● management functions
Open Data Node
● publication plan
● preparation of publication
● realization of publication
● archiving
Integration with Open Data Node
data publication side: as implied by most common use-cases
● files: CSV, RDF
● API: REST API, SPARQL endpoint
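As a sketch of how a data user might address the SPARQL endpoint mentioned above, the snippet below builds a "query via GET" URL as defined by the SPARQL 1.1 Protocol. The endpoint address is hypothetical (the real one depends on the ODN deployment), and `build_sparql_get_url` is a helper name introduced here for illustration only.

```python
from urllib.parse import urlencode

# Hypothetical endpoint URL; the real address depends on the ODN deployment.
ENDPOINT = "http://odn.example.org/sparql"


def build_sparql_get_url(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET URL; the query goes into the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


# A generic query that lists at most ten arbitrary triples.
query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
url = build_sparql_get_url(ENDPOINT, query)
print(url)
```

The resulting URL can be fetched with any HTTP client; sending an `Accept: application/sparql-results+json` header asks the endpoint for JSON results, assuming it supports that format.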
Integration with Open Data Node
data harvesting side: as implied by most common use-cases
● files: XLS, „bad CSV“, ... - almost anything(*)
● API: SQL, SOAP, ... - almost anything(*)
● plus all the „Open Data files and APIs“
(*) depending on how prominent the format/technology is, or on particular interest from a „customer“
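The SQL -> CSV path on the harvesting side can be sketched as follows. This is an illustration under stated assumptions, not how ODN itself extracts data: an in-memory SQLite database stands in for an internal database, and the `offices` table, its rows and the `table_to_csv` helper are all hypothetical.

```python
import csv
import io
import sqlite3


def table_to_csv(connection, table):
    """Dump one table to CSV text, header row first."""
    # Table names cannot be bound as SQL parameters, so accept plain identifiers only.
    if not table.isidentifier():
        raise ValueError("unexpected table name: %r" % table)
    cursor = connection.execute('SELECT * FROM "%s"' % table)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)
    return out.getvalue()


# In-memory stand-in for an internal database (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE offices (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO offices (id, name, city) VALUES (?, ?, ?)",
                 [(1, "Alpha", "Bratislava"), (2, "Beta, Ltd.", "Kosice")])
print(table_to_csv(conn, "offices"))
```

Using the `csv` module rather than joining strings by hand means values containing commas or quotes are quoted properly, which avoids producing the kind of „bad CSV“ that later needs cleaning.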
Integration with Open Data Node
special cases:
● ODN/Management: integration of SSO with your existing infrastructure
● ODN/Storage: direct access to SPARQL endpoint
● ODN/InternalCatalog: direct access to management API
● etc.
HW and SW requirements
HW:
● CPU: common x86_64 compatible (dual/quad core is recommended)
● memory: minimum 4 GB (recommended 8 GB) (*)
● storage: minimum 40 GB (*)
SW:
● OS: Debian 7.6 „Wheezy“
● OpenJDK 7
(*) Subject to size of transformed data and requirements on transformation operations.
Future of ODN
Key point: Open Source
Future depends on many factors:
● strength of communities
– around ODN itself
– around individual components: UnifiedViews, CKAN, PostgreSQL, etc.
● how well business goes for the commercial partners which use and maintain ODN (EEA, etc.)
Existing achievements strengthening the future:
● consortium around UnifiedViews: three companies and other organizations
● Slovak government as customer for ODN
● around 10 COMSODE Pilots in various EU countries (so far, at various stages)
ODN implementation in Slovakia
In the eDemokracia project, ODN is used as:
● centralized component
● de-centralized component
ODN implementation in Slovakia
ODN as part of centralized component:
● heavily customized
– only some modules used, commercial version of triplestore, clustered RDBMS, etc.
● decomposed to multiple servers
● integrated with other components
– centralized SSO, OCR and content classification services, etc.
● an “upgrade” for the existing data portal data.gov.sk
● incorporated as an extension into the top-level GOV portal slovensko.sk
ODN implementation in Slovakia
ODN as de-centralized component:
● ODN with few customizations
– central catalog and storage preconfigured
– etc.
● distributed as „live DVD“
● for gov. organizations and municipalities