Homer: a case study of federation among open data portals
Nives Alciato - CSI Piemonte
• Regional law on Open Data
• Guidelines for reuse
• Adoption of a standard licence model
• Creation of a working group
• Diffusion to other Public Administrations
• Reuse at national level
• European Projects
• Metadata catalogues
• Data uploading platform
• A portal as an access point for data and information
The initiative of Piedmont Region
Regional Law n. 24 dated 23/12/2011
• First regional law in Italy on Open Data
Basic principle:
• Data belong to people
Cornerstones of reusability of data:
• Diffusion without restriction and in open and standard digital formats
• Use of standard legal tools: Creative Commons licences
• Re-use and re-distribution of data is free of charge
Legal framework
Organizational framework
Regional level
• An initiative with ANCI Piemonte (association of municipalities): dati.piemonte.it is the infrastructure for the whole regional territory (120 Municipalities and other bodies like ARPA Piemonte and Unioncamere)
National level
• Re-use of the platform and joint project with Emilia-Romagna Region and Milano Municipality
European level
• HOMER project, to transfer methodological and technical standards and increase circulation and re-use of public data
• OPENDAI project, to improve a new architectural model that increases digital services and business opportunities
Technological framework: a permanent beta
[Diagram: Portal and Search on top of DATA taken from the operational databases of PAs. New platform: from Open Data to Data Services and a Federated Search Engine]
• Harmonize policies and licenses for the re-use of data
• Federation of Open Data Portals
• Open data silos of PAs
• Cloud architecture
• Open data Services
HOMER is the acronym of Harmonising Open data in the MEditerranean through better access and Reuse of public sector information
www.homerproject.eu
It is a project within the MED Programme financed by the EU Commission
Implementation Starting date 01/04/2012
Implementation End date 31/03/2015
Who are HOMER's partners?
13 partners as territorial governments and 6 partners as technical support

Spain
• SARGA - Sociedad Aragonesa de Gestión Agroambiental: Territorial Gov.
• AGAPA - Agencia de Gestión Agraria y Pesquera de Andalucía: Territorial Gov.
• FUNDITEC – Foundation for Development, Innovation and Technology: Technical Support
France
• Région Provence-Alpes-Côte d'Azur: Territorial Gov.
• Région Corse: Territorial Gov.
• AVITEM – Agency for sustainable Mediterranean cities and territories: Technical Support
• FING – Fondation Internet Nouvelle Génération: Technical Support
Italy
• Piedmont Region: Project Leader
• Sardinia, Emilia-Romagna and Veneto Regions: Territorial Gov.
• CSI Piemonte: Technical Support
Slovenia
• Geodetic Institute: Territorial Gov.
Montenegro
• Mediterranean University of Montenegro: Territorial Gov.
Greece
• GFOSS – The Greek Free Open Source Software Society: Technical Support
Crete
• Decentralized Administration of Crete: Territorial Gov.
• University of Crete: Technical Support
Cyprus
• Sewerage Board of Limassol – Amathus: Territorial Gov.
Malta
• Local Council Ass. of Malta Gozo: Territorial Gov.
HOMER’s objectives
• a federation of Open Data portals among partners, sharing common datasets related to MED strategic domains (agriculture, culture, energy, environment, tourism)
• ensuring long-term sustainability and exploiting a large number of harmonized and federated datasets, enhancing the e-participation and digital market opportunities of MED citizens
CSI Piemonte’s responsibilities in HOMER
• it is the developer of a Federation of Open Data Portals among partners, providing ICT and legal support
• it is the promoter of the reuse of the technological solutions underlying the portal, developed in the context of the project
What do we mean by a federation of open data portals?
“Federation” means the virtual system built on software able to collect and retrieve the metadata of published data in the 5 categories (agriculture, culture, energy, environment, tourism), as exposed and searched by the partners' Open Data portals
Look at this symbol: it represents the metadata catalogue
• Memorandum of Understanding
• Definition of a common metadata structure for federation
• Use of EuroVoc
• The cross lingual search
• The federated search multi-language engine
• The indexing scenario
• The searching scenario
Design, methodology, and approach
Legal framework - Memorandum of Understanding
Partners joined the federation by signing a Memorandum of Understanding that defines the technological, organizational and legal boundaries as a common understanding for everybody, with reference to Directive 2013/37/EU
The Memorandum states that all technological components of the Federation solution (Index, Semantic Search Engine, Translator) are provided and managed under the conditions and the coordination of CSI Piemonte, which releases them on the basis of an open source philosophy
Data framework – the metadata structure
Each Open Data Portal shares a set of common metadata fields: this structure builds the Federated Index
title, description, url, metadata source, package_id, topics, language, tags, geographic bounding box, refresh date, creation date, spatial scale resolution, license id, owner
Intersecting the protocols and directives in the schema (INSPIRE, DCAT, CKAN, Dublin Core) identified the minimum common set of fields needed to define a metadata structure for federation, independent of whether a dataset is geographical or alphanumerical
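As an illustration only, the common structure could be modelled as a small record type. This is a minimal sketch, not the project's actual schema: the snake_case field names, the grouping of the listed words into fields, the field types and the choice of which fields are mandatory are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FederatedRecord:
    """One metadata card in the Federated Index (names and types are assumed)."""
    title: str
    description: str
    url: str
    metadata_source: str          # portal the card was harvested from
    package_id: str
    language: str                 # ISO 639-1 code, e.g. "it"
    topics: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    geographic_bounding_box: Optional[Tuple[float, float, float, float]] = None
    refresh_date: Optional[str] = None
    creation_date: Optional[str] = None
    spatial_scale_resolution: Optional[str] = None
    license_id: Optional[str] = None
    owner: Optional[str] = None


# Hypothetical rule: these fields must be non-empty before a card is indexed.
REQUIRED = ("title", "url", "package_id", "language")


def missing_fields(record: FederatedRecord) -> List[str]:
    """Return the names of required fields that are empty on this record."""
    return [name for name in REQUIRED if not getattr(record, name)]
```

A loader could call `missing_fields` on every harvested card and skip or report the incomplete ones, whatever the source protocol was.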
Data framework – the use of EuroVoc (1)
Homer now speaks 7 languages (Spanish, French, Italian, Slovenian, Serbian-Montenegrin, Greek and English) with 4 different alphabets, so we must share a dictionary to communicate
The language field holds an ISO 639-1 code that identifies the language
Data framework – the use of EuroVoc (2)
EuroVoc is a multilingual, multidisciplinary thesaurus of the EU that conforms to W3C recommendations. In EuroVoc, a given concept from the 5 categories involved has the same classification and meaning across domains and languages
Homer’s categories = EuroVoc domains
WATER νερό VODA вода AGUA EAU ACQUA
Each ODP inserts tags in the metadata cards in its own language, without the burden of translation. The same concept is identified in all languages
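The idea can be sketched with a toy thesaurus. This is only an illustration of the principle, not EuroVoc itself: the concept identifier is hypothetical, and the assignment of each label above to a language code is an assumption.

```python
# Toy thesaurus: one hypothetical concept id mapped to its label per language.
# The ISO 639-1 keys assigned to each label are assumptions for this sketch.
THESAURUS = {
    "concept:water": {
        "en": "water",
        "el": "νερό",
        "sl": "voda",
        "sr": "вода",
        "es": "agua",
        "fr": "eau",
        "it": "acqua",
    },
}


def concept_for(tag, lang):
    """Return the concept id whose label in `lang` matches `tag`, else None."""
    wanted = tag.casefold()
    for concept_id, labels in THESAURUS.items():
        if labels.get(lang, "").casefold() == wanted:
            return concept_id
    return None


def translations(tag, lang):
    """Return every known label for the concept behind `tag` (may be empty)."""
    concept_id = concept_for(tag, lang)
    return dict(THESAURUS[concept_id]) if concept_id else {}
```

With this mapping, a portal that tags a dataset "acqua" and another that tags one "eau" end up pointing at the same concept, which is what lets each portal keep working in its own language.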
The multi-language semantic search engine needs a specific common structure to index and retrieve the metadata from all the metadata catalogues of the Homer partners' Open Data Portals. The search engine is like a librarian who finds books only if the request form is filled out in a specific way
Data framework – The cross lingual search
The technological solution for indexing and searching among all the federated open data portals has 4 components:
1. Fed-Index Homer: the federated index file component, containing the complete list of metadata
2. Fed-Translator: the component that translates every tag of the datasets via EuroVoc
3. Fed-Searcher: the centralized semantic search engine component
4. Fed-Loader API: the loader that calls the API or web services exposed by each Open Data Portal to create the federated Index
Based on the open source project Apache Solr
Released as open source on SourceForge
Technological framework: the federated search multi language engine
Technological framework: the indexing scenario (1)
The indexing process requires each federated portal to expose the metadata cards of its data through 2 types of URL:
1. url1, which returns the list of data ids: Package List
2. url2, which returns the attributes of a single dataset: Package Dataset
It is a stand-alone scheduled process, which could run nightly
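The two-URL pattern can be sketched as a simple harvesting loop. This is an assumption-laden outline, not the Fed-Loader code: the function names are hypothetical, and each portal is represented by two callables (one per URL type) so that the sketch runs without network access.

```python
def index_portal(name, list_ids, get_dataset):
    """Harvest one portal: call url1 once, then url2 once per data id.

    `list_ids` stands in for the Package List call and `get_dataset`
    for the Package Dataset call; both are hypothetical callables.
    """
    records = []
    for package_id in list_ids():
        try:
            attributes = get_dataset(package_id)
        except Exception as exc:  # a broad catch keeps one bad card from stopping the run
            print(f"skipping {name}/{package_id}: {exc!r}")
            continue
        record = dict(attributes)
        record["package_id"] = package_id
        record["metadata_source"] = name
        records.append(record)
    return records


def build_index(portals):
    """Build the federated index from a {name: (list_ids, get_dataset)} mapping."""
    index = []
    for name, (list_ids, get_dataset) in portals.items():
        index.extend(index_portal(name, list_ids, get_dataset))
    return index
```

A scheduler (for example a nightly job) would call `build_index` with one entry per federated portal and hand the result to the search engine for indexing.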
Technological framework: the indexing scenario (2)
[Diagram: a scheduled process takes metadata tagged in each portal's own language (Eau, Voda, Agua, Water) from the Open Data Portals into the Search Engine]
Technological framework: the indexing scenario (3)
3 ways supported to expose the metadata:
• API CKAN compliant:
Package List > url1 that returns a JSON file1 with the list of data ids
Package Dataset > url2 that returns a JSON file2 with the attributes of a single dataset
• Web services dati.piemonte.it compliant:
Package List > url1 that returns an XML file1 with the list of data ids
http://www.dati.piemonte.it/index.php?option=com_rd&view=pceli_list2&format=xml&layout=xml
Package Dataset > url2 that returns an XML file2 with the attributes of a single dataset
http://www.dati.piemonte.it/index.php?option=com_rd&view=pceli_item2&format=xml&layout=xml&itemid=1083
• API Catalogue Service for the Web (CSW) compliant:
Package List > url1 that returns a CSW file1 with the list of data ids
Package Dataset > url2 that returns a CSW file2 with the attributes of a single dataset
Technological framework: the indexing scenario (4) 3 ways supported to expose the metadata:
API CKAN compliant
Web services dati.piemonte.it compliant
API CSW compliant
Technological framework: the searching scenario
Actors: User, Open Data Portal (ODP), Search Engine (SE)
1. The user searches in the language of the portal
2. The ODP calls the SE, adding the language
3. EuroVoc is used to search the index in all languages
4. The SE returns a list of results
5. The user chooses a dataset and goes to the corresponding portal
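The searching scenario can be sketched end to end with an in-memory index. Everything here is illustrative: the label table stands in for EuroVoc, the index entries and their example.org URLs are invented, and the function names are hypothetical.

```python
# Stand-in for EuroVoc: one hypothetical concept with labels in four languages.
LABELS = {
    "concept:water": {"en": "water", "fr": "eau", "it": "acqua", "es": "agua"},
}

# Stand-in for the federated index; every entry is made up for this sketch.
INDEX = [
    {"title": "Dataset A", "language": "it", "tags": ["acqua"],
     "url": "http://portal-it.example.org/a"},
    {"title": "Dataset B", "language": "es", "tags": ["Agua"],
     "url": "http://portal-es.example.org/b"},
    {"title": "Dataset C", "language": "en", "tags": ["energy"],
     "url": "http://portal-en.example.org/c"},
]


def expand(term, lang):
    """Step 3: return the term in every language known for its concept."""
    wanted = term.casefold()
    for labels in LABELS.values():
        if labels.get(lang, "").casefold() == wanted:
            return {label.casefold() for label in labels.values()}
    return {wanted}  # unknown concept: fall back to the literal term


def search(term, lang):
    """Steps 2 to 4: the portal passes term and language, results come back."""
    terms = expand(term, lang)
    return [
        record for record in INDEX
        if terms & {tag.casefold() for tag in record["tags"]}
    ]
```

A user on a French portal who searches for "eau" would get Dataset A and Dataset B back, each carrying the URL of the portal that publishes it (step 5).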
Results and ongoing activities
The Federation in terms of:
• shared knowledge, experiences and relationships among the Partners
• opening hundreds of public datasets, enhancing digital heritage transparency and promoting open data culture across the Mediterranean
• looking for new stakeholders, as it is possible to configure new categories and new languages
Nives Alciato – CSI Piemonte nives.alciato@csi.it
www.dati.piemonte.it www.homerproject.eu
Thank you!
Step 4 - technical requirements: API Web Services like ‘www.dati.piemonte.it’
An open data portal like dati.piemonte.it exposes 2 URLs:
1. url1, which returns an XML file1 with the list of data ids: Package List
http://www.dati.piemonte.it/index.php?option=com_rd&view=pceli_list2&format=xml&layout=xml
<urlOggetti totale="434" baseUrl="http://www.dati.piemonte.it/index.php?option=com_rd&view=pceli_item2&format=xml&layout=xml&itemid=" data="">
  <urlOggetto>1083</urlOggetto>
2. url2, which returns an XML file2 with the attributes of a single dataset: Package Dataset
http://www.dati.piemonte.it/index.php?option=com_rd&view=pceli_item2&format=xml&layout=xml&itemid=1083
<package>
  <package_id>1083</package_id>
  <url>http://www.dati.piem..</url>
  <title>DWUMA DW Utenti ..</title>
  <description> Base dati decisionale ... </description>
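As a sketch of how a loader might turn the Package List into Package Dataset URLs, the snippet below parses a small sample with the standard library. The sample is an assumption built from the excerpt above: it is closed and escaped (`&amp;`) to be well-formed XML, `totale` is reduced to 2, and the second id (1084) is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the Package List response.
SAMPLE_PACKAGE_LIST = (
    '<urlOggetti totale="2" '
    'baseUrl="http://www.dati.piemonte.it/index.php?option=com_rd'
    '&amp;view=pceli_item2&amp;format=xml&amp;layout=xml&amp;itemid=" data="">'
    "<urlOggetto>1083</urlOggetto>"
    "<urlOggetto>1084</urlOggetto>"
    "</urlOggetti>"
)


def dataset_urls(package_list_xml):
    """Combine the baseUrl attribute with each id to get one url2 per dataset."""
    root = ET.fromstring(package_list_xml)
    base_url = root.get("baseUrl", "")
    return [
        base_url + element.text.strip()
        for element in root.iter("urlOggetto")
        if element.text
    ]
```

Each returned URL can then be fetched to read the `<package>` attributes of one dataset.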
Step 4 - technical requirements: API set interface like CKAN
A CKAN-compliant API exposes 2 URLs:
1. url1, which returns a JSON file1 with the list of data ids: Package List
http://data.gov.uk/api/rest/package
[ "human-resources-datasets", "veterinary-residues-data", ... ]
2. url2, which returns a JSON file2 with the attributes of a single dataset: Package Dataset
http://data.gov.uk/api/rest/package/human-resources-datasets
{ license_title: "", maintainer: null, maintainer_email: null, id: "00029d8d-1be7-4435-9ef8", metadata_created: "2013-08-30", relationships: [ ], ...
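A minimal sketch of the same two steps for a CKAN-style portal, using only the standard library. The helper names and the list of fields kept are assumptions, and the sample strings are hand-written valid JSON modelled on the excerpts above rather than real responses.

```python
import json
from urllib.parse import quote

# Package List URL taken from the example above.
PACKAGE_LIST_URL = "http://data.gov.uk/api/rest/package"

# Hand-written samples modelled on the excerpts above (keys quoted as JSON requires).
SAMPLE_LIST = '["human-resources-datasets", "veterinary-residues-data"]'
SAMPLE_DATASET = (
    '{"license_title": "", "maintainer": null, '
    '"id": "00029d8d-1be7-4435-9ef8", "metadata_created": "2013-08-30", '
    '"relationships": []}'
)


def dataset_urls(package_list_json, base=PACKAGE_LIST_URL):
    """Build one Package Dataset URL per name in the Package List."""
    names = json.loads(package_list_json)
    return [f"{base}/{quote(name)}" for name in names]


def pick_fields(dataset_json, wanted=("id", "license_title", "metadata_created")):
    """Keep only the attributes of interest; absent keys come back as None."""
    data = json.loads(dataset_json)
    return {key: data.get(key) for key in wanted}
```

The selected attributes would still need mapping onto the common metadata fields before they enter the federated index.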
Step 4 - technical requirements: Catalogue Services for the Web (CSW)
A geoportal exposes metadata through 2 operations of the CSW protocol:
1. url1, which returns a CSW file1 with the list of data ids: Package List
http://webgis.arpa.piemonte.it/geoportalserver_arpa/csw?REQUEST=GetRecords
<csw:GetRecordsResponse>
  <csw:SearchStatus timestamp="201
  <csw:SearchResults ...
    <gmd:MD_Metadata>
      <gmd:fileIdentifier>
        <gco:CharacterString> ARLPA_TO_16.08.01-D_2011-11-03-9:58
2. url2, which returns a CSW file2 with the attributes of a single dataset: Package Dataset
http://webgis.arpa.piemonte.it/geoportalserver_arpa/csw?request=GetRecordById&service=CSW&version=2.0.2&id=ARLPA_TO_16.08.01-D_2011-11-03-9:58
<csw:GetRecordByIdResponse>
  <gmd:MD_Metadata xsi:schemaLocat
    <gmd:fileIdentifier>
      <gco:CharacterString> ARLPA_TO_16.08.01-D_2011-11-03-9:58 </gco:CharacterString>
    </gmd:fileIdentifier>
    <gmd:language> ...
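For the CSW case, a loader could read the file identifiers out of a GetRecords response and build one GetRecordById URL per identifier. The sketch below assumes the standard CSW 2.0.2 and ISO 19139 namespace URIs; the sample document is a trimmed, hand-completed version of the excerpt above and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}
ENDPOINT = "http://webgis.arpa.piemonte.it/geoportalserver_arpa/csw"

# Hand-completed stand-in for a GetRecords response (not a real server reply).
SAMPLE_RESPONSE = f"""
<csw:GetRecordsResponse xmlns:csw="{NS['csw']}"
    xmlns:gmd="{NS['gmd']}" xmlns:gco="{NS['gco']}">
  <csw:SearchResults>
    <gmd:MD_Metadata>
      <gmd:fileIdentifier>
        <gco:CharacterString> ARLPA_TO_16.08.01-D_2011-11-03-9:58 </gco:CharacterString>
      </gmd:fileIdentifier>
    </gmd:MD_Metadata>
  </csw:SearchResults>
</csw:GetRecordsResponse>
"""


def record_ids(xml_text):
    """Extract every gmd:fileIdentifier value from a GetRecords response."""
    root = ET.fromstring(xml_text)
    path = ".//gmd:MD_Metadata/gmd:fileIdentifier/gco:CharacterString"
    return [el.text.strip() for el in root.findall(path, NS) if el.text]


def record_url(record_id, endpoint=ENDPOINT):
    """Build the GetRecordById URL for one identifier (the id is percent-encoded)."""
    query = urlencode({
        "request": "GetRecordById",
        "service": "CSW",
        "version": "2.0.2",
        "id": record_id,
    })
    return f"{endpoint}?{query}"
```

Note that `urlencode` percent-encodes the colon in the identifier, so the generated URL differs textually from the one shown above while requesting the same record.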