SmartOpenData: Linked Open Data for environment protection in Smart Regions
Deliverable D3.5 :: Public
Keywords: data harmonisation, ORM, RDF, RDFS, Linked Data
This project has received funding from the European Union’s Seventh Programme for research, technological development and demonstration under grant agreement No 603824.
D3.5 Final Data Harmonisation SmartOpenData project (Grant no.: 603824)
Version 1.0 Page 2 of 78 © SmartOpenData Consortium 2015
Table of Contents
1 Introduction
2 Data Harmonisation
2.1 CSV‐to‐RDF
2.1.1 Italian pilot
2.1.2 Portuguese‐Spanish Pilot
2.1.3 Irish pilot
2.1.4 Transforming Data with Grafterizer and the Jarfter Service
2.2 XML (GML)‐to‐RDF transformations
2.2.1 Slovak pilot
2.3 Relational DB‐to‐RDF transformations
2.3.1 Czech pilot
3 Harmonising Observations and Measurements
3.1 RDF Data Cube: Example
3.1.1 Data Cube Components
3.1.2 Data Cube Datasets
3.1.3 Data Cube Structures
4 Conclusion
5 References
Annex A: Generating RDF with OpenRefine: Challenges and Solutions
Language Tag Customisation
RDF out of a List of Values
More than one Root Nodes
Annex B: Portuguese‐Spanish pilot: ORM and RDF Models
Chemical Characteristics
Climatology
Forestry Tile
Geometry
Work Unit Ecosystem
Work Unit Location
List of Figures
Figure 1: Workflow of OpenRefine‐based data harmonisation
Figure 2: RDF model of Protected Sites
Figure 3: RDF model of Monitoring Stations
Figure 4: RDF model of Hazardous Substances
Figure 5: Portuguese‐Spanish Pilot, data harmonisation methodology
Figure 6: Original Sample ARPA Data
Figure 7: RDF mapping for ARPA data
Figure 8: Generated RDF graph for ARPA data
Figure 9: User interface for Jarfter
Figure 10: Jarfter compiler services
Figure 11: Jarfter transformation web service
Figure 12: Dynamic deployment of data transformations
Figure 13: CloudML deployment template
Figure 14: List of updated GeoKnow XSLT stylesheets
Figure 15: Landing page for Unified Views
Figure 16: List of created pipelines
Figure 17: Section with DPU templates
Figure 18: Pipelines execution monitor
Figure 19: Scheduler with the possibility to define the schedules for pipelines execution
Figure 20: Section with additional settings
Figure 21: Example of pipeline details
Figure 22: Example of further DPU settings
Figure 23: Example of the interlinking pipeline
Figure 24: CKAN interface with the list of metadata for the open linked data from Slovak pilot
Figure 25: Parliament web application interface
Figure 26: Czech pilot Data model
Figure 27: RDF plugin of OpenRefine, language tag
Figure 28: Excerpt from the aux_040400_municipality.csv
Figure 29: RDF plugin of OpenRefine, literal node customisation
Figure 30: Excerpt from ObservationTiles.csv file
Figure 31: Excerpt from ObservationTiles.csv file
Figure 32: Chemical Characteristics: ORM Model
Figure 33: Chemical Characteristics: RDF model
Figure 34: Climatology: ORM Model
Figure 35: Climatology: RDF Model
Figure 36: Forestry Tile: ORM Model
Figure 37: Forestry Tile: RDF Model
Figure 38: Geometry: ORM Model
Figure 39: Geometry: RDF Model
Figure 40: Work Unit Ecosystem: ORM Model
Figure 41: Work Unit Ecosystem: RDF Model
Figure 42: Work Unit Location: ORM Model
Figure 43: Work Unit Location: RDF model
List of Tables
Table 1: Data transformation approaches
Table 2: Italian Pilot: summary of classes
Table 3: Italian Pilot: summary of data harmonisation
Table 4: Portuguese‐Spanish Pilot: ORM constructs mapped to classes
Table 5: Portuguese‐Spanish Pilot: ORM constructs mapped to properties
Table 6: An overview of the datasets and vocabularies used in SK Pilot
Table 7: List of phases and tasks extracted and deployed from the COMSODE methodology for Open Data publishing
Table 8: Vocabulary usage by pilot
Document Metadata
Contractual Date of Delivery to the EC: August 2015
Actual Date of Delivery to the EC: October 7th 2015
Editor(s): Tatiana Tarasova, SpazioDati
Contributor(s): Martin Tuchyňa (SAŽP), Jindřich Mynarz (SAŽP), Peter Mozolík (SAŽP), Dumitru Roman (SINTEF), Nikolay Nikolov (SINTEF), Antoine Pultier (SINTEF), Dina Sukhobok (SINTEF), Håvard H. Holm (SINTEF), Jan Bojko (UHUL FMI), John O’Flaherty (MAC), Gregorio Urquía (TRAGSA), Jesús Estrada (TRAGSA)
Document History
Version Version date Responsible Description
0.0 20/07/2015 SpazioDati Outline and call for contributions
0.1 30/07/2015 UHUL FMI, HSRS Czech pilot contributions
0.2 15/08/2015 SAŽP Slovak pilot contributions
0.3 21/08/2015 SpazioDati, TRAGSA contribution to data harmonisation of the Italian and Portuguese‐Spanish pilots
0.4 21/08/2015 SpazioDati restructuring the report
0.5 24/08/2015 SINTEF contribution on Grafterizer and comparison of Grafterizer with OpenRefine
0.6 27/08/2015 UHUL FMI, SAŽP, SINTEF Sections 2.2 and 2.3 on the Slovak and Czech pilots finalised; Section 2.1.4 about Grafterizer completed
0.7 28/08/2015 SpazioDati Final version of the report with missing contribution from the Irish Pilot; submitted to the project coordinator.
1.0 7/10/2015 TRAGSA Editorial review
The information and views set out in this publication are those of the author(s) and do not necessarily reflect the official opinion of the European Communities. Neither the European Union institutions and bodies nor any person acting on their behalf may be held responsible for the use which may be made of the information contained therein. Copyright © 2015, SmartOpenData Consortium.
Executive Summary
Task 3.5 is dedicated to harmonising the pilots’ data to the Final SmOD model delivered in D3.4 [SMODD34]. The model is based on several INSPIRE topics and provides a basis for geospatial and environmental data interoperability. As such, the model does not cover domain‐specific concepts of the pilots. Hence, the initial activities of the data harmonisation task included an evaluation of the SmOD model in the context of the pilots. Whenever the model was not sufficient to represent the domain of interest, we searched for existing commonly accepted or standard vocabularies, and if no suitable vocabulary was found, custom terms were developed. These custom terms constitute one of the main outcomes of the current task, the custom SmOD vocabulary. The vocabulary is published at http://www.w3.org/2015/03/inspire/smod#.
Operational aspects of the data harmonisation task concern data transformations from input data structures to RDF. Three different approaches were identified based on the pilots’ requirements:
● CSV‐to‐RDF (Spanish‐Portuguese, Italian and Irish pilots)
● XML‐to‐RDF (Slovak pilot)
● RDBMS‐to‐RDF (Czech pilot)
This document explains the approaches, discusses the tools and technologies used to realise them, and summarises the results of the data harmonisation task per pilot.
1 Introduction
The final SmartOpenData model was delivered with D3.4 [SMODD34]. It is based on the INSPIRE themes selected specifically to cover the domains of the pilots:
● The Generic Concept Model
● Protected Sites
● Land Use
● Administrative Units
● Bio‐Geographical Units
● Species Distribution
● Corine Land Cover
● Environmental Monitoring Facilities
● Cadastral Parcels [1]
The model serves as a basis for harmonising data in the SmOD pilots. It defines basic concepts that are shared among the pilots, such as Protected Site or Cadastral Parcel. However, in addition to these basic concepts, every pilot contains concepts specific to its own domain, which are not covered by the model. As a result, every pilot had to extend the model with domain‐specific terms. We searched for these terms in existing resources, such as the Linked Open Vocabularies repository [2], schema.org and the DBpedia OWL ontology. Whenever existing resources were not sufficient for the pilot’s needs, custom terms were introduced. We accumulated these custom terms in the SmOD Custom Vocabulary published at http://www.w3.org/2015/03/inspire/smod#.
The rest of the document is structured as follows. We split Section 2 into three blocks, each of which corresponds to a different data transformation approach:
Section | Approach | Tools, Technologies
Section 2.1 | CSV‐to‐RDF | OpenRefine [3], RDF plugin for OpenRefine [4], Fusepool BatchRefine API [5]
Section 2.2 | XML‐to‐RDF | XSLT (based on customised GeoKnow stylesheets [6]) and SmOD INSPIRE Vocabularies [7] using OpenDataNode [8]
Section 2.3 | Relational DB‐to‐RDF | D2RQ, r2rml parser
Table 1: Data transformation approaches
[1] Cadastral Parcel theme has been added to the SmOD model after D3.4 had been finalised.
[2] http://lov.okfn.org/dataset/lov/
[3] http://openrefine.org/
[4] http://refine.deri.ie/
[5] https://github.com/fusepoolP3/p3-batchrefine
[6] https://web.imis.athena-innovation.gr/redmine/projects/geoknow_public/wiki/Inspire2RDF
The CSV‐to‐RDF approach was discussed in detail in D3.3 [SMODD33], which included a tutorial on using the RDF plugin of OpenRefine for mapping CSV files into RDF and discussed preliminary results of transforming Italian and Portuguese‐Spanish data into RDF. In this document we present the results of data harmonisation of the Italian pilot in Section 2.1.1, and the results of the Portuguese‐Spanish data harmonisation in Section 2.1.2. We discuss the input datasets, the models of the pilots and the vocabularies used to encode data in RDF. The latter include the SmOD model, vocabularies developed by third parties and the custom SmOD vocabulary. We conclude the discussion of the CSV‐to‐RDF approach by presenting Grafterizer, a tool that performs transformations on tabular data. Section 2.1.4 demonstrates how to use the tool on data from the Italian pilot and compares the features of Grafterizer with those of the RDF plugin of OpenRefine.
XML‐to‐RDF was also introduced in the previous deliverable D3.3. In the current report, Section 2.2.1 discusses the customisation of the GeoKnow XSL transformations to use the SmOD model as the target schema, with the support of the Open Data Node platform and elements of the COMSODE methodology framework [9], in the setting of the Slovak pilot.
In Section 2.3 we explain the Relational‐to‐RDF approach followed in the Czech pilot. Section 3 discusses the application of the RDF Data Cube vocabulary to harmonise environmental observations and measurements. Section 4 concludes the report.
[7] http://www.w3.org/2015/03/inspire/
[8] http://opendatanode.org/
[9] http://www.comsode.eu/wp-content/uploads/D5.1-Methodology_for_publishing_datasets_as_open_data.pdf
Namespaces used in the report:
Schema | Prefix | Namespace
SmOD Protected Sites | ps | http://www.w3.org/2015/03/inspire/ps#
SmOD Administrative Units | au | http://www.w3.org/2015/03/inspire/au#
SmOD Environmental Monitoring Facility | ef | http://www.w3.org/2015/03/inspire/ef#
SmOD Custom Vocabulary | smod | http://www.w3.org/2015/03/inspire/smod#
SmOD Cadastral Parcels Vocabulary | cp | http://www.w3.org/2015/03/inspire/cp#
SKOS | skos | http://www.w3.org/2004/02/skos/core#
Friend of a Friend | foaf | http://xmlns.com/foaf/0.1/
DC Terms | dcterms | http://purl.org/dc/terms/
GeoSPARQL | gsp | http://www.opengis.net/ont/geosparql#
DBpedia Ontology | dbpedia-owl | http://dbpedia.org/ontology/
RDF Data Cube Vocabulary | qb | http://purl.org/linked-data/cube#
Time Ontology | time | http://www.w3.org/2006/time#
QUDT Units | qudt-unit | http://qudt.org/1.1/vocab/unit#
QUDT Schema | qudt | http://qudt.org/schema/qudt#
RDF Schema | rdfs | http://www.w3.org/2000/01/rdf-schema#
Corine Land Cover Nomenclature in SKOS | clc | http://www.w3.org/2015/03/corine#
Asset Description Metadata Schema (ADMS) | adms | http://www.w3.org/ns/adms#
RAMON schema | ramon | http://rdfdata.eionet.europa.eu/ramon/ontology/
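To illustrate how the prefixes listed above are read throughout the report, the following minimal Python sketch expands a prefixed name such as ps:ProtectedSite into a full URI. It is not part of the project tooling: the function name `expand` and the selection of prefixes included in the dictionary are assumptions made for illustration only.

```python
# A subset of the namespaces listed in the report (illustrative selection).
NAMESPACES = {
    "ps": "http://www.w3.org/2015/03/inspire/ps#",
    "au": "http://www.w3.org/2015/03/inspire/au#",
    "ef": "http://www.w3.org/2015/03/inspire/ef#",
    "smod": "http://www.w3.org/2015/03/inspire/smod#",
    "qb": "http://purl.org/linked-data/cube#",
    "dcterms": "http://purl.org/dc/terms/",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'ps:ProtectedSite' into a full URI.

    'expand' is a hypothetical helper, not an API of any tool in the report.
    """
    prefix, separator, local_name = prefixed_name.partition(":")
    if not separator or prefix not in NAMESPACES:
        raise ValueError(f"unknown or missing prefix in {prefixed_name!r}")
    return NAMESPACES[prefix] + local_name
```

With this helper, `expand("smod:Determinand")` yields the URI of the custom SmOD term, which is how the prefixed names in the tables and listings of this report should be interpreted.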
2 Data Harmonisation
2.1 CSV‐to‐RDF
OpenRefine together with the RDF plugin were selected to map and convert data of Italian and Portuguese‐Spanish pilots into RDF. Main motivation for this choice was the pilots’ input conditions: input datasets (CSV files or XLS spreadsheets) were extracted from several independent data sources and added to the pilot in the course of work. Portuguese‐Spanish pilot aggregates input data from multiple sources of the Spanish and Portuguese public bodies. Input datasets of the Italian pilot at the moment include data from different public databases, and it is planned to include more data from other data sources.
We presented the workflow of OpenRefine‐based data harmonisation in D3.3 [SMODD33]. Figure 1 illustrates the processes and tools involved in the workflow.
Figure 1: Workflow of OpenRefine‐based data harmonisation
Data Pre‐processing
In both pilots there was a need to prepare input datasets before mapping and transforming them to RDF. In case of the Italian pilot, functionalities of OpenRefine were sufficient to do this, unlike in some cases of the Portuguese‐Spanish pilot, where ad‐hoc bash scripts were applied to input datasets before loading them to OpenRefine.
Mappings Creation
RDF mappings were created using the GUI of the RDF plugin for OpenRefine. All the RDF mappings are available in the corresponding projects of the OpenRefine instance, which was deployed for the project at https://smod-refine.spaziodati.eu/. To access the instance, use the following username/password as credentials:
smod/EnterSmartOpenData
Using OpenRefine to transform the pilots’ data posed several challenges. We discuss them and present our solutions in Annex A.
Data Transformation
OpenRefine was designed primarily as a personal desktop application and is meant to be used interactively. Within the scope of another EU FP7 project, Fusepool [10], a batch version of OpenRefine was developed. The APIs of the BatchRefine [11] transformer enable programmatic access to the OpenRefine engine, which makes it possible to incorporate BatchRefine into an automatic Extract‐Transform‐Load procedure. In D3.3, Section 5.1 “BatchRefine Example using cURL”, we demonstrated the usage of the BatchRefine API. At the current stage of the pilots we performed all the transformations using the “RDF as RDF/XML” export functionality of OpenRefine.
In the rest of this section we discuss in detail the data harmonisation processes carried out in the Italian pilot (Section 2.1.1) and the Portuguese‐Spanish pilot (Section 2.1.2). We introduce the input datasets and discuss the RDF modelling and the vocabularies used to generate the RDF representation of the pilots’ data. In Annex A we report on our experience of using the RDF plugin of OpenRefine. We describe several cases in which the RDF generation task was not trivial, and present our solutions.
2.1.1 Italian pilot
The Italian pilot is led by ARPA, the Environmental Protection Agency of the Sicilian Region. Following the pilot’s objectives, ARPA identified several user queries that underlie the baseline use case scenario of the pilot [12]. These queries guided the process of selecting input datasets, as well as the process of creating RDF models of them. In the current document we present one of these queries which, at the moment of writing this report, was fully implemented:
● Which rivers and lakes (upstream, within or crossing, and downstream) are linked to the environment of a Protected Site?
[10] http://fusepoolp3.github.io/
[11] https://github.com/fusepoolP3/p3-batchrefine
[12] Refer to [SMODD52] for more information about the pilot’s objectives
Input Datasets
Natura2000 database
Natura2000 database [13] is maintained by the European Environmental Agency (EEA) [14]. It accumulates information about protected sites from all EU members. The database is publicly available for downloading in the form of a MS Access database dump or an archive with CSV files. For the pilot we used the fifth release of the database, published on June 2014, which reflects the situation of the protected sites in the European Union in 2013 inclusive.
The database consists of multiple tables. For the pilot’s needs we were interested in NATURA2000SITES table which lists and describes protected areas.
Waterbase ‐ Lakes, Waterbase ‐ Rivers
EEA Waterbase ‐ Lakes [15] and Rivers [16] databases contain information about monitoring stations of lakes and rivers and measurements of water quality. ARPA extracted from the databases data relevant for the pilot, including data about monitoring stations in Sicilian lakes and rivers and measured by them concentrations of hazardous substances in the water. ARPA added geographical coordinates to some stations that were missing them.
RDF Modelling
Figures 2‐4 illustrate RDF models of Protected Sites, Monitoring Stations and Hazardous Substances correspondingly. Table 2 below summarises the classes of the models and gives examples of their instances. Measurements of hazardous substances we encoded in RDF using the RDF Data Cube framework (in Section 3 we discuss it in detail).
Figure 2: RDF model of Protected Sites
[13] http://www.eea.europa.eu/data-and-maps/data/natura-5
[14] http://www.eea.europa.eu/
[15] http://www.eea.europa.eu/data-and-maps/data/waterbase-lakes-10
[16] http://www.eea.europa.eu/data-and-maps/data/waterbase-rivers-10
Figure 3: RDF model of Monitoring Stations
Figure 4: RDF model of Hazardous Substances
class | description | URI construction | URI example
classes of Protected Sites, baseURI = <http://data.smartopendata.eu/Natura2000/>
ps:ProtectedSite | protected sites instances | baseURI/so/ProtectedSite/<SITECODE> | <http://data.smartopendata.eu/Natura2000/so/ProtectedSite/IT3110002>
foaf:Document | instances of legal foundation documents | baseURI/Document/<SITECODE> | <http://data.smartopendata.eu/Natura2000/Document/IT3110002>
gsp:Geometry | geometries of protected sites | baseURI/Geometry/<SITECODE> | <http://data.smartopendata.eu/Natura2000/Geometry/IT3110002>
au:AdministrativeUnit | administrative units of protected sites | baseURI/so/AdministrativeUnit/IT | <http://data.smartopendata.eu/Natura2000/so/AdministrativeUnit/IT>
classes of Monitoring Stations, baseURI of the Waterbase Lakes dataset = <http://data.smartopendata.eu/WaterbaseLakes/>, baseURI of the Waterbase Rivers dataset = <http://data.smartopendata.eu/WaterbaseRivers/>
ef:EnvironmentalMonitoringFacility | instances of lakes and rivers monitoring stations | baseURI/so/Station/<NationalStationID> | <http://data.smartopendata.eu/WaterbaseLakes/so/Station/IT19LW09318>
gsp:Geometry | geometries of stations | baseURI/Geometry/<NationalStationID> | <http://data.smartopendata.eu/WaterbaseLakes/Geometry/IT19LW09318>
au:AdministrativeUnit | administrative units of the stations | baseURI/so/AdministrativeUnit/<CountryCode> | <http://data.smartopendata.eu/WaterbaseLakes/so/AdministrativeUnit/IT>
classes of Hazardous Substances, baseURI of the Waterbase Lakes dataset = <http://data.smartopendata.eu/WaterbaseLakes/>, baseURI of the Waterbase Rivers dataset = <http://data.smartopendata.eu/WaterbaseRivers/>
qb:Observation | instances of measurements of hazardous substances | baseURI/HazardousSubstances/Observation/<rowIndex> | <http://data.smartopendata.eu/WaterbaseRivers/HazardousSubstances/Observation/0>
qb:DataSet | instances of the input datasets with hazardous substances | - | <http://data.smartopendata.eu/WaterbaseRivers/HazardousSubstances/Dataset/>
smod:Determinand | chemical compounds (determinands) defined in Water Framework Directive | http://data.smartopendata.eu/WFD/Determinand/<CASNumber> | <http://data.smartopendata.eu/WFD/Determinand/71-55-6>
time:Interval | time period, year, for which the values of the measurements were aggregated | http://reference.data.gov.uk/id/gregorian-interval/<Year>+”-01-01T00:00:00/P1Y” | <http://reference.data.gov.uk/id/gregorian-interval/2013-01-01T00:00:00/P1Y>
qudt:Unit | units of measurements | http://data.smartopendata.eu/WFD/UnitOfMeasure/<unit_id> | <http://data.smartopendata.eu/WFD/UnitOfMeasure/9>
Table 2: Italian Pilot: summary of classes
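The URI construction patterns summarised in Table 2 can be sketched in a few lines of Python. This is only an illustration of the patterns, assuming plain string templates; in the pilot the URIs were generated through the RDF plugin of OpenRefine, and all function names below are hypothetical.

```python
from urllib.parse import quote

# Base URIs taken from Table 2.
NATURA2000_BASE = "http://data.smartopendata.eu/Natura2000/"
WATERBASE_RIVERS_BASE = "http://data.smartopendata.eu/WaterbaseRivers/"
WFD_BASE = "http://data.smartopendata.eu/WFD/"


def protected_site_uri(site_code: str) -> str:
    """Pattern: baseURI/so/ProtectedSite/<SITECODE>."""
    return f"{NATURA2000_BASE}so/ProtectedSite/{quote(site_code, safe='')}"


def observation_uri(base_uri: str, row_index: int) -> str:
    """Pattern: baseURI/HazardousSubstances/Observation/<rowIndex>."""
    return f"{base_uri}HazardousSubstances/Observation/{row_index}"


def determinand_uri(cas_number: str) -> str:
    """Pattern: http://data.smartopendata.eu/WFD/Determinand/<CASNumber>."""
    return f"{WFD_BASE}Determinand/{quote(cas_number, safe='')}"


def year_interval_uri(year: int) -> str:
    """Pattern: gregorian-interval/<Year> + "-01-01T00:00:00/P1Y"."""
    return (
        "http://reference.data.gov.uk/id/gregorian-interval/"
        f"{year}-01-01T00:00:00/P1Y"
    )
```

Percent-encoding the variable part with `quote` is a defensive choice added in this sketch; the identifiers shown in Table 2 contain only characters that are left unchanged by it.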
External Vocabularies
In this section we summarise the external vocabularies used in the pilot. We focus on the vocabularies which are not included in the final SmOD model.
DC Terms and DBPedia OWL for Administrative Units of Protected Sites
Protected sites (see Figure 2) are linked to the administrative units they belong to via the property dcterms:coverage. Administrative Units are described in NATURA2000 through the Nomenclature of Territorial Units for Statistics (NUTS) country code, a two‐letter code referencing the country, e.g., “IT” for Italy [17]. The SmOD model suggests using au:country and taking its values from the Metadata Registry (MDR). We constructed the MDR URIs for the Sicilian sites. For example, below is an excerpt describing one of the sites:
<http://data.smartopendata.eu/Natura2000/so/ProtectedSite/ITA070005> a ps:ProtectedSite .
<http://data.smartopendata.eu/Natura2000/so/ProtectedSite/ITA070005> dcterms:coverage <http://data.smartopendata.eu/Natura2000/so/AdministrativeUnit/IT> .
<http://data.smartopendata.eu/Natura2000/so/AdministrativeUnit/IT> a au:AdministrativeUnit ;
    au:nationalLevel <http://inspire.ec.europa.eu/codelist/AdministrativeHierarchyLevel/1stOrder/> ;
    au:country <http://publications.europa.eu/resource/authority/country/ITA> .
In addition to this definition, we kept textual representation of the country codes, using the DBPedia ontology property dbpedia-owl:nutsCode, as shown in the listing below:
<http://data.smartopendata.eu/Natura2000/so/AdministrativeUnit/IT> dbpedia-owl:nutsCode "IT" .
This was done mainly because the MDR URIs are currently not resolvable; hence, technically, we could not obtain descriptions of the countries from these URIs. Moreover, inspection of the SKOS description of the URI of Italy [18] revealed that there is no mapping from the MDR country codes to the NUTS codes, which would be useful to have in the pilot’s case.
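The combination of the two country representations discussed above (the MDR URI via au:country and the textual code via dbpedia-owl:nutsCode) can be sketched as follows. This is a minimal Python illustration, not the OpenRefine mapping used in the pilot; the lookup table and the function name are hypothetical, and the output assumes that the prefixes are declared as in the namespace table of this report.

```python
# Hypothetical lookup from two-letter to three-letter country codes; only the
# entry needed for the Sicilian sites is included here.
ALPHA2_TO_ALPHA3 = {"IT": "ITA"}

ADMIN_UNIT_BASE = "http://data.smartopendata.eu/Natura2000/so/AdministrativeUnit/"
MDR_COUNTRY_BASE = "http://publications.europa.eu/resource/authority/country/"


def administrative_unit_turtle(country_code: str) -> str:
    """Return Turtle triples describing the administrative unit of a country.

    The MDR URI is built from the three-letter code, while the two-letter
    code is kept as a literal, mirroring the listings above.
    """
    alpha3 = ALPHA2_TO_ALPHA3[country_code]
    subject = f"<{ADMIN_UNIT_BASE}{country_code}>"
    return (
        f"{subject} a au:AdministrativeUnit ;\n"
        f"    au:country <{MDR_COUNTRY_BASE}{alpha3}> ;\n"
        f'    dbpedia-owl:nutsCode "{country_code}" .\n'
    )
```

Keeping the literal code next to the MDR URI means a consumer can still identify the country even while the MDR URI does not resolve.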
[17] NUTS country codes are identical to the ISO 3166‐1 alpha‐2 codes, while MDR makes use of the ISO 3166‐1 alpha‐3 codes http://www.iso.org/iso/home/standards/country_codes.htm
[18] The SKOS document describing all countries can be downloaded from http://publications.europa.eu/mdr/resource/authority/country/skos/countries-skos.rdf
Custom terms
In addition to the external vocabularies developed by third parties, we introduced new terms that were implemented in the custom SmOD vocabulary http://www.w3.org/2015/03/inspire/smod#.
Term rdfs:comment
smod:areaHa This property specifies the area of the Protected Site in Ha.
smod:lengthKm This property specifies the length of the Protected Site in km.
smod:ecologicalQuality This property provides description of the Protected Site in terms of ecological quality.
smod:catchmentName This property specifies the name of major catchment or basin.
smod:featureName This property specifies the name of the feature of interest being monitored by the Environmental Facility.
smod:Determinand This class represents the class of nutrients, organic matter, hazardous substances and other chemical determinands reported in the Waterbase data of the European Environmental Agency.
Data Pre-Processing
The RDF models presented in the section above also illustrate how the values of certain properties were populated with physical data. For example, in the RDF model of Protected Sites, the value of rdfs:label is populated with the value of the column <SITENAME>. In several cases, populating the property values was not straightforward, and additional pre‐processing steps were required. In this section we discuss some typical examples.
Implementing domain logics
It is a typical situation that a property value is populated from more than one column of the input dataset, following some domain logic. For example, in the case of protected sites, the value of ps:legalFoundationDate was populated from three columns: <DATE_SAC>, <DATE_CONF_SCI> and <DATE_SPA>. <DATE_CONF_SCI> and <DATE_SPA> are the dates when a site was designated as a Site of Community Importance (SCI) and a Special Protection Area (SPA), respectively. The site designation is found in the column <SITETYPE> and may contain one of three values:
● “A”: the site was designated as SPA
● “B”: the site was designated as SCI
● “C”: the site was designated as both SCI and SPA
In addition, European Commission can assign the status of Special Area of Conservation (SAC) to each site. If this happens, the column <DATE_SAC> is populated. Following consideration from the domain experts of ARPA, a rule was implemented in OpenRefine, in order to take value for ps:legalFoundationDate from <DATE_SAC>
D3.5 Final Data Harmonisation SmartOpenData project (Grant no.: 603824)
Version 1.0 Page 18 of 78 © SmartOpenData Consortium 2015
whenever it is present, and the latest date from the <DATE_CONF_SCI> or <DATE_SPA>, otherwise. Cleaning/Formatting Values Very typical examples of data preparation are data formatting and data cleaning. For example, the date values of ps:legalFoundationDate were converted from the date‐ time format to date, using simple OpenRefine rule for that. Sampling Input Dataset It is often the case that we want to generate RDF out of a subset of the input dataset. With OpenRefine it can be done in numerous ways. For example, NATURA2000 table contains protected sites from all the countries of European Union; however, for the pilot we were interested only in the Sicilian sites. To reduce the input datasets, a text facet was created in OpenRefine on the column <SITECODE>, that outputs “1” in case <SITECODE> contains value of one of the Sicilian sites (these values were provided by ARPA), and “0” otherwise. Joining Datasets Another interesting example of exploiting OpenRefine functionalities for data preparation refers to joining one dataset with another, in order to retrieve more data. For example, the dataset with hazardous substances contains units of measurements (UoM) in the column <Unit_HazSubs>. The values of the column are names of hazardous substances, such as “μg/l”, and as the target RDF model of hazardous substances suggests that the values of sdmx-attribute:unitMeasure must be URIs. The URI of “μg/l” is <http://data.smartopendata.eu/WFD/UnitOfMeasure/9>, in which “9” is an index row of “μg/l” in a dataset of UoMs19 that resides in another OpenRefine project20. Hence, in order to generate the same UoMs URIs in the project with hazardous substances, we need to join this dataset with the dataset of UoM21 and retrieve row indexes of the latter. And this kind of joins is also supported by OpenRefine22.
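The pre-processing rules above were implemented as OpenRefine rules; the following is only a minimal Python sketch of the same logic, not the project's actual implementation. The function names, the sample rows and the site codes are made up for illustration, dates are assumed to arrive as ISO date-time strings, and the row index used for the unit URI is assumed to be zero-based.

```python
from datetime import datetime

UOM_BASE = "http://data.smartopendata.eu/WFD/UnitOfMeasure/"


def parse_date(value):
    """Turn an ISO date or date-time string into a date; empty cells give None."""
    if not value:
        return None
    return datetime.fromisoformat(value).date()


def legal_foundation_date(row):
    """Use DATE_SAC when present, otherwise the later of DATE_CONF_SCI and DATE_SPA."""
    sac = parse_date(row.get("DATE_SAC"))
    if sac is not None:
        return sac
    dates = [parse_date(row.get(col)) for col in ("DATE_CONF_SCI", "DATE_SPA")]
    dates = [d for d in dates if d is not None]
    return max(dates) if dates else None


def select_sites(rows, site_codes):
    """Keep only the rows whose SITECODE is in the supplied set of codes."""
    return [row for row in rows if row["SITECODE"] in site_codes]


def uom_uri(unit_name, uom_values):
    """Build the unit URI from the row index of the unit name in the UoM list."""
    return UOM_BASE + str(uom_values.index(unit_name))


# Made-up sample rows, for illustration only.
rows = [
    {"SITECODE": "X1", "DATE_SAC": "2015-12-21T00:00:00",
     "DATE_CONF_SCI": "2006-07-19T00:00:00", "DATE_SPA": ""},
    {"SITECODE": "X2", "DATE_SAC": "",
     "DATE_CONF_SCI": "2006-07-19T00:00:00", "DATE_SPA": "2012-02-01T00:00:00"},
    {"SITECODE": "Y9", "DATE_SAC": "",
     "DATE_CONF_SCI": "", "DATE_SPA": "1998-06-01T00:00:00"},
]
selected = select_sites(rows, {"X1", "X2"})
for row in selected:
    print(row["SITECODE"], legal_foundation_date(row))
```

Returning a date rather than a date-time mirrors the formatting step, and looking the unit name up in a separate list mirrors the join on <Unit_HazSubs> and <Value>.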
RDF Generation
The complete RDF dataset of the Italian pilot (including data structure definitions and concept scheme) is 2.1 MB in size and contains 14,098 triples in total:
● 223 instances of ps:ProtectedSite
● 38 instances of ef:EnvironmentalMonitoringFacility, of which 14 are lake monitoring stations and 24 are river monitoring stations
● 906 instances of qb:Observation, of which 205 are measurements of hazardous substances in rivers and 701 are measurements in lakes
19 http://dd.eionet.europa.eu/dataelements/48239
20 The project called “ARPA-haz-substances-UoM” is available at https://smod-refine.spaziodati.eu/
21 The join is done by the UoM name that is found in the column <Unit_HazSubs> of the source dataset and <Value> in the target
22 See the documentation of the join rule: https://github.com/OpenRefine/OpenRefine/wiki/GREL-Other-Functions#crosscell-c-string-projectname-string-columnname
Table 3 summarises the results of data harmonisation in the Italian pilot. The RDF mappings are available in the OpenRefine project of each input dataset. The resulting RDF can be downloaded from the given links; alternatively, the data can be queried via the SPARQL endpoint http://smodlumii.sungis.lv/sparql
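As an illustration of querying the endpoint, the sketch below builds a SPARQL request that counts the qb:Observation instances in one of the pilot graphs. It only constructs the query and the request URL and does not send anything; the helper names are hypothetical, and it assumes the endpoint accepts the standard `query` parameter of the SPARQL protocol.

```python
from urllib.parse import urlencode

ENDPOINT = "http://smodlumii.sungis.lv/sparql"
QB_OBSERVATION = "http://purl.org/linked-data/cube#Observation"
LAKES_GRAPH = "http://data.smartopendata.eu/waterbase-lakes/haz-substances/sicily"


def count_query(graph_uri, class_uri):
    """Build a SPARQL query counting the instances of a class in a named graph."""
    return (
        "SELECT (COUNT(?s) AS ?n) "
        f"FROM <{graph_uri}> "
        f"WHERE {{ ?s a <{class_uri}> }}"
    )


def request_url(query):
    """Encode the query as a GET request URL for the endpoint."""
    return ENDPOINT + "?" + urlencode({"query": query})


query = count_query(LAKES_GRAPH, QB_OBSERVATION)
url = request_url(query)
print(query)
print(url)
```

The resulting URL can be passed to any HTTP client; the response format depends on the endpoint configuration.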
| Input Datasets | OpenRefine project’s name on https://smod-refine.spaziodati.eu | RDF, SPARQL endpoint: http://smodlumii.sungis.lv/sparql |
| --- | --- | --- |
| Natura2000 database - http://www.eea.europa.eu/data-and-maps/data/natura-5, table NATURA2000SITES | “ARPA-NATURA2000SITES-PLUS” | RDF dump23; graph: <http://data.smartopendata.eu/natura2000/sicily> |
| EEA Waterbase - Lakes - http://www.eea.europa.eu/data-and-maps/data/waterbase-lakes-10, ARPA extraction (enriched with coordinates)24, sheets “StationsLakes” and “HazSubstLakes_Agg” | “ARPA-Lakes_dati2013_caricati2014” | RDF dump25; graph: <http://data.smartopendata.eu/waterbase-lakes/stations/sicily> |
| | “ARPA-Lakes_dati2013_caricati2014-HazSubs” | RDF dump26; graph: <http://data.smartopendata.eu/waterbase-lakes/haz-substances/sicily> |
| EEA Waterbase - Rivers - http://www.eea.europa.eu/data-and-maps/data/waterbase-rivers-10, ARPA extraction27, sheets “StationsRivers” and “HazSubstRivers_Agg” | “ARPA-Rivers_dati2013_caricati2014” | RDF dump28; graph: <http://data.smartopendata.eu/waterbase-rivers/stations/sicily> |
| | “ARPA-Rivers_dati2013_caricati2014-HazSubs” | RDF dump29; graph: <http://data.smartopendata.eu/waterbase-rivers/haz-substances/sicily> |
| EEA Unit of measurement of Hazardous Substances - http://dd.eionet.europa.eu/dataelements/48239 | “ARPA-haz-substances-UoM” | RDF dump30; graph: <http://data.smartopendata.eu/WFD/haz-substances/uom> |
| EEA code list of determinands - http://dd.eionet.europa.eu/datasets/latest/Groundwater/tables/HazSubstGW_Disagg/elements/DeterminandCode | “ARPA-WFD-determinand” | RDF dump31; graph: <http://data.smartopendata.eu/WFD/haz-substances/determinands> |

Table 3: Italian Pilot: summary of data harmonisation

23 https://s3-eu-west-1.amazonaws.com/smod-repo/arpa-release2/ARPA-NATURA2000SITES-PLUS.rdf.zip
24 https://s3-eu-west-1.amazonaws.com/smod-repo/arpa-release2/Lakes_dati2013_caricati2014+Rev1.xlsx.zip
25 https://s3-eu-west-1.amazonaws.com/smod-repo/arpa-release2/Lakes_dati2013_caricati2014.rdf.zip
26 https://s3-eu-west-1.amazonaws.com/smod-repo/arpa-release2/Lakes_dati2013_caricati2014-HazSubs.rdf.zip
27 https://s3-eu-west-1.amazonaws.com/smod-repo/arpa-release2/Rivers_2013_19_12_2014_Rev1.xlsx.zip
28 https://s3-eu-west-1.amazonaws.com/smod-repo/arpa-release2/Lakes_dati2013_caricati2014.rdf.zip
29 https://s3-eu-west-1.amazonaws.com/smod-repo/arpa-release2/Rivers_dati2013_caricati2014-HazSubs.rdf.zip
Future Outlook
The Italian pilot is being actively developed, and more user queries are to be addressed in future work, for example:
● Which protected site or areas of a protected site are more or less subject to pollution?
● Which human activities in the protected site can lead to pollution of water and/or lakes (within and/or downstream)?
This will require adding more input data sources, such as those defining “pollution” in terms of the concentration of hazardous substances. For example, what is the acceptable value of the benzene concentration? When is it considered to be water pollution? As for the second user query, a description of “human activities” needs to be added. New models will be developed to include the new data sources. This in turn will affect the SmOD model (and vocabularies), which at the moment includes neither pollution nor human activity definitions.
2.1.2 Portuguese-Spanish Pilot
The Portuguese-Spanish pilot is led by Empresa de Transformacion Agraria SA (TRAGSA). Besides TRAGSA, a Portuguese partner, Direção Geral do Território, participates in the pilot as domain expert and data provider. A set of user queries of the pilot guided the process of data harmonisation: from choosing input datasets, to conceptual modelling of the domain, to designing RDF models and extending the SmOD vocabularies with domain-specific terms. Below we present a few user queries for demonstration purposes32:
● What’s the land use and land cover (LULC) of my field units in Zêzere Watershed in the year x?
● Which land use/cover changes occurred in my field unit?
● What environmental factors can be relevant in this field unit considering the occurred LULC?
30 https://s3-eu-west-1.amazonaws.com/smod-repo/arpa-release2/ARPA-haz-substances-UoM.rdf.zip
31 https://s3-eu-west-1.amazonaws.com/smod-repo/arpa-release2/ARPA-lakes-haz-substances-determinand.rdf.zip
32 Refer to [SMODD52] for more details on the pilot

Input Datasets
Input datasets accumulated by TRAGSA from different data sources include one denormalized table with relationships between various concepts of the domain - pd_0604_workunion.wkt - and multiple auxiliary tables that provide definitions of the concepts of the domain. Data is available from the FTP server of TRAGSA.

RDF Modelling/Custom Terms
In order to develop RDF models of the pilot we followed the methodology presented in D3.3 [SMODD33]. Figure 5 schematically illustrates the methodology.

Figure 5: Portuguese-Spanish Pilot, data harmonisation methodology

As explained in D3.3, input datasets of the pilot lack proper documentation of the schema design, and domain analysis and modelling were needed prior to harmonising pilot’s data. Together with TRAGSA and SINTEF we performed domain analysis and developed conceptual models using the Object-Role Modelling (ORM) techniques [HM08]. D3.3 presents ORM models of the first release of the pilot. Our initial intention was to follow iterative approach to the pilot’s development and produce backward compatible models in each subsequent release of the pilot. That meant that in every new iteration we were to augment the existing models with more concepts and relationships, but not to modify the existing ones. In practice, that approach worked for the second release of the pilot, but failed with the third release, in which the models of the previous releases were reconsidered and modified.
We published ORM models of all the releases http://smod-fp7.github.io/ together with their documentations. In the current document we include ORM models of the latest (third) release of the pilot in Annex B, among which are:
● Chemical Characteristics33 - the model of chemical characteristics of soil
● Climatology34 - the model of climatology measurements
● Forestry Tile35 - the model of forestry maps and plant species
● Geometry36 - the model of geometries
● Work Unit Ecosystem37 - the model of animal species supported by observatory tiles
● Work Unit Location38 - the model of topological relations between spatial objects
The ORM models served as input to the task of RDF data modelling. Following the set of conversion rules presented in D3.3, we converted the ORM models to RDF Schema. In Annex B we include all the resulting RDF models. In the rest of this section we go through the conversion rules and summarise the result of applying them to the ORM models of the third release of the pilot.

Mapping Object Types and Value Types to Classes
In Table 4 we present the ORM constructs - object types and value types - that were mapped to classes.
| ORM construct | Class | URI construction (baseURI = <http://data.smartopendata.eu/sp-pt-pilot/>) | URI example |
| --- | --- | --- | --- |
| Work Unit | smod:WorkUnit | baseURI/so/WorkUnit/<idWorkUnit> | <http://data.smartopendata.eu/sp-pt-pilot/so/WorkUnit/ES1110100070100100001001> |
| Soil | smod:Soil | baseURI/so/Soil/<idLitholo> | <http://data.smartopendata.eu/sp-pt-pilot/Soil/57> |
| Forestry Tile | smod:ForestryTile | baseURI/so/ForestryTile/<idForestry> | <http://data.smartopendata.eu/sp-pt-pilot/so/ForestryTile/100001-MFE25> |
| Plant Species | smod:PlantSpecies | baseURI/PlantSpecies/<codeSP1> | <http://data.smartopendata.eu/sp-pt-pilot/PlantSpecies/Pinsyl> |
| Local number | adms:Identifier | baseURI/Identifier/<idWorkUnit> | <http://data.smartopendata.eu/Identifier/ES1110100070100100001001> |
| Protected Site | ps:ProtectedSite | baseURI/ProtectedSite/ | |
| Parcel | smod:Parcel | baseURI/Parcel/<idParcel> | <http://data.smartopendata.eu/sp-pt-pilot/so/Parcel/ES1110100070100100001> |
| Neighbourhood, Municipality, District, NUTS3, NUTS2 | au:AdministrativeUnit | baseURI/<AdministrativeUnit>/<idAdministrativeUnit> | <http://data.smartopendata.eu/sp-pt-pilot/so/Neighbourhood/ES11101000701> |
| Observatory Tile | smod:ObservatoryTile | baseURI/so/ObservatoryTile/<idLandSp> | <http://data.smartopendata.eu/sp-pt-pilot/so/ObservatoryTile/29TNH15> |
| Animal Species | smod:AnimalSpecies | baseURI/AnimalSpecies/<code> | <http://data.smartopendata.eu/sp-pt-pilot/AnimalSpecies/Alaarv> |
| Geometry | gsp:Geometry | baseURI/Geometry/<idWorkUnit> | <http://data.smartopendata.eu/sp-pt-pilot/Geometry/ES1110100070100100001001> |
| Corine Land Cover | skos:Concept | http://www.w3.org/2015/03/corine# + <code> | <http://www.w3.org/2015/03/corine#242> |
| - | qb:Observation | baseURI/<ClimatologyMeasurement>/Observation/<idClimatologyMeasurement> | <http://data.smartopendata.eu/sp-pt-pilot/AnnualHumidityLevel/Observation/65> |
| - | qb:DataSet | - | <http://data.smartopendata.eu/sp-pt-pilot/WorkUnit-Climatology/Dataset/> |

Table 4: Portuguese-Spanish Pilot: ORM constructs mapped to classes

33 http://smod-fp7.github.io/tragsa3/diagrams/ChemicalCharacteristics.png
34 http://smod-fp7.github.io/tragsa3/diagrams/Climatology.png
35 http://smod-fp7.github.io/tragsa3/diagrams/ForestryTile.png
36 http://smod-fp7.github.io/tragsa3/diagrams/Geometry.png
37 http://smod-fp7.github.io/tragsa3/diagrams/WorkUnitEcosystem.png
38 http://smod-fp7.github.io/tragsa3/diagrams/WorkUnitLocation.png
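The URI construction patterns of Table 4 can be sketched as a small template lookup. This is an illustrative sketch only: in the pilot the URIs were produced by the OpenRefine RDF mappings, and the `URI_TEMPLATES` dictionary and `mint_uri` function are hypothetical names covering just a few of the rows.

```python
from urllib.parse import quote

BASE_URI = "http://data.smartopendata.eu/sp-pt-pilot/"

# A subset of the URI construction patterns, keyed by class local name.
URI_TEMPLATES = {
    "WorkUnit": "so/WorkUnit/{id}",
    "ForestryTile": "so/ForestryTile/{id}",
    "ObservatoryTile": "so/ObservatoryTile/{id}",
    "PlantSpecies": "PlantSpecies/{id}",
    "AnimalSpecies": "AnimalSpecies/{id}",
}


def mint_uri(kind, identifier):
    """Build the instance URI for a class from its identifier column value."""
    # Percent-encode the identifier so that unexpected characters cannot break the URI.
    local = URI_TEMPLATES[kind].format(id=quote(str(identifier), safe=""))
    return BASE_URI + local


print(mint_uri("ForestryTile", "100001-MFE25"))
print(mint_uri("PlantSpecies", "Pinsyl"))
```

Keeping the patterns in one place makes it easier to spot rows whose pattern and example disagree, such as the presence or absence of the `so/` path segment.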
Mapping Associations and Value Types to Properties
In Table 5 we present the ORM constructs - associations, value types and object types - that were mapped to rdf:Property.
| ORM Construct | rdf:Property | rdfs:domain | rdfs:range |
| --- | --- | --- | --- |
| Chemical Characteristics | | | |
| (Work Unit) has (Soil) | smod:hasSoil | smod:WorkUnit | smod:Soil |
| (Soil) has (Acidity) | smod:soilAcidity | smod:Soil | rdfs:Literal |
| (Soil) has (Permeability) + (Permeability) has Permeability Rate | smod:soilPermeabilityRate | smod:Soil | rdfs:Literal |
| Geometry | | | |
| (Work Unit) has (Geometry); (Parcel) has (Geometry) | gsp:hasGeometry | gsp:SpatialObject | gsp:SpatialObject |
| (Polygon) has (Surface) | smod:areaHa | gsp:SpatialObject | rdfs:Literal |
| (Polygon) has (Perimeter) | smod:lengthKm | gsp:SpatialObject | rdfs:Literal |
| Work Unit Ecosystem | | | |
| (Observatory Tile) supports (Animal Species) | smod:supports | smod:ObservatoryTile | smod:AnimalSpecies |
| (Animal Species) has Conservation Status | smod:iucnConservationStatusCode | smod:AnimalSpecies | rdfs:Literal |
| Work Unit Location | | | |
| (Work Unit) intersects (Protected Site) | gsp:sfIntersects | gsp:SpatialObject | gsp:SpatialObject |
| (Work Unit) is located in (Forestry Tile); (Work Unit) is located in (Observatory Tile); (Work Unit) is located in (Neighbourhood); (Work Unit) is located in (Parcel); (Neighbourhood) is located in (Municipality); (Municipality) is located in (District); (District) is located in (NUTS3); (NUTS3) is located in (NUTS2) | gsp:sfWithin | gsp:SpatialObject | gsp:SpatialObject |
| (Neighbourhood) has Name; (Municipality) has Name; (District) has Name; (NUTS3) has Name; (NUTS2) has Name | ramon:name | ramon:Region | rdfs:Literal |

Table 5: Portuguese-Spanish Pilot: ORM constructs mapped to properties
Mapping Objectified Associations
In the first release of the pilot we had one objectified association39, “ForestryTileHasPlantSpecies”. It is an association between Forestry Tile and Plant Species that, for every plant species of a forestry tile, makes it possible to specify the representativity level of the plant species (primary, secondary or tertiary) and its density.
39 Objectified associations in ORM make it possible to express additional qualifying information on the relationship between two entities.
In D3.3 (Section 3.3.2) we discussed different approaches to expressing objectified associations in RDF, from RDF reification to introducing custom properties. The latter approach was chosen to represent “ForestryTileHasPlantSpecies” in RDF. We took into account the fact that no more representativity levels were to be added to the data, and opted for a less verbose and simpler way of encoding and querying the data than RDF reification. As a result, we introduced 6 properties:

| Objectified Association | rdf:Property | rdfs:domain | rdfs:range |
| --- | --- | --- | --- |
| (ForestryTileHasPlantSpecies) has (Representative Level) | smod:hasPrimaryPlantSpecies; smod:hasSecondaryPlantSpecies; smod:hasTertiaryPlantSpecies | smod:ForestryTile | smod:PlantSpecies |
| (ForestryTileHasPlantSpecies) has (Density) | smod:primaryPlantSpeciesDensity; smod:secondaryPlantSpeciesDensity; smod:tertiaryPlantSpeciesDensity | smod:ForestryTile | smod:PlantSpecies |
In the third release of the pilot one more objectified association was added, “WorkUnitHasCorineLandcover”, which links Work Unit and Corine Land Cover. For every work unit, this association makes it possible to specify the Corine Land Cover code in three years: 1990, 2000 and 2006. When choosing an RDF model for “WorkUnitHasCorineLandcover”, we followed the same logic as for “ForestryTileHasPlantSpecies” and introduced the following three properties:

| Objectified Association | rdf:Property | rdfs:domain | rdfs:range |
| --- | --- | --- | --- |
| (WorkUnitHasCorineLandCover) in (Year) | smod:corineLandCover1990; smod:corineLandCover2000; smod:corineLandCover2006 | gsp:SpatialObject | skos:Concept |
We chose this design solution because the temporal aspect of the Corine Land Cover codes serves an informative purpose, rather than the purpose of combining these values with other data sources.
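To make the custom-property encoding concrete, the sketch below emits one triple per year for a work unit, using the three smod:corineLandCover properties and the Corine concept namespace given in Table 4. It is a hedged illustration rather than the pilot's mapping: the function name is hypothetical and the land cover codes assigned to the work unit are made up.

```python
SMOD = "http://www.w3.org/2015/03/inspire/smod#"
CORINE = "http://www.w3.org/2015/03/corine#"
WORK_UNIT_BASE = "http://data.smartopendata.eu/sp-pt-pilot/so/WorkUnit/"
YEARS = (1990, 2000, 2006)


def corine_triples(work_unit_id, codes_by_year):
    """Return N-Triples lines linking a work unit to its Corine code per year."""
    subject = f"<{WORK_UNIT_BASE}{work_unit_id}>"
    triples = []
    for year in YEARS:
        code = codes_by_year.get(year)
        if code is None:
            continue  # no code recorded for this year
        triples.append(
            f"{subject} <{SMOD}corineLandCover{year}> <{CORINE}{code}> ."
        )
    return triples


# Made-up codes for illustration only.
for line in corine_triples("ES1110100070100100001001", {1990: "242", 2006: "243"}):
    print(line)
```

With one property per year, a query for the 2006 land cover of a work unit needs a single triple pattern, which is the simplicity the design trades against extensibility to further years.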
External Vocabularies
NUTS-RDF and the RAMON Ontology for Administrative Regions
To locate an administrative unit in the pilot, topological relations between work units and administrative units are used. To encode instances of NUTS regions, we re-used the NUTS classification vocabulary published as Linked Data at http://nuts.geovocab.org/. For example, the id of "Baixo Mondego", a sub-region of Portugal, is http://nuts.geovocab.org/id/PT162.html. Below is the definition of the sub-region from the NUTS Linked Data set:
@prefix nuts: <http://nuts.geovocab.org/id/> .
@prefix ramon: <http://rdfdata.eionet.europa.eu/ramon/ontology/> .
@prefix ngeo: <http://geovocab.org/geometry#> .
@prefix spatial: <http://geovocab.org/spatial#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

nuts:PT162 rdf:type ramon:NUTSRegion, spatial:Feature .
nuts:PT162 rdfs:label "PT162 - Baixo Mondego" .
nuts:PT162 ramon:name "Baixo Mondego" .
nuts:PT162 ramon:level "3"^^<http://www.w3.org/2001/XMLSchema#integer> .
nuts:PT162 ramon:code "PT162" .
nuts:PT162 ngeo:geometry nuts:PT162_geometry .
nuts:PT162 spatial:PP nuts:PT16 .
nuts:PT162 owl:sameAs <http://rdfdata.eionet.europa.eu/ramon/nuts2008/PT162> .
nuts:PT162 owl:sameAs <http://ec.europa.eu/eurostat/ramon/rdfdata/nuts2008/PT162> .
nuts:PT162 owl:sameAs <http://estatwrap.ontologycentral.com/dic/geo#PT162> .
nuts:PT162 owl:sameAs <http://nuts.psi.enakting.org/id/PT162> .
Having URIs from this NUTS Linked Data set allows us to re-use the definitions of the NUTS regions (i.e., their names, levels and codes) and the topological relations between them. In the definition above, the triple nuts:PT162 spatial:PP nuts:PT16 tells us that the "Baixo Mondego" sub-region is contained in the “Centro” region (http://nuts.geovocab.org/id/PT16.html).
Neighbourhoods, Municipalities and Districts are units in the local administrative divisions of Spain and Portugal. To represent them in RDF, we re-used the Administrative Units vocabulary40 and the RAMON Ontology http://rdfdata.eionet.europa.eu/ramon/ontology/. For example, below is the definition of the “Coimbra” district in Portugal, which is contained in the "Baixo Mondego" sub-region (http://nuts.geovocab.org/id/PT162.html):
@prefix ramon: <http://rdfdata.eionet.europa.eu/ramon/ontology/> .
@prefix au: <http://www.w3.org/2015/03/inspire/au#> .
@prefix gsp: <http://www.opengis.net/ont/geosparql#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://data.smartopendata.eu/sp-pt-pilot/so/District/PT16211> gsp:sfWithin <http://nuts.geovocab.org/id/PT162> .
<http://data.smartopendata.eu/sp-pt-pilot/so/District/PT16211> a au:AdministrativeUnit , ramon:LAURegion ;
    ramon:name "Coimbra" ;
    ramon:level "2"^^xsd:int ;
    au:nationalLevel <http://inspire.ec.europa.eu/codelist/AdministrativeHierarchyLevel/4thOrder/> ;
    au:country <http://publications.europa.eu/resource/authority/country/PRT> ;
    au:nationalCode "PT16211" .
Data Pre-Processing
The input to many target RDF models was the same file, pd_0604_workunion.wkt, which contains relationships between most of the concepts of the pilot, such as:
● all links between Work Unit and Climatology measurements
● all topological relationships between Work Unit and other spatial objects of the domain: Forestry Tile, Observatory Tile, and others.
40 http://www.w3.org/2015/03/inspire/au#
For example, the link gsp:sfWithin between Work Unit and Forestry Tile is generated using values from the two columns of the input dataset: idWorkUnit and idForestry. However, the file contains duplicate records of the same pairs of idWorkUnit‐idForestry. We ran the following bash commands on the input file to sort its records and remove duplicates based on the two given columns:
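# $col1 and $col2 are the names of the two columns, $coln1 and $coln2 their positions
header="$col1;$col2"
# write the header first, then append the sorted unique pairs
echo "$header" > "$outputfile"
sed 1d ./pd_0604_workunion.wkt | cut -d';' -f "$coln1,$coln2" | tr -d '"' | awk -F';' 'NF==2' | sort -t';' -u >> "$outputfile"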
where $coln1 is the sequential number of the first column in the dataset and $coln2 is the sequential number of the second column. For example, the following commands prepare the input file for generating the gsp:sfWithin relationship:
header="idWorkUnit;idForestryTile"
echo "$header" > sorted_idWorkUnit_located_idForestry.wkt
sed 1d ./pd_0604_workunion.wkt | cut -d';' -f 3,19 | tr -d '"' | awk -F';' 'NF==2' | sort -t';' -u >> sorted_idWorkUnit_located_idForestry.wkt
Input datasets after pre‐processing are available.
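For readers who prefer not to rely on a shell pipeline, the same de-duplication step can be sketched in Python. This is an equivalent under stated assumptions, not the procedure used in the pilot: the function names are hypothetical, the sample data is made up, and Python's sort order may differ from the locale-dependent order of `sort`.

```python
import csv
import io


def unique_pairs(lines, col1, col2, delimiter=";"):
    """Return the header pair followed by the sorted, de-duplicated value pairs.

    col1 and col2 are 1-based column numbers, as in `cut -f`.
    """
    reader = csv.reader(lines, delimiter=delimiter, quotechar='"')
    header = next(reader)
    pairs = set()
    for record in reader:
        if len(record) < max(col1, col2):
            continue  # skip short or malformed records
        pairs.add((record[col1 - 1], record[col2 - 1]))
    return [(header[col1 - 1], header[col2 - 1])] + sorted(pairs)


def write_pairs(pairs, path, delimiter=";"):
    """Write the pairs to a delimited text file, one pair per line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle, delimiter=delimiter).writerows(pairs)


# Made-up sample with a duplicate pair, for illustration only.
sample = (
    'n;"idWorkUnit";"idForestry"\n'
    '1;"WU2";"F1"\n'
    '2;"WU1";"F1"\n'
    '3;"WU2";"F1"\n'
)
result = unique_pairs(io.StringIO(sample), 2, 3)
print(result)
```

For the real input, `unique_pairs(open("pd_0604_workunion.wkt", encoding="utf-8"), 3, 19)` would select the same two columns as the `cut -f 3,19` call, assuming the file is UTF-8 encoded.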
RDF Generation
The RDF dump of the pilot is available for download41. All RDF mappings can be found in the OpenRefine projects on https://smod-refine.spaziodati.eu; the names of the projects start with “TRAGSA3” and continue with the name of the input file.
Future Outlook
As future work, an RDF representation of Geometries needs to be generated.
2.1.3 Irish pilot
The Irish pilot, which is led by MAC, is focused on European protected areas and their National Parks, starting with the Burren National Park in Ireland. The pilot aims to demonstrate the value of SmartOpenData in helping Researchers and Decision Makers to better manage, preserve, sustain and use this unique ecosystem. The pilot’s primary objective is to create the following sustainable services that will continue beyond the life of the project42:
1. SmartOpenData enabled European Tourism Indicator System (ETIS) Webservice for the Burren and European GeoParks Network.
2. SmartOpenData enabled App to Ground-Truth potential Protected Monument sites
41 https://s3-eu-west-1.amazonaws.com/smod-repo/tragsa-release3/rdf.zip
42 See [SMOD52] for a more in-depth discussion of the pilot and its objectives.
ETIS is a survey-based generation service used to provide real-time statistical information on the GeoPark’s performance in relation to the performance criteria defined by the GeoPark’s management. However, the ETIS model will not present useful links to the SmartOpenData data model43 until the service is operational in many GeoParks. At that point, use of a common data model will enable potential eco-tourists to benchmark, compare and contrast the progress of various sustainable destinations in achieving their objectives before deciding which to visit44. It was therefore decided to focus on the second service for now, and to use the OpenRefine approach, described above and proven in the Italian pilot, to transform the Irish national heritage services, as defined at http://webgis.archaeology.ie/nationalmonuments/flexviewer/, in order to capture data for the protected sites within the Burren Geopark region. The query generated was to reconcile the data retrieved with the official place names stored in the Logainm dataset.
Input Data Sets
The primary input data sets are the Irish Record of Monuments and Places (RMP), and Logainm, the official Irish Placenames.
In Ireland, archaeological monuments are protected under the Irish National Monuments Acts 1930 - 2004. The National Monuments Service of the Irish Government’s Department of Arts, Heritage and the Gaeltacht maintains a record of all known monuments, and this forms the Record of Monuments and Places (RMP)45. The aim of the ground-truthing service is to provide a new crowd-sourcing way to report on and help protect such monument sites, focusing on the Burren initially. The monuments are recorded in the Irish RMP, which is available as a series of PDF documents46 and as CSV files47, i.e. as One Star and Three Star open data respectively. The aim was to transform it to 5 Star open data48.
Logainm provides the definitive standard authorised forms of all Irish place names in both English and Irish49. It has recently been made available in linked open data format as Linked Logainm, in various formats including RDF, XML and JSON50.
RDF Graph and Table
The following summarises the Protected Monuments Sites data and its linking with the Linked Logainm:
43 As discussed in [SMOD33] and [SMOD34]
44 As discussed in D5.1 “Rationale of the Pilots”.
45 www.archaeology.ie
46 Available at http://www.archaeology.ie/publications-forms-legislation/record-of-monuments-and-places
47 https://data.gov.ie/data/search?q=monuments&theme-primary=Arts
48 As described at http://5stardata.info/en/
49 www.logainm.ie/en
50 www.logainm.ie/en/inf/proj-machines
| Class | Description |
| --- | --- |
| dc:identifier | Unique Identifier for site |
| ps:siteDesignation | The classification of the Site |
| foaf:name | The Townland name of the site |
| owl:sameAs | The reconciliation link to Logainm |
| geo:location | ITM Reference (E,N) |
| geo:lat_long | Irish Grid Reference (E,N) |

Data Pre-processing

Multiple records within the dataset were recorded as redundant. These records were identified using straightforward OpenRefine rules.

Joining DataSets

In order to extend the heritage dataset, OpenRefine’s RDF reconciliation tool was used. To add the Logainm RDF reconciliation service in OpenRefine, users need to navigate to ‘RDF’ > ‘Add reconciliation service’ > ‘Based on SPARQL endpoint ...’, and fill in the following information:
Name: Logainm
Endpoint URL: http://data.logainm.ie/sparql
Type: Virtuoso
Label Properties: Also check ‘foaf:name’
The reconciliation was run against the “Townland” name. Once reconciliation was complete, manual intervention was required to resolve the correct townland. Once this process was complete, the sameAs links to the Logainm URIs needed to be added to the RDF. This was accomplished by editing the RDF skeleton and associating the sameAs property with the URI column. A sample of the RDF output is shown below.
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://localhost:3333/0">
    <dc:description>Anomalous stone group</dc:description>
    <foaf:name>CARHEENYBAUN</foaf:name>
    <owl:sameAs>http://data.logainm.ie/place/19220</owl:sameAs>
    <location xmlns="http://www.w3.org/2003/01/geo/wgs84_pos#">143036, 192476</location>
    <dc:identifier>GA133-003----</dc:identifier>
  </rdf:Description>
  <rdf:Description rdf:about="http://localhost:3333/1">
    <dc:description>Anomalous stone group</dc:description>
    <foaf:name>CARHEENYBAUN</foaf:name>
    <owl:sameAs>http://data.logainm.ie/place/19220</owl:sameAs>
    <location xmlns="http://www.w3.org/2003/01/geo/wgs84_pos#">142686, 192200</location>
    <dc:identifier>GA133-004----</dc:identifier>
  </rdf:Description>
  <rdf:Description rdf:about="http://localhost:3333/2">
    <dc:description>Architectural fragment</dc:description>
    <foaf:name>BALLYMAHONY</foaf:name>
    <owl:sameAs>http://data.logainm.ie/place/5830</owl:sameAs>
    <location xmlns="http://www.w3.org/2003/01/geo/wgs84_pos#">119634, 198730</location>
    <dc:identifier>CL009-014003-</dc:identifier>
  </rdf:Description>
  <rdf:Description rdf:about="http://localhost:3333/3">
    <dc:description>Architectural fragment</dc:description>
    <foaf:name>FANTA GLEBE</foaf:name>
    <owl:sameAs>http://data.logainm.ie/place/6718</owl:sameAs>
    <location xmlns="http://www.w3.org/2003/01/geo/wgs84_pos#">116138, 195021</location>
    <dc:identifier>CL009-085003-</dc:identifier>
  </rdf:Description>
  <rdf:Description rdf:about="http://localhost:3333/4">
    <dc:description>Architectural fragment</dc:description>
    <foaf:name>BALLYCONNOE NORTH</foaf:name>
    <owl:sameAs>http://data.logainm.ie/place/5796</owl:sameAs>
    <location xmlns="http://www.w3.org/2003/01/geo/wgs84_pos#">116818, 200467</location>
    <dc:identifier>CL009-004006-</dc:identifier>
  </rdf:Description>
  <rdf:Description rdf:about="http://localhost:3333/5">
    <dc:description>Architectural fragment</dc:description>
    <foaf:name>KILMOON WEST</foaf:name>
    <owl:sameAs>http://data.logainm.ie/place/6627</owl:sameAs>
    <location xmlns="http://www.w3.org/2003/01/geo/wgs84_pos#">114875, 200000</location>
    <dc:identifier>CL008-049006-</dc:identifier>
  </rdf:Description>
  <rdf:Description rdf:about="http://localhost:3333/6">
    <dc:description>Architectural fragment</dc:description>
    <foaf:name>LISHEENEAGH</foaf:name>
    <owl:sameAs>http://data.logainm.ie/place/5808</owl:sameAs>
    <location xmlns="http://www.w3.org/2003/01/geo/wgs84_pos#">116508, 203537</location>
    <dc:identifier>CL005-063004-</dc:identifier>
  </rdf:Description>
  <rdf:Description rdf:about="http://localhost:3333/7">
    <dc:description>Architectural fragment</dc:description>
    <foaf:name>KILMOON WEST</foaf:name>
    <owl:sameAs>http://data.logainm.ie/place/6627</owl:sameAs>
    <location xmlns="http://www.w3.org/2003/01/geo/wgs84_pos#">115017, 200061</location>
    <dc:identifier>CL008-049007-</dc:identifier>
  </rdf:Description>
  <rdf:Description rdf:about="http://localhost:3333/8">
    <dc:description>Architectural fragment</dc:description>
    <foaf:name>CLOONEY SOUTH</foaf:name>
    <owl:sameAs>http://data.logainm.ie/place/6653</owl:sameAs>
    <location xmlns="http://www.w3.org/2003/01/geo/wgs84_pos#">119259, 188007</location>
    <dc:identifier>CL016-105005-</dc:identifier>
  </rdf:Description>
  <rdf:Description rdf:about="http://localhost:3333/9">
    <dc:description>Architectural fragment</dc:description>
    <foaf:name>KILFENORA</foaf:name>
    <owl:sameAs>http://data.logainm.ie/place/6720</owl:sameAs>
    <location xmlns="http://www.w3.org/2003/01/geo/wgs84_pos#">118338, 193926</location>
    <dc:identifier>CL016-171----</dc:identifier>
  </rdf:Description>
</rdf:RDF>
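A consumer of this output could read the records back with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a one-record excerpt of the sample; the function name and the dictionary keys are hypothetical, and it assumes every record carries all five child elements and that the location holds an "easting, northing" pair.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
}

# One record taken from the sample output above.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://localhost:3333/0">
    <dc:description>Anomalous stone group</dc:description>
    <foaf:name>CARHEENYBAUN</foaf:name>
    <owl:sameAs>http://data.logainm.ie/place/19220</owl:sameAs>
    <location xmlns="http://www.w3.org/2003/01/geo/wgs84_pos#">143036, 192476</location>
    <dc:identifier>GA133-003----</dc:identifier>
  </rdf:Description>
</rdf:RDF>"""


def read_monuments(xml_text):
    """Extract identifier, townland, Logainm link and coordinates per record."""
    root = ET.fromstring(xml_text)
    records = []
    for node in root.findall("rdf:Description", NS):
        location = node.findtext("geo:location", namespaces=NS)
        easting, northing = (int(part) for part in location.split(","))
        records.append({
            "identifier": node.findtext("dc:identifier", namespaces=NS),
            "townland": node.findtext("foaf:name", namespaces=NS),
            "logainm": node.findtext("owl:sameAs", namespaces=NS),
            "easting": easting,
            "northing": northing,
        })
    return records


for record in read_monuments(SAMPLE):
    print(record)
```

Note that in this output owl:sameAs holds the Logainm URI as element text rather than as an rdf:resource attribute, so the sketch reads it as a plain string.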
Conclusion
OpenRefine and use of the standard data.smartopendata.eu vocabularies (SmartOpenData Protected Sites, FOAF and Dublin Core)51 enabled the transformation to be completed.
The transformation of the Monuments dataset to RDF was completed using the OpenRefine tools. This allowed the data to be mashed together with the Linked Logainm source to produce the National Monument locations linked with the definitive Irish placenames of those locations. The exercise has ensured that the Logainm and National Monuments teams will collaborate more closely in the future, and it will help to ensure the wider use of both datasets.
The first approach was to build on the Slovakian pilot approach and use the National Monuments datasets as transformed to the INSPIRE Protected Sites theme52; however, the transformation to RDF revealed weaknesses in the INSPIRE version of the national monuments dataset, which is why the original dataset was transformed. These weaknesses are now being addressed, so the exercise led to an improvement in the quality of the dataset involved.
51 See Table 2
52 Available at https://www.geoportal.ie/geoportal/catalog/search/resource/details.page?uuid=%7bF6DE3EBB‐FC5C‐4D79‐A00A‐BC45AB9F55F6%7d
2.1.4 Transforming Data with Grafterizer and the Jarfter Service
Grafterizer is an interactive tool for creating data transformations. It allows the user to set up so‐called pipelines and RDF mappings. Pipelines are essentially scripts that consist of a number of consecutive actions that are applied to datasets. RDF mappings can be used to publish datasets as linked data. The user interface provides a live preview of the transformations applied to a subset of the chosen dataset, which allows users to specify them incrementally. With Grafterizer, data transformations can be uploaded and stored. Pipelines and RDF mappings are displayed both in tabular form in a grid and in their script form, as Clojure code.
In Grafterizer, each single transformation step is defined as a pipe: a function that performs a simple data conversion on its input. These functions are then combined in such a way that the output of one pipe acts as the input for another. This way of composing operations gives great flexibility and makes it possible to perform rather complex data conversions.
Below is a short demonstration of how the Grafterizer tool performs a transformation on tabular data. The example is taken from the ARPA data of the Italian Pilot.
Figure 6: Original Sample ARPA Data
In order to see an instant preview of the created transformation on the data, one should upload the data in a raw tabular format. Next, the transformation itself is created by adding the required functions to a pipeline. Each time a pipeline is modified, the transformation is applied to the previewed dataset immediately, so one can see the effect of each performed step. At any stage of the transformation, the modified tabular data can be exported.
Figure 7: RDF mapping for ARPA data
When the tabular data is in the desired format, one can start creating RDF mappings. The RDF skeleton being created is clearly visualized, showing nodes and the corresponding relations.
Both pipelines and RDF mappings are stored together as complete data transformations and may be easily shared and reused. After a transformation is constructed and saved, one may apply it to the target datasets and download the results locally in the desired RDF format. Currently supported formats include RDF/XML (.rdf), N‐Triples (.nt), Turtle (.ttl), N3 (.n3), N‐Quads (.nq) and RDF/JSON (.rj).
Figure 8: Generated RDF graph for ARPA data
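The pipe composition and RDF mapping described above can be illustrated with a minimal sketch. Grafterizer itself produces Clojure code, so the Python below is only an analogy; the sample rows, column names, pipe functions, base URI and the `vocab#value` property are hypothetical and are not taken from the ARPA dataset.

```python
from functools import reduce

# Hypothetical sample rows; not actual ARPA data.
ROWS = [
    {"station": " Lago A ", "value": "12.5"},
    {"station": "Fiume B", "value": "7"},
]

def trim_names(rows):
    """Pipe 1: strip surrounding whitespace from the station name."""
    return [{**r, "station": r["station"].strip()} for r in rows]

def parse_values(rows):
    """Pipe 2: convert the measurement column from text to float."""
    return [{**r, "value": float(r["value"])} for r in rows]

def compose(*pipes):
    """Chain pipes so that the output of one pipe is the input of the next."""
    return lambda rows: reduce(lambda acc, pipe: pipe(acc), pipes, rows)

def to_ntriples(rows, base="http://example.org/station/"):
    """Map each row to N-Triples lines, one subject per row.

    Simplification: literal values are not escaped, so names containing
    quotes or backslashes would need extra handling.
    """
    lines = []
    for i, r in enumerate(rows, start=1):
        subject = f"<{base}{i}>"
        name = r["station"]
        value = r["value"]
        lines.append(f'{subject} <http://xmlns.com/foaf/0.1/name> "{name}" .')
        lines.append(
            f"{subject} <http://example.org/vocab#value> "
            f'"{value}"^^<http://www.w3.org/2001/XMLSchema#double> .'
        )
    return lines

pipeline = compose(trim_names, parse_values)
triples = to_ntriples(pipeline(ROWS))
```

Because each pipe takes rows and returns rows, steps can be added, removed or reordered without touching the others, which is the property the text attributes to pipelines.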
Jarfter
In order to apply the transformations to complete datasets, we have developed Jarfter for the purposes of SmartOpenData. Jarfter is a set of web services that allow for server‐side compilation of data transformations (serialized as Clojure code) as well as execution of transformations on uploaded datasets. They can be accessed through the user interface shown in Figure 9, which gives users two options for transforming their data.
Figure 9: User interface for Jarfter
The Execute transformation operation performs the complete transformation of the entire dataset based on the generated Clojure code corresponding to the transformation. The code and data are uploaded to the server, and when the transformation is complete the browser downloads a file containing the transformed data.
The second option is the Download transformation executable operation, which does only the first half of what "Execute transformation" does: the server receives only the generated Clojure source code and not the dataset. The Clojure code is compiled to an executable JAR file, which is then automatically downloaded by the user as "transformation.jar". This file can then be used to transform datasets locally instead of in the cloud. The JAR file can be run from the command line, in the location where the file is stored, as follows:
$ java -jar transformation.jar <input-file.csv> <output-file.(nt|rdf|n3|ttl)>
Jarfter Architecture
Jarfter is a set of RESTful web services with a back‐end database, which allow for server‐side compilation of Clojure code and execution of transformations on datasets. The services are implemented in a way that allows Jarfter to be used both with and without interacting with the database.
Figure 10: Jarfter compiler services
A schematic overview of the compiler service, accessed through the "Download transformation executable" capability, is shown in the figure above. The Clojure source code, which is generated from the user‐specified transformations in Grafterizer, is sent to the server back‐end, where it is compiled to an executable JAR file. The JAR file can then be sent back to the user (where it can be executed locally) or, if the database interactive services are used, both the Clojure source code and the executable JAR are stored in the back‐end database as well.
Jarfter also supports execution of the transformations on the server side, as exposed by the "Execute transformation" capability. Figure 11 provides an overview of the workflow for the services.
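For the local execution option described earlier in this section, the command line can also be assembled from a script. The following is a minimal sketch that only builds and checks the argument list given in the usage line above; the function names are hypothetical, and actually running the command assumes that Java and the downloaded "transformation.jar" are available locally.

```python
import subprocess
from pathlib import Path

# Output extensions listed in the command-line usage shown in the text.
ALLOWED_OUTPUT = {".nt", ".rdf", ".n3", ".ttl"}

def build_command(input_csv, output_file, jar="transformation.jar"):
    """Return the argument list for: java -jar <jar> <input> <output>."""
    if Path(output_file).suffix not in ALLOWED_OUTPUT:
        raise ValueError(f"unsupported output format: {output_file}")
    return ["java", "-jar", jar, str(input_csv), str(output_file)]

def run_transformation(input_csv, output_file, jar="transformation.jar"):
    """Run the JAR; raises CalledProcessError if the process fails."""
    return subprocess.run(build_command(input_csv, output_file, jar), check=True)
```

Validating the output extension before starting the Java process gives an early, readable error instead of a failure inside the JAR.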
Figure 11: Jarfter transformation web service
As shown in the diagram, the client must provide either references to the database entries for a transformation and the dataset that will be transformed, or the Clojure source code and the dataset itself. Any provided source code is dynamically compiled into a JAR executable. If the user provides a reference, the JAR is extracted from the database. The dataset to be transformed is also downloaded from the database if a database entry, instead of the dataset itself, is given as input. The back‐end then executes the JAR and transforms the given dataset before it sends the transformed data back to the user.
Warfter: Dynamic Deployment of Data Transformations (Jarfter extension)
The approach implemented with the Jarfter service for generating data transformations allows for a very high level of automation of the transformation process. In particular, this is due to the statelessness of all resulting transformation executables. This property allows for the creation of transformations on demand, which can then be used to dynamically form cloud deployment topologies. A high‐level overview of the intended process of forming a simple topology is illustrated in Figure 12.
Figure 12: Dynamic deployment of data transformations
First, users need to specify the transformation that needs to be deployed, on the client side, using the Grafterizer tool. When the transformation is ready, the transformation code is sent to the Jarfter back‐end, where the Clojure compiler service forms an executable WAR or JAR file and forwards it to a "deployer" component, capable of dynamically provisioning cloud resources and deploying applications (in our case, we plan to use the CloudML run‐time environment). Finally, the transformation that has been deployed in the cloud can be accessed by the transformation owner or other users to apply the transformation to various datasets.
As mentioned, in order to implement the dynamic deployment of transformations, we plan to use CloudML. CloudML comprises a set of tools and a domain‐specific language (DSL) for modelling and enacting the provisioning and deployment of cloud applications. The modelling language allows for the specification of cloud topologies and the necessary software and hardware resources, as shown in Figure 13.
Figure 13: CloudML deployment template
The CloudML template in the figure represents a simple deployment topology comprising two software components: the aforementioned executable transformation (labelled "Transformation") and a generic servlet container (labelled "SC"). The figure also illustrates the specification of a hardware component, a virtual machine (labelled "VM"). In CloudML, virtual machines are specified in a provider‐agnostic way through a set of hardware requirements, which can then be matched against the virtual machine flavors available from each particular provider.
The CloudML template can be programmatically edited to inject details on how to deploy a particular transformation that has been generated by the Clojure compiler. The resulting model can be sent to the CloudML engine, which can then enact the provisioning and deployment of the necessary resources by matching the hardware and software requirements with the available capabilities.
Grafterizer vs. OpenRefine – An Overview
This section analyses the data transformation process for the ARPA and DGT‐TRAGSA pilot use cases. The transformations were performed with two data cleaning and transformation tools, OpenRefine and Grafterizer. A comparison of how transformations are constructed in these tools is given below.
The first difference that significantly affects the data transformation process is the possibility to create utility functions in Grafterizer. This allows computational logic to be separated from the data it operates on. For example, in the OpenRefine project the formula for computing geographical coordinates for the ARPA pilot Lakes/Rivers Monitoring Stations is defined twice: once for computing latitude and once for computing longitude. Grafterizer allows it to be encapsulated in a separate function that can be called as many times as needed.
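The benefit of a reusable utility function can be sketched as follows. The actual ARPA coordinate formula is not given in this document, so a degrees/minutes/seconds to decimal-degrees conversion is used here purely as a stand-in, and the column names `lat_dms` and `lon_dms` are hypothetical.

```python
def dms_to_decimal(degrees, minutes, seconds):
    """Utility function: convert degrees/minutes/seconds to decimal degrees."""
    return degrees + minutes / 60 + seconds / 3600

def add_decimal_coordinates(row):
    """Apply the same utility function to both coordinate columns."""
    return {
        **row,
        "lat": dms_to_decimal(*row["lat_dms"]),
        "lon": dms_to_decimal(*row["lon_dms"]),
    }
```

The conversion logic lives in one place, so a correction to the formula is made once and applies to both latitude and longitude.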
Another difference is that OpenRefine can keep the original cell value if an error occurs during a transformation, a feature that is not currently available in Grafterizer.
Some transformations in the tested use cases require cross‐dataset operations. OpenRefine supports these, but Grafterizer currently does not allow several datasets to be read at the same time in one pipeline.
One rather useful feature of Grafterizer is the possibility to edit the parameters of each transformation step and to change the step order at any moment while creating the transformation, which is not possible in OpenRefine. At the same time, OpenRefine provides a transformation history with Undo/Redo options.
The functionality for constructing RDF mappings is similar in both tools, with some small differences. One of them is that, at its current stage, Grafterizer does not provide functionality for creating language‐tagged nodes.
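The "keep the original cell value on error" behaviour mentioned above can be approximated in any pipeline with a small wrapper. This is a generic sketch, not OpenRefine's or Grafterizer's implementation; the function name is hypothetical.

```python
def apply_keep_original(values, transform):
    """Apply transform to each value; on a conversion error keep the original."""
    result = []
    for value in values:
        try:
            result.append(transform(value))
        except (ValueError, TypeError):
            # Leave the cell untouched instead of aborting the whole step.
            result.append(value)
    return result
```

For example, converting a column with `float` would leave a cell such as "n/a" unchanged while converting the numeric cells.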
A short summary of the differences in functionality between the two tools is given in the table below.
Feature | OpenRefine | Grafterizer
Basic functionality | |
Encapsulating and reusing utility functions in one transformation | ‐ | +
Ignore errors (leave original data on error) | + | ‐
Cross‐dataset operations (join datasets) | + | ‐
Transformation operations management | |
Edit transformation operation | ‐ | +
Change operation order | ‐ | +
Transformation history with undo/redo options | + | ‐
RDF mapping | |
Language‐tagged nodes | + | ‐
2.2 XML (GML) ‐TO‐RDF transformations
2.2.1 Slovak pilot
Main motivation for the selected approach
Based on further analysis of the available datasets, technologies and knowledge capacities in the field of data harmonisation, and on previous experience documented in D3.3 (Chapter 6.4) [SMODD33], additional datasets have been harmonised in order to implement the SmartOpenData modelling framework in support of the Slovak pilot. The main motivation of this approach was to investigate the possibilities of exposing both INSPIRE‐compliant and other datasets to the Web of (Linked) Data and of enriching these resources with external knowledge.
Data model and storage
In addition to the initial SK INSPIRE Protected Sites dataset53, transformed using the GeoKnow XSL stylesheets, a further list of datasets has been identified and prepared for transformation with the support of the COMSODE project54. The activity covered several of the datasets that SAZP publishes to comply with the European Union's INSPIRE directive55, including data on protected sites, species distribution, bio‐geographical regions, and land cover, as well as an additional dataset on contaminated sites registered as environmental burdens. The INSPIRE datasets were described with the INSPIRE XML schemas, while the latter dataset used a custom XML schema. The source data is available in the Geography Markup Language (GML) via an API provided by the Web Feature Service (WFS).
Note on the "Input dataset hyperlinks" in the following table: the instructions given there relate to the bash script56 that downloads the individual datasets from the WFS (the script requires the curl HTTP client). The output of the script shows the dataset title and the corresponding WFS request. The request URL is enclosed between the characters '<' and '>'. It must be copied as a whole (opening the queries in a browser is not recommended). The bash script downloads each dataset to the file system. Most of the requests contain a cql_filter that selects only the data for Slovakia (the database also contains data for the Czech Republic). The 'Corine landcover' request may take a few minutes, as it contains about 22000 features and the transformation from the relational DB into GML happens "on the fly". All data are in EPSG:4258 (ETRS89) geographic coordinates.
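The structure of these WFS requests can be sketched in Python as an alternative to copying the URLs by hand. The endpoint, parameter names and filter below are taken from the request URLs listed in the footnotes of this section; the function name is hypothetical, and the sketch only builds the URL without contacting the server.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the request URLs quoted in the footnotes.
WFS_ENDPOINT = "http://inspire.geop.sazp.sk/geoserver/wfs"

def build_getfeature_url(type_name, cql_filter, endpoint=WFS_ENDPOINT):
    """Build a WFS 1.1.0 GetFeature URL with a percent-encoded cql_filter."""
    params = {
        "request": "GetFeature",
        "version": "1.1.0",
        "outputFormat": "gml32",
        "typeName": type_name,
        "cql_filter": cql_filter,
    }
    return endpoint + "?" + urlencode(params)

# Filter used for the 'Protected natural monuments' dataset.
NATURAL_MONUMENTS_FILTER = (
    '"ps:siteDesignation/ps:DesignationType/ps:designation" in (\'naturalMonument\') '
    'and "ps:inspireID/base:Identifier/base:namespace" = \'SK:GOV:MOE:SEA:PS\''
)

url = build_getfeature_url("ps:ProtectedSite", NATURAL_MONUMENTS_FILTER)
```

Encoding the filter programmatically avoids the quoting problems that make it inadvisable to paste the raw queries into a browser.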
53 http://ckan.sazp.sk/dataset/inspire‐protected‐sites‐linked‐data/resource/fba4d3b8‐195c‐4224‐a7b9‐ab734c6e933d 54 http://www.comsode.eu/ 55 http://inspire.ec.europa.eu/index.cfm/pageid/3 56 http://redmine.sazp.sk/attachments/download/136/retrieve‐smod‐datasets.sh
No. | Input dataset | Target vocabulary | Output dataset
1 | National parks and protected landscape areas57 | SmOD Protected Sites58 | SK LD INSPIRE Protected Sites59
2 | Small scale protected areas60 | SmOD Protected Sites | SK LD INSPIRE Protected Sites
3 | Protected natural monuments61 | SmOD Protected Sites | SK LD INSPIRE Protected Sites
4 | Special protection areas ‐ Bird directive62 | SmOD Protected Sites | SK LD INSPIRE Protected Sites
5 | Sites of community importance ‐ Habitat Directive63 | SmOD Protected Sites | SK LD INSPIRE Protected Sites
6 | Biosphere reserves64 | SmOD Protected Sites | SK LD INSPIRE Protected Sites
7 | Ramsar65 | SmOD Protected Sites | SK LD INSPIRE Protected Sites
8 | UNESCO world nature heritage sites66 | SmOD Protected Sites | SK LD INSPIRE Protected Sites
9 | Protected landscape elements67 | SmOD Protected Sites | SK LD INSPIRE Protected Sites
10 | Corine Land Cover68 | SmOD Land Cover69 | SK LD Land Cover
11 | Contaminated sites / Environmental burdens | SK Contaminated sites / Environmental burdens vocabulary70 | SK LD Contaminated sites / Environmental burdens71
12 | Biogeographical regions72 | SmOD Biogeographical regions73 | SK LD Biogeographical regions74
13 | Species distribution (Selected taxons)75 | SmOD Species distribution76 | SK LD Species distribution77
Table 6: An overview of the datasets and vocabularies used in SK Pilot
57 WFS, GML> Dowloading '01. National parks and protected landscape areas' ... URL: <http://inspire.geop.sazp.sk/geoserver/wfs?request=GetFeature&version=1.1.0&outputFormat=gml32&typeName=ps:ProtectedSite&cql_filter="ps:siteDesignation/ps:DesignationType/ps:designation" in ('nationalPark','ProtectedLandscapeOrSeascape') and "ps:inspireID/base:Identifier/base:namespace" = 'SK:GOV:MOE:SEA:PS'> GML output saved in `01_NP_LPA.gml'.
58 http://www.w3.org/2015/03/inspire/ps#
59 http://ckan.sazp.sk/dataset/inspire‐protected‐sites‐linked‐data/resource/1d6e0fdf‐df3d‐4a69‐bd5e‐d49aa16d6596
60 WFS, GML> Dowloading '02. Small scale protected areas' ... URL: <http://inspire.geop.sazp.sk/geoserver/wfs?request=GetFeature&version=1.1.0&outputFormat=gml32&typename=ps:ProtectedSite&cql_filter="ps:siteDesignation/ps:DesignationType/ps:designation" in ('managedResourceProtectedArea','strictNatureReserve','wildernessArea') and "ps:inspireID/base:Identifier/base:namespace" = 'SK:GOV:MOE:SEA:PS'> GML output saved in `02_SSPA.gml'
61 WFS, GML> Dowloading '03. Protected natural monuments' ... URL: <http://inspire.geop.sazp.sk/geoserver/wfs?request=GetFeature&version=1.1.0&outputFormat=gml32&typename=ps:ProtectedSite&cql_filter="ps:siteDesignation/ps:DesignationType/ps:designation" in ('naturalMonument') and "ps:inspireID/base:Identifier/base:namespace" = 'SK:GOV:MOE:SEA:PS'> GML output saved in `03_PNM.gml'
62 WFS, GML> Dowloading '04. Special protection areas ‐ Bird directive' ... URL: <http://inspire.geop.sazp.sk/geoserver/wfs?request=GetFeature&version=1.1.0&outputFormat=gml32&typename=ps:ProtectedSite&cql_filter="ps:siteDesignation/ps:DesignationType/ps:designation" in ('specialProtectionArea') and "ps:inspireID/base:Identifier/base:namespace" = 'SK:GOV:MOE:SEA:PS'> GML output saved in `04_SPA.gml'
63 WFS, GML> Dowloading '05. Sites of community importance ‐ Habitat Directive' ... URL: <http://inspire.geop.sazp.sk/geoserver/wfs?request=GetFeature&version=1.1.0&outputFormat=gml32&typename=ps:ProtectedSite&cql_filter="ps:siteDesignation/ps:DesignationType/ps:designation" in ('siteOfCommunityImportance') and "ps:inspireID/base:Identifier/base:namespace" = 'SK:GOV:MOE:SEA:PS'> GML output saved in `05_SCI.gml'
64 WFS, GML> Dowloading '06. Biosphere reserves' ... URL: <http://inspire.geop.sazp.sk/geoserver/wfs?request=GetFeature&version=1.1.0&outputFormat=gml32&typename=ps:ProtectedSite&cql_filter="ps:siteDesignation/ps:DesignationType/ps:designation" in ('biosphereReserve') and "ps:inspireID/base:Identifier/base:namespace" = 'SK:GOV:MOE:SEA:PS'> GML output saved in `06_BR.gml'
65 WFS, GML> Dowloading '07. Ramsar' ... URL: <http://inspire.geop.sazp.sk/geoserver/wfs?request=GetFeature&version=1.1.0&outputFormat=gml32&typename=ps:ProtectedSite&cql_filter="ps:siteDesignation/ps:DesignationType/ps:designationScheme" in ('ramsar') and "ps:inspireID/base:Identifier/base:namespace" = 'SK:GOV:MOE:SEA:PS'> GML output saved in `07_RAMSAR.gml'
66 WFS, GML> Dowloading '08. UNESCO world nature heritage sites' ... URL: <http://inspire.geop.sazp.sk/geoserver/wfs?request=GetFeature&version=1.1.0&outputFormat=gml32&typename=ps:ProtectedSite&cql_filter="ps:siteDesignation/ps:DesignationType/ps:designationScheme" in ('UNESCOWorldHeritage') and "ps:inspireID/base:Identifier/base:namespace" = 'SK:GOV:MOE:SEA:PS'> GML output saved in `08_UNESCO.gml'
67 WFS, GML> Dowloading '09. Protected landscape elements' ... URL: <http://inspire.geop.sazp.sk/geoserver/wfs?request=GetFeature&version=1.1.0&outputFormat=gml32&typename=ps:ProtectedSite&cql_filter="ps:siteDesignation/ps:DesignationType/ps:designation" in ('naturalMonument') and "ps:inspireID/base:Identifier/base:namespace" = 'SK:GOV:MOE:SEA:PS'> GML output saved in `09_PLE.gml'
68 WFS, GML> Dowloading '10. Corine landcover' ... URL: <http://inspire.geop.sazp.sk/geoserver/wfs?request=GetFeature&version=1.1.0&outputFormat=gml32&typename=lcv:LandCoverUnit&cql_filter="lcv:inspireId/base33:Identifier/base33:namespace" = 'SK:GOV:MOE:SEA:LC'> GML output saved in `10_CLC.gml'
69 http://www.w3.org/2015/03/inspire/lc#
Process, tools and technologies
The whole process of data harmonisation was driven by the development of the related components of the SmartOpenData infrastructure as well as by selected elements of the COMSODE methodology for Open Data publishing78.
70 http://data.sazp.sk/vocab/contaminated‐sites 71 http://ckan.sazp.sk/dataset/sk‐environmental‐burdens‐contaminated‐sites/resource/a33b9933‐937a‐4cca‐89d7‐223703bb1187 72 WFS, GML>Dowloading '12. Biogeographical regions' ... URL: <http://inspire.geop.sazp.sk/geoserver/wfs?request=GetFeature&version=1.1.0&outputFormat=gml32&typeName=br:Bio‐geographicalRegion&cql_filter="br:inspireId/base33:Identifier/base33:namespace" = 'SK:GOV:MOE:SEA:BR'> GML output saved in `12_BIO_REGIONS.gml 73 http://www.w3.org/2015/03/inspire/br# 74 http://ckan.sazp.sk/dataset/sk‐inspire‐bio‐geographical‐regions‐linked‐data 75 WFS, GML>Dowloading '13. Species distribution' ... URL: <http://inspire.geop.sazp.sk/geoserver/wfs?request=GetFeature&version=1.1.0&outputFormat=gml32&typeName=sd:SpeciesDistributionUnit&cql_filter="sd:inspireId/base33:Identifier/base33:namespace" = 'SK:GOV:MOE:SEA:SD'> GML output saved in `13_SD.gml' 76 http://www.w3.org/2015/03/inspire/sd# 77 http://ckan.sazp.sk/dataset/sk‐inspire‐species‐distribution‐linked‐data 78 http://opendatanode.org/product/methodology‐for‐od‐publishing/
Table 7: List of phases and tasks extracted and deployed from the COMSODE methodology for Open Data publishing
In brief, the whole process started with harvesting the data from the WFS and converting it to RDF. During the conversion, the data was aligned with selected RDF vocabularies and code lists. Some of the newly created linked data was then interlinked with third‐party data in order to enrich it.
Creating linked data
The whole data transformation process was undertaken with the support of the UnifiedViews Extract‐Transform‐Load (ETL) framework79, the core component of Open Data Node (ODN), a publication platform for Open Data, where it ensures the extraction, transformation, and publishing of (Linked) Open Data. This environment makes it possible to define, execute, monitor, debug, schedule, and share RDF data processing tasks.
79 http://opendatanode.org/product/unifiedviews/
A data processing task (or simply task) consists of one or more data processing units (DPUs). Tasks may also use custom plugins, i.e. DPUs created by users. A DPU encapsulates certain business logic needed when processing data (e.g., one DPU may extract data from an RDF database or apply a SPARQL query). Every DPU has its inputs, outputs, business logic and configuration. UnifiedViews differs from other ETL frameworks by natively supporting RDF data and ontologies. UnifiedViews has a graphical user interface for the administration, debugging, and monitoring of the ETL process.
Since GML is an XML format, the harvested data was converted to RDF/XML via XSL transformations. To do this, the XSL transformations developed by the GeoKnow project80 were reused. To reflect recent developments, an extensive set of GeoKnow XSLT stylesheets has been updated81. Aside from some bug fixes, these updates contained changes related to the mapping to the SmOD vocabularies82 as well as specific modifications related to UnifiedViews.
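Returning to the DPU concept described above, the idea of a task as a chain of units, each with its own logic and configuration, can be sketched as follows. This is a conceptual illustration only and not the UnifiedViews plugin API; the class, the functions and the `e-`/`t-`/`l-` example names merely mimic the extractor/transformer/loader naming of the DPUs used in this pilot.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

@dataclass
class DPU:
    """A processing unit with a name, business logic and configuration."""
    name: str
    logic: Callable[[Any, Dict[str, str]], Any]
    config: Dict[str, str] = field(default_factory=dict)

    def run(self, data: Any) -> Any:
        return self.logic(data, self.config)

def run_task(dpus: List[DPU], data: Any = None) -> Any:
    """Execute the DPUs in order; each output feeds the next input."""
    for dpu in dpus:
        data = dpu.run(data)
    return data

def extract(_data, config):
    """Stand-in extractor: reads records from its configuration."""
    return config["records"].split(",")

def transform(items, config):
    """Stand-in transformer: prefixes each record to form an identifier."""
    return [config["prefix"] + item for item in items]

LOADED: List[str] = []

def load(items, _config):
    """Stand-in loader: stores the records and reports how many were loaded."""
    LOADED.extend(items)
    return len(items)

task = [
    DPU("e-example", extract, {"records": "a,b"}),
    DPU("t-example", transform, {"prefix": "urn:example:"}),
    DPU("l-example", load),
]
```

Calling `run_task(task)` runs the three units in sequence; swapping the loader for another one changes where the data ends up without affecting extraction or transformation.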
80 http://geoknow.eu 81 https://github.com/jindrichmynarz/TripleGeo/tree/sazp/xslt 82 http://www.w3.org/2015/03/inspire
Figure 14: List of updated GeoKnow XSLT stylesheets
For each dataset a data processing pipeline has been built in the UnifiedViews component of the ODN. The pipelines harvested the data from the WFS and converted it to RDF via XSL transformations.
Figure 15: Landing page for Unified Views
Figure 16: List of created pipelines
The transformations of the SK datasets were designed and executed with the following DPUs:
e‐distributionMetadata t‐geonamesOrgToRdfFile
e‐filesDownload t‐gunzipper
e‐sparqlEndpoint t‐rdfToFiles
l‐filesToCkan t‐sparqlConstruct
l‐filesToParliament t‐sparqlUpdate
l‐filesToVirtuoso t‐unzipper
l‐filesUpload t‐xslt
l‐rdfToCkan t‐zipper
t‐filesToRdf
Figure 17: Section with DPU templates
Each DPU comes with specific functionality. For example, l‐filesToParliament is a DPU that loads RDF serialized in files into the Parliament RDF store via its HTTP API for bulk upload83, whilst t‐geonamesOrgToRdfFile is a DPU that transforms a dump of Geonames.org data into RDF. The dump is not valid RDF, since it consists of line‐separated pairs of URIs and the corresponding descriptions of those URIs serialized in RDF/XML. This DPU parses the dump format and outputs a valid RDF file84.
Figure 18: Pipelines execution monitor
83 https://github.com/UnifiedViews/Plugins/blob/master/l‐filesToParliament/doc/About.md 84 https://github.com/comsode‐uv‐plugins/t‐geonamesOrgToRdfFile/blob/develop/README.md
Figure 19: Scheduler with the possibility to define the schedules for pipelines execution
Figure 20: Section with additional settings
Figure 21: Example of pipeline details
Figure 22: Example of further DPU settings
To fulfil the requirements of the project, ODN has been enhanced with a loader for the Parliament RDF store (http://parliament.semwebcentral.org) and an extractor for Geonames.org. Parliament is used by SAZP to store RDF data because it supports geospatial features. The extractor for Geonames.org was needed in order to be able to link to this dataset.
Interlinking
In order to provide links to external resources, possible enrichments of the generated linked data were identified and, in addition to the data transformation pipelines, pipelines were created for enriching the datasets with links to external datasets, including Geonames.org and 3 datasets from the European Environment Agency (Biogeographical regions 2011, Natura 2000 and EUNIS):
● SK Protected Sites <> GeoNames85
● SK Protected Sites <> EEA Natura 200086
● SK Contaminated Sites <> GeoNames
85 http://www.geonames.org/ 86 http://natura2000.eea.europa.eu/rdf/
Figure 23: Example of the interlinking pipeline
Publishing
The main outcomes of these harmonisation activities are available via:
● A human‐readable SAZP Open Data portal interface based on CKAN, providing the possibility to search metadata and visualise all harmonised SK linked data resources87
● A machine‐readable GeoSPARQL API88
● A web application interface supporting GeoSPARQL queries89
Visualizations will be supported with the extensions of LDVMi90.
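Access to the machine‐readable API can be sketched as a standard SPARQL protocol request. The endpoint URL is the one given for the GeoSPARQL API in this section; the query itself is an assumption, since it presumes that the published data uses the GeoSPARQL `geo:hasGeometry` and `geo:asWKT` properties, and the function names are hypothetical. The sketch builds the request and only contacts the server if `run_query` is called.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint listed for the machine-readable GeoSPARQL API.
ENDPOINT = "http://data.sazp.sk/parliament/sparql"

# Assumed query shape; the actual data model may use other properties.
QUERY = """
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
SELECT ?feature ?wkt WHERE {
  ?feature geo:hasGeometry ?geometry .
  ?geometry geo:asWKT ?wkt .
} LIMIT 5
"""

def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

def run_query(query, endpoint=ENDPOINT):
    """Send the request and parse the JSON result set (needs network access)."""
    with urlopen(build_request(query, endpoint), timeout=30) as response:
        return json.load(response)
```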
87 http://data.sazp.sk/ 88 http://data.sazp.sk/parliament/sparql 89 http://data.sazp.sk/parliament/query.jsp 90 http://ldvm.net
Figure 24: CKAN interface with the list of metadata for the open linked data from Slovak pilot
Figure 25: Parliament web application interface
Observed benefits and limitations
A key benefit of the RDF version of the SAZP datasets is that it is straightforward to combine it with third‐party datasets. In this way, large and rich datasets, such as Geonames.org, can be linked and additional features may be drawn from them to frame the original data in a broader context.
During the work on the visualizations, the use of coordinate reference systems (CRS) was identified as a stumbling block. Even though the CRS was changed to a more common one, most visualization tools cannot directly reuse data projected in this CRS because they expect the inverse order of coordinates. This is also the case for OpenLayers 3 (http://openlayers.org/), the visualization library that was used, which required re‐projection of the coordinates on the client side. Ultimately, visualizing the data by projecting it on a map allowed for a visual inspection that revealed errors in its coordinates, which were subsequently fixed. In this way, the exercise helped to improve the quality of the primary data: transforming data and viewing it from different perspectives can detect errors and thus contribute to better data quality.
Recommendations & Future outlook
Publishing data that adheres to common standards, such as the INSPIRE schemas, makes it more reusable. In the case of the SAZP datasets, standardization made it possible to reuse parts of the GeoKnow XSL transformations that were made for INSPIRE‐compliant data, without creating our own from scratch. We learnt a similar lesson for the CRS. In order to improve the reusability of geospatial data on the Web, it should be available at least in the WGS 84/Pseudo‐Mercator ‐ Spherical Mercator CRS, which is supported natively in most tools. When it comes to the formats for geographic geometries, encoding them as Well‐Known Text (WKT) RDF literals was found to offer a good trade‐off between granularity and data volume.
Based on this experience, further investigation will take place to identify which datasets should be extended in their coverage, which new ones are the best candidates for further harmonisation, and where linking and enrichment with external third‐party linked data resources is possible.
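The coordinate order issue and the WKT encoding discussed above can be illustrated with a small sketch. It assumes that the input is a GML posList whose values are ordered latitude first, longitude second, and that the consuming tool expects longitude first; the function name and sample coordinates are hypothetical, and no re‐projection between CRSs is performed.

```python
def poslist_to_wkt_polygon(pos_list, swap_axes=True):
    """Convert a GML posList string into a WKT POLYGON with one ring.

    With swap_axes=True each (first, second) pair is written as
    (second, first), e.g. turning lat/lon input into lon/lat output.
    """
    values = [float(v) for v in pos_list.split()]
    if len(values) % 2 != 0:
        raise ValueError("posList must contain an even number of values")
    pairs = list(zip(values[0::2], values[1::2]))
    if swap_axes:
        pairs = [(second, first) for first, second in pairs]
    ring = ", ".join(f"{x} {y}" for x, y in pairs)
    return f"POLYGON (({ring}))"
```

Making the axis swap an explicit parameter keeps the assumption about the input order visible at the call site, which is where such errors tend to be introduced.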
2.3 Relational DB‐to‐RDF transformations
2.3.1 Czech pilot
The goal of the Czech pilot is to transform the NFI (National Forest Inventory) data from a relational database into RDF/XML or Turtle and to publish it as LOD on the web. At the beginning of the SmartOpenData project the UHUL FMI was supposed to establish a SPARQL endpoint and use on‐the‐fly transformation. However, no suitable lightweight tool was found for the UHUL FMI production environment; one of the issues was the dependency of most available tools on the Java platform. The UHUL FMI therefore decided to use a static transformation into files, so that the data is at least published statically. This approach is not unreasonable, because the NFI data are created at one moment and remain stable for approximately a year.
Data model and storage
The UHUL FMI uses PostgreSQL/PostGIS as a key component for data storage and data analysis. Using it for both the NFI source database and the public data store allows us to replicate the necessary data from the private database server to the public database server. The infrastructure therefore consists of two separate PostgreSQL databases, and the transformation itself uses the public one. This pilot description focuses on the transformation of data from the public database. The data model below represents the type of NFI information that is being published. In the middle is the main table t_nfi_estimate, which represents an estimate. Every estimate has a point estimate (a value) and a lower and an upper limit (a confidence interval). The estimate is further defined in lookup tables (a type, a unit of measure, an attribute filter, a geographic domain etc.); it could be, for example, the forest cover in hectares in the Czech Republic broken down by forest owner. The relation to the geographic domain is important, because the UHUL FMI mostly uses the NUTS regions, which are commonly used among partners across the EU and are an appropriate entity for linkage with other data sources. Another possible linkage is through the NFI outcomes or attributes themselves, because many other EU countries provide NFI outcomes in the same way as the Czech Republic, and there are initiatives that try to define common attributes among the NFIs, e.g. ENFIN91.
91 http://www.enfin.info/
Figure 26: Czech pilot Data model
Relations, Links & Mappings
In order to create proper links and define the NFI data in a broader space (the internet), links (URLs) with the same meanings and definitions had to be found. Some of these URLs were sufficient for the NFI data, but it was also necessary to create a specific vocabulary for the NFI “forest” attributes, which was not available on the internet. For this purpose the UHUL FMI created a first draft of the NFI vocabulary in RDF, which contains a short description of the estimates. It will be desirable to find a responsible body to take care of this vocabulary; during SmOD we expect that it will be the UHUL FMI. An example of the vocabulary, which will be available from http://nil.uhul.cz/lod/ns/nfi/, follows:

@prefix nfi: <http://nil.uhul.cz/nfi.ttl> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
...
nfi:ObAvGrowingStockPerHa rdfs:subClassOf qb:Observation ;
    rdfs:label "Observed average growing stock per hectare"@en ;
    rdfs:comment "The observed average growing stock in cubic metres per hectare."@en .

nfi:ObForestOwner rdfs:subClassOf qb:Observation ;
    rdfs:label "Observed forest owner" ;
    rdfs:comment "The observed type of forest owner. Each observation is linked to the relevant owner type defined in http://nil.uhul.cz/lod/ns/fot" .

nfi:ObGrowingStock rdfs:subClassOf qb:Observation ;
    rdfs:label "Observed growing stock"@en ;
    rdfs:comment "The observed growing stock (in cubic metres) within the specified area."@en .
...

For visualisation the UHUL FMI also needs a geometric representation of a geographic domain, and therefore the NUTS regions are also published in WKT form on the webpage (http://nil.uhul.cz). Some sources for the NUTS regions are already available on the web, but the NFI uses its own generalisation of the geometry for the map client: it is faster for a web map window to use a ready‐made geometry than to generalise it dynamically on the client side for every request. If a proper NUTS 3 geometry representation becomes available on the web, this vocabulary will no longer be needed. The vocabulary has the following format and will be available at http://nil.uhul.cz/lod/ns/nuts/ :

@prefix unit: <http://qudt.org/1.1/vocab/unit#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix nfi: <http://nil.uhul.cz/lod/nfi#> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

<http://nil.uhul.cz/lod/geo/nuts#CZ032>
    a <http://www.w3.org/2015/03/inspire/au#AdministrativeUnit> ;
    rdfs:comment "CZ032 - Plzeňský" ;
    rdfs:label "CZ032" ;
    owl:sameAs <http://nuts.geovocab.org/id/CZ032> ,
        <http://estatwrap.ontologycentral.com/dic/geo#CZ032> ;
    geo:asWKT "POLYGON((13.7657560325118 49.5140373364391,13.7478475772677 49.4868312489771, …

Data published by the NFI are mostly statistical, therefore the UHUL FMI could use available mathematical and physical vocabularies for the data definition, e.g.:
● http://purl.org/NET/scovo#
● http://qudt.org/1.1/vocab/unit#
And also vocabularies for the geographical relations and entities, some created and recommended during the SmOD project:
● http://www.opengis.net/ont/geosparql#
● http://www.w3.org/2015/03/inspire/au#
Transformation
SmartOpenData technical meetings, documents and hackathons helped us to test tools suitable for presenting our outcomes. We tested D2RQ and r2rml‐parser92 for publishing the NFI results from our database, and Virtuoso for further data processing and visualisation. Whether D2RQ will be used for data transformation in the production environment depends on several conditions, for example long‐term support for D2RQ, security, and support for Java technology by the Ministry of Agriculture. The data now published at http://nil.uhul.cz was created with r2rml‐parser. Once the links had been set up (as described in the previous section), the transformation could be carried out. The mapping for the transformation was defined in R2RML syntax93. The data in the relational database has not been translated from the native Czech language, therefore the language attribute had to be used. Below is an example of the mapping file for an estimate of the forest cover:

#
# forest_cover
#
@prefix map: <#>.
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix au: <http://www.w3.org/2015/03/inspire/au#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix dc: <http://purl.org/dc/elements/1.1/>.
@prefix geo: <http://www.opengis.net/ont/geosparql#>.
@prefix scovo: <http://purl.org/NET/scovo#>.
@prefix unit: <http://qudt.org/1.1/vocab/unit#> .
@prefix nfi: <http://nil.uhul.cz/lod/nfi#> .
### NIL database mappings
map:spatial
    rr:logicalTable <#forest>;
    rr:subjectMap [
        rr:template 'http://nil.uhul.cz/lod/nfi/forest_cover#{"id_result"}';
        rr:class nfi:forest_cover;
    ];
    rr:predicateObjectMap [
        rr:predicate rdf:value;
        rr:objectMap [ rr:column "point_estimate"; ];
    ];
    rr:predicateObjectMap [
        rr:predicate scovo:max;
        rr:objectMap [ rr:column "upper_limit" ];
    ];
    rr:predicateObjectMap [
        rr:predicate scovo:min;
        rr:objectMap [ rr:column "lower_limit" ];
    ];
    rr:predicateObjectMap [
        rr:predicate unit:units;
        rr:objectMap [ rr:constant unit:Percent; ];
    ];
    rr:predicateObjectMap [
        rr:predicate nfi:adomainId;
        rr:objectMap [ rr:column "adomain" ];
    ];
    rr:predicateObjectMap [
        rr:predicate nfi:adomain;
        rr:objectMap [ rr:column "adomain_label"; rr:language "cs" ];
    ];
    rr:predicateObjectMap [
        rr:predicate nfi:adomain_description;
        rr:objectMap [ rr:column "adomain_description"; rr:language "cs" ];
    ];
    rr:predicateObjectMap [
        rr:predicate nfi:nfi_cycle;
        rr:objectMap [ rr:constant "2001 - 2004"; rr:termType rr:Literal; ];
    ];
    rr:predicateObjectMap [
        rr:predicate geo:hasGeometry;
        rr:objectMap [ rr:template 'http://nil.uhul.cz/lod/geo/nuts#{"gdomain_label"}'; rr:termType rr:IRI; ];
    ];
    .
...

92 https://github.com/nkons/r2rml-parser
93 http://www.w3.org/TR/r2rml/

All transformations were done using r2rml‐parser, which can also create a triple store with a dynamic connection to the database; at this moment, however, only a one‐time transformation has been done. The final RDF/XML or Turtle files are available from http://nil.uhul.cz/lod/ns/* , where * represents the name of a vocabulary, e.g. http://nil.uhul.cz/lod/nfi/forest_cover/ or http://nil.uhul.cz/lod/nfi/forest_cover.ttl . The estimates are presented in temporal cycles, so the outcomes can be compared between time periods. By default a user always gets the latest values; a link to older data looks like, e.g., http://nil.uhul.cz/lod/nfi/forest_cover/AGS2001-2004.rdf, where AGS2001-2004 stands for “Average Growing Stock during 2001‐2004 period”. In order to also have a “raw HTML”, human‐readable version of the estimates, an XSLT transformation was applied to the RDF/XML output. The forest cover can also be accessed from this link: http://nil.uhul.cz/lod/nfi/forest_cover.html . The NFI data are suitable for adopting the RDF Data Cube vocabulary described in Section 3.
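To make the role of rr:template in the mapping above more concrete, the following Python sketch shows how a template could be expanded into an IRI for one database row. It is an illustration under stated assumptions, not the r2rml‐parser implementation: the function name, the regular expression and the sample rows are hypothetical; only the two template strings come from the mapping. Percent‐encoding is done with urllib.parse.quote, which is a simplification of the IRI‐safe encoding in the R2RML specification (it also encodes non‐ASCII characters), and escaped braces in templates are not handled.

```python
import re
from urllib.parse import quote

# Matches {"column"} as written in the mapping above, and also {column}.
PLACEHOLDER = re.compile(r'\{"?([^"{}]+)"?\}')


def expand_template(template, row):
    """Expand an R2RML-style template with the values of one row.

    Returns None when a referenced column is NULL (None), mirroring the
    R2RML rule that no term is generated in that case.
    """
    columns = PLACEHOLDER.findall(template)
    if any(row[column] is None for column in columns):
        return None
    return PLACEHOLDER.sub(
        lambda match: quote(str(row[match.group(1)]), safe=""), template
    )


if __name__ == "__main__":
    subject_template = 'http://nil.uhul.cz/lod/nfi/forest_cover#{"id_result"}'
    geometry_template = 'http://nil.uhul.cz/lod/geo/nuts#{"gdomain_label"}'
    # Hypothetical row; the column names are those used in the mapping.
    row = {"id_result": 42, "gdomain_label": "CZ032"}
    print(expand_template(subject_template, row))
    print(expand_template(geometry_template, row))
```

Because the geometry IRI is built from the gdomain_label column, each estimate links to the NUTS resource published in the NUTS vocabulary only if the labels in the database match the region codes used there.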
All estimates can be defined as observations and specified by dimensions. Below is an example of an estimate of the forest area broken down by tree species:

@prefix sa: <http://nil.uhul.cz/lod/nfi/species-area/> .
@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix smod: <http://www.w3.org/2015/03/inspire/smod#> .
@prefix scovo: <http://purl.org/NET/scovo#> .
@prefix nuts: <http://nil.uhul.cz/lod/ns/nuts#> .
@prefix ts: <http://nil.uhul.cz/lod/ns/species-area#> .
@prefix sdmx-attribute: <http://purl.org/linked-data/sdmx/2009/attribute#> .
@prefix unit: <http://qudt.org/1.1/vocab/unit#> .
...
sa:ob4882 a qb:Observation, nfi:ObSpeciesArea ;
    qb:dataSet sa:SA2001-2004 ;
    nfi:refArea nuts:CZ0 ;
    nfi:cycle <http://reference.data.gov.uk/id/gregorian-interval/2001-01-01T00:00:00/P3Y> ;
    nfi:treeSpecies ts:ts2500 ;
    sdmx-attribute:unitMeasure unit:Hectare ;
    smod:areaHa "5586"^^xsd:double ;
    scovo:max "6833"^^xsd:double ;
    scovo:min "4339"^^xsd:double .
...

In order to model the data above, the NFI had to define terms in a vocabulary for the forest species used; a similar concept has been used for other estimates. Example below:

@prefix ts: <http://nil.uhul.cz/lod/ns/species-area#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#>.
@prefix nfi: <http://nil.uhul.cz/lod/ns/nfi#> .
...
ts:ts2500 a skos:Concept ;
    skos:prefLabel "DBC"@cs ;
    skos:prefLabel "Red oak"@en ;
    skos:definition "Dub červený"@cs ;
    skos:definition "Red oak"@en ;
    skos:notation "2500"^^nfi:UHULID ;
    skos:inScheme <http://nil.uhul.cz/lod/ns/species-area> ;
    skos:broader ts:ts6400 .
...
3 Harmonising Observations and Measurements
The final SmOD model suggests adopting the RDF Data Cube vocabulary94 to encode environmental measurements and observations in RDF. In this section we illustrate the application of the Data Cube framework using the example of a water quality measurement taken from the Italian pilot. We discuss third‐party vocabularies as well as custom terms used to encode environmental measurements.
3.1 RDF Data Cube: Example
Environmental observations are essentially numeric values accompanied by numerous attributes that allow these values to be interpreted and described, e.g.:
“Average concentration of benzene in the water of the Sciaguana lake in 2013 was 0.1 µg/kg”
In the example above, “0.1” is the observation value, which by itself does not give us much information. However, if we consider its attributes, we can interpret the value:
● “benzene” ‐ what was measured?
● “concentration” ‐ what quality of benzene was measured?
● “2013” ‐ when was it measured?
● “the Sciaguana lake” ‐ where was it measured?
● “µg/kg” ‐ in what units was it measured?
3.1.1 Data Cube Components
The snippet below demonstrates how to encode the example observation in RDF using the RDF Data Cube approach:

<http://data.smartopendata.eu/WaterbaseLakes/HazardousSubstances/Observation/0>
    a qb:Observation ;
    dbpedia-owl:average "0.1"^^xsd:float ;
    arpa-components:hasObservedDeterminand <http://data.smartopendata.eu/WFD/Determinand/71-43-2> ;
    sdmx-attribute:unitMeasure <http://data.smartopendata.eu/WFD/UnitOfMeasure/9> ;
    sdmx-dimension:refPeriod <http://data.smartopendata.eu/WaterbaseLakes/HazardousSubstances/Dataset/> ;
    arpa-components:station <http://data.smartopendata.eu/WaterbaseLakes/so/Station/IT19LW09453> .
94 http://www.w3.org/TR/vocab-data-cube/

In Data Cube terms, the properties highlighted in bold are called components. In order to represent the pilots’ observations, we re‐used components defined by the Statistical Data and Metadata eXchange (SDMX) code lists95. The table below summarises the SDMX properties and the properties from other vocabularies used in the pilots’ data:
Prefix            Namespace                                           Terms used
sdmx-dimension    http://purl.org/linked-data/sdmx/2009/dimension#    sdmx-dimension:refPeriod
sdmx-attribute    http://purl.org/linked-data/sdmx/2009/attribute#    sdmx-attribute:unitMeasure
dbpedia-owl       http://dbpedia.org/ontology/                        dbpedia-owl:min, dbpedia-owl:max, dbpedia-owl:average, dbpedia-owl:mean
dcterms           http://purl.org/dc/terms/                           dcterms:subject, dcterms:source
schema            http://schema.org/                                  schema:minValue, schema:maxValue
In addition to the components presented in the table above, custom components have been defined for the Italian and Portuguese‐Spanish pilots:
Prefix               Namespace                                                    Terms used
arpa-components      http://smod-fp7.github.io/components/arpa-components.ttl     arpa-components:basePhenomenon, arpa-components:hasObservedDeterminand, arpa-components:station, arpa-components:numberOfSamples
tragsa-components    http://smod-fp7.github.io/components/tragsa-components.ttl   tragsa-components:workUnit
Components Values
Whenever possible, values of components have been encoded via existing SKOS concept schemes or other resources.
Values of sdmx-dimension:refPeriod
The temporal aspect of measurements was represented using the reference time URI set developed by data.gov.uk. For example, in the Portuguese‐Spanish pilot climatology measurements are aggregated over several years, 1981‐2010. We encoded this time period using the following pattern:

<http://reference.data.gov.uk/id/gregorian-interval>/<start-datetime>/P<n-of-years>Y

Hence, the URI of the period of time that corresponds to 21 years starting from 1981 looks as follows:

<http://reference.data.gov.uk/id/gregorian-interval/1981-01-01T00:00:00/P21Y>

95 SDMX guidelines contain standard code lists that are intended to be generic and reusable across various datasets.
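The URI pattern for reference periods given above can be generated mechanically. The following Python sketch is one possible way to do it; the function name and the validation rule are assumptions made for this example, while the base URI and the layout of the path follow the pattern in the text.

```python
from datetime import datetime

# Base of the data.gov.uk reference time URI set used in the pattern above.
GREGORIAN_INTERVAL_BASE = "http://reference.data.gov.uk/id/gregorian-interval"


def gregorian_interval_uri(start, years):
    """Build the URI of an interval of whole years starting at `start`."""
    if years < 1:
        raise ValueError("the interval must span at least one year")
    start_text = start.strftime("%Y-%m-%dT%H:%M:%S")
    return f"{GREGORIAN_INTERVAL_BASE}/{start_text}/P{years}Y"


if __name__ == "__main__":
    # Reproduces the example URI from the text: 21 years starting in 1981.
    print(gregorian_interval_uri(datetime(1981, 1, 1), 21))
```

Generating the URI from a start date and a number of years, rather than typing it by hand, keeps the duration in the URI consistent with the period that the dataset description states.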
Values of sdmx-attribute:unitMeasure
The table below summarises the units of measurement (UoM) used in the Portuguese‐Spanish pilot:

Measurement                                   Unit of Measurement    URI
Average Annual Rainfall Level, Runoff         mm^3                   qudt-unit:CubicMillimeter
Slope, Annual Humidity Level                  percent                qudt-unit:Percent
Average/Minimum/Maximum Annual Temperature    degrees Celsius        qudt-unit:DegreeCelsius
Annual Evapotranspiration Level               mm                     qudt-unit:Millimeter
Annual Radiation Level                        Kcal/cm^2              qudt-unit:KilocaloriePerSquareCentimeter
Annual Insolation Level                       sun hours per year     qudt-unit:NumberPerYear
In the Italian pilot, units of measurement are defined by the EEA code list96. We have transformed this list into an RDF vocabulary, defining each unit of measurement as an instance of the qudt:Unit class. For example, below is the definition of μg/l:

<http://data.smartopendata.eu/WFD/UnitOfMeasure/9>
    a qudt:Unit ;
    rdfs:label "μg/l" ;
    rdfs:comment "microgrammes per liter" .
The complete code list is published together with the Portuguese‐Spanish data97.
Values of arpa-components:hasObservedDeterminand
Values of the observed determinand are also defined in an EEA code list98. We have transformed it into an RDF vocabulary, defining every compound as an instance of the custom class smod:Determinand, e.g.:

<http://data.smartopendata.eu/WFD/Determinand/71-43-2>
    a smod:Determinand ;
    rdfs:label "Benzene" .
The complete code list is published together with the Portuguese‐Spanish data99.
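The conversion of a code list into an RDF vocabulary, as described above for units and determinands, amounts to emitting one small resource description per row. The Python sketch below shows the idea for rows of (code, label) pairs. It is not the tooling used in the pilots; the function names are hypothetical, prefix declarations are omitted, and codes are assumed to be usable in a URI without further encoding.

```python
def escape_literal(text):
    """Escape backslashes, double quotes and newlines for a Turtle string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def code_list_entry(base_uri, code, rdf_class, label):
    """Describe one code list entry as a typed resource with a label."""
    return (
        f"<{base_uri}{code}>\n"
        f"    a {rdf_class} ;\n"
        f'    rdfs:label "{escape_literal(label)}" .'
    )


def code_list_to_turtle(base_uri, rdf_class, rows):
    """Serialise (code, label) rows, separated by blank lines."""
    return "\n\n".join(
        code_list_entry(base_uri, code, rdf_class, label) for code, label in rows
    )


if __name__ == "__main__":
    # The single row below reproduces the Benzene example from the text.
    rows = [("71-43-2", "Benzene")]
    print(code_list_to_turtle(
        "http://data.smartopendata.eu/WFD/Determinand/", "smod:Determinand", rows
    ))
```

Keeping the code itself in the URI, as the pilot data does, means that observations can refer to a determinand or unit without looking up the generated vocabulary first.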
96 http://dd.eionet.europa.eu/dataelements/48239
97 https://s3-eu-west-1.amazonaws.com/smod-repo/tragsa-release3/rdf.zip
98 http://dd.eionet.europa.eu/datasets/latest/Groundwater/tables/HazSubstGW_Disagg/elements/DeterminandCode
99 https://s3-eu-west-1.amazonaws.com/smod-repo/tragsa-release3/rdf.zip
3.1.2 Data Cube Datasets
Data Cube completes the definition of observations by specifying which dataset each observation belongs to. This is done through the property qb:dataSet, as shown below for the running example:

<http://data.smartopendata.eu/WaterbaseLakes/HazardousSubstances/Observation/0>
    qb:dataSet <http://data.smartopendata.eu/WaterbaseLakes/HazardousSubstances/Dataset/> .

<http://data.smartopendata.eu/WaterbaseLakes/HazardousSubstances/Dataset/>
    a qb:DataSet .
Like observations, a dataset may contain components as well:
<http://data.smartopendata.eu/WaterbaseLakes/HazardousSubstances/Dataset/>
    rdfs:comment "Aggregated data on hazardous substances reported in the Waterbase - Lakes dataset of the European Environmental Agency."@en ;
    dcterms:source <http://www.eea.europa.eu/data-and-maps/data/waterbase-lakes-10> ;
    dcterms:subject <http://www.eionet.europa.eu/gemet/concept/9214> ;
    arpa-components:basePhenomenon "concentration" .
The values of the dataset’s components hold for all the observations of the dataset. Thus, for example, we know the example observation is from the dataset that is available at http://www.eea.europa.eu/data‐and‐maps/data/waterbase‐lakes‐10.
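The rule that dataset‐level component values hold for every observation of the dataset can be pictured as a simple fallback lookup. The following Python sketch models the running example with plain dictionaries; it illustrates the idea only and does not process RDF. The function name is hypothetical, and the dictionary keys are the prefixed property names used in the snippets above.

```python
def component_value(component, observation, dataset):
    """Return a component value, looking at the observation first.

    If the observation does not carry the component, the value attached
    to the dataset (if any) applies to it.
    """
    if component in observation:
        return observation[component]
    return dataset.get(component)


# Dataset-level components from the running example.
dataset = {
    "dcterms:source": "http://www.eea.europa.eu/data-and-maps/data/waterbase-lakes-10",
    "arpa-components:basePhenomenon": "concentration",
}

# Observation-level components from the running example.
observation = {
    "dbpedia-owl:average": 0.1,
    "sdmx-attribute:unitMeasure": "http://data.smartopendata.eu/WFD/UnitOfMeasure/9",
}

if __name__ == "__main__":
    for name in ("dbpedia-owl:average", "arpa-components:basePhenomenon"):
        print(name, "=", component_value(name, observation, dataset))
```

Attaching shared values such as the source or the base phenomenon once, at the dataset, avoids repeating the same triples on every observation.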
3.1.3 Data Cube Structures
When defined, the components are grouped into structures, e.g.:

<http://data.smartopendata.eu/WaterbaseRivers/HazardousSubstances/DSD/>
    a qb:DataStructureDefinition ;
    rdfs:comment "Data structure definition for hazardous substances reported in the Waterbase - Rivers dataset of the European Environment Agency and used in the Italian pilot of the SmartOpenData project, http://www.smartopendata.eu/"@en ;
    qb:component [ qb:attribute dcterms:subject ; qb:componentAttachment qb:DataSet ] ;
    qb:component [ qb:attribute dcterms:source ; qb:componentAttachment qb:DataSet ] ;
    qb:component [ qb:attribute arpa-components:basePhenomenon ; qb:componentAttachment qb:DataSet ] ;
    qb:component [ qb:attribute sdmx-attribute:unitMeasure ] ;
    qb:component [ qb:attribute arpa-components:hasObservedDeterminand ] ;
    qb:component [ qb:attribute arpa-components:station ] ;
    qb:component [ qb:measure dbpedia-owl:average ] ;
    qb:component [ qb:dimension sdmx-dimension:refPeriod ] .
Structures have two main objectives. Firstly, they make it possible to change the default (qb:Observation) level of attachment of a component. In other words, one can specify whether the value of a component is specific to each observation or can be generalised over a dataset. Secondly, such structures can be re‐used across similar datasets. For example, we used the structure from the snippet above for the Waterbase ‐ Lakes dataset:
<http://data.smartopendata.eu/WaterbaseLakes/HazardousSubstances/Dataset/>
    qb:structure <http://data.smartopendata.eu/WaterbaseLakes/HazardousSubstances/DSD/> .
… and for the Waterbase ‐ Rivers dataset:
<http://data.smartopendata.eu/WaterbaseRivers/HazardousSubstances/Dataset/>
    qb:structure <http://data.smartopendata.eu/WaterbaseRivers/HazardousSubstances/DSD/> .
Definitions of the datasets and structures of the Italian and Portuguese‐Spanish pilots are available at http://smod-fp7.github.io/dsd/arpa-dsd-dataset.ttl and http://smod-fp7.github.io/dsd/tragsa-dsd-dataset.ttl respectively.
4 Conclusion
In this deliverable we reported on the final iteration of the task of data harmonisation to the SmOD model. The model is based on several INSPIRE themes that were chosen to represent the domains of the pilots conducted within the project. As Table 8 shows, there is an intersection between the domains of the pilots, for example in the topic of protected sites, which is relevant to most of the pilots. However, there are INSPIRE topics used by one pilot only, such as Environmental Monitoring Facility and Cadastral Parcel.

Table 8: Vocabulary usage by pilot
Columns: Vocabulary, Italian Pilot, Portuguese‐Spanish Pilot, Czech Pilot, Slovak Pilot, Irish Pilot
Rows: SmOD Protected Site; SmOD Land Use; SmOD Administrative Units; SmOD Bio‐geographical Regions; SmOD Species Distribution; SmOD Corine Land Cover; SmOD Environmental Monitoring Facility; SmOD Cadastral Parcels; SmOD Custom Vocabulary; Vocabularies of third parties; Own vocabularies

In all the pilots, however, the model has been extended with custom terms. These custom terms were aggregated into a SmOD custom vocabulary, http://www.w3.org/2015/03/inspire/smod#, which currently contains classes and properties that were defined in the scope of the Italian and Portuguese‐Spanish pilots. We discussed them in detail in the sections dedicated to each of the pilots. Other pilots published their custom terms in their own namespaces:
● Czech Pilot ‐ the NFI vocabulary of forest attributes (will be available at http://nil.uhul.cz/lod/nfi/)
● Slovak Pilot ‐ SK SmOD Environmental burdens / Contaminated sites vocabulary ‐ https://data.sazp.sk/vocab/contaminated-sites/

Overall, our observation is that a common data model based on the existing INSPIRE standards facilitated the process of data harmonisation to a greater or lesser extent, depending on the settings and requirements of each pilot. Pilots with INSPIRE‐compliant datasets, such as the Slovak pilot, not only used the model as a target schema for RDF transformations, but also took advantage of the transformation tools that exist for INSPIRE‐compliant datasets. Other pilots, such as the Portuguese‐Spanish pilot, used only a small fragment of the model compared to the domain extension they required.
We believe that in some cases custom terms could be found in other INSPIRE themes. Good examples of such cases are smod:catchmentName, which specifies the name of a catchment area in a water basin, and smod:featureName, which specifies the name of the feature of interest being monitored by the Environmental Facility. Both properties originate from the EEA Waterbase databases. These cases should be considered in future work on the development of the SmOD model.
Technical contributions of the data harmonisation task include three different approaches devised and implemented in different pilots: CSV‐to‐RDF, XML‐to‐RDF and Relational DB‐to‐RDF.
We presented the results of using the RDF plugin for OpenRefine to perform CSV‐to‐RDF transformations and compared it to Grafterizer, a tool that is being actively developed at the moment. On the one hand, the rich functionality of OpenRefine allowed us to perform various data pre‐processing steps and prepare data for RDF mappings. On the other hand, the GUI of the RDF plugin enabled intuitive and interactive construction of RDF skeletons for our data. We reported on several challenging cases in Annex A, but overall we managed to perform all the required transformations. Perhaps the weakest side of this approach is its scalability. Although we did not hit this limitation, we are aware of the existing issues with memory usage in OpenRefine100. Grafterizer is presented as an alternative solution which is being actively developed and already provides several features not available in OpenRefine, such as reusing utility functions within one transformation, changing the order of operations and editing transformation operations. In the scope of the Slovak pilot, the existing XML‐to‐RDF transformations produced by the GeoKnow project were customised to use the SmOD vocabularies and modified for use within the Unified Views ETL framework101. Finally, in the setting of the Czech pilot, transformations of data from a relational database to RDF were covered. D2RQ and r2rml‐parser are being evaluated for a definitive solution.
100 https://github.com/OpenRefine/OpenRefine/wiki/FAQ:-Allocate-More-Memory
101 https://github.com/jindrichmynarz/TripleGeo/tree/sazp/xslt
5 References
[SMODD33] SmartOpenData EU/FP7 project, Report on the Initial Data Harmonisation D3.3, publicly available at http://www.smartopendata.eu/sites/default/files/SmartOpenData_D3.3_Initial_Data_Harmonisation.pdf
[SMODD32] SmartOpenData EU/FP7 project, Report on the Initial SmartOpenData Model D3.2, publicly available at http://www.smartopendata.eu/sites/default/files/SmartOpenData_D3.2_Initial%20Data%20Model.pdf
[SMODD34] SmartOpenData EU/FP7 project, Report on the Final SmartOpenData Model D3.4, to be published at the website of the project http://www.smartopendata.eu/public-deliverables
[SMODD52] SmartOpenData EU/FP7 project, Report on the First Iteration of pilots D5.2, to be published at the website of the project http://www.smartopendata.eu/public-deliverables
[HM08] Halpin, Terry; Morgan, Tony (March 2008), Information Modeling and Relational Databases: From Conceptual Analysis to Logical Design (2nd ed.), Morgan Kaufmann, ISBN 978‐0‐12‐373568‐3
[SMODD31] SmartOpenData EU/FP7 project, Review of geographic resources metadata and related metadata standards D3.1, publicly available at http://www.smartopendata.eu/sites/default/files/SmartOpenData%20D3.%201%20Review%20of%20geographic%20resources%20metadata%20and%20related%20metadata%20standards.pdf
Annex A: Generating RDF with OpenRefine: Challenges and Solutions

Language Tag Customisation
In both the Italian and the Portuguese‐Spanish pilots we faced the need to generate language‐tagged literals out of a column that contains data in multiple languages. The RDF plugin for OpenRefine allows us to specify only one language tag per Literal node, as shown in Figure 1.

Figure 27: RDF plugin of OpenRefine, language tag

We have identified two possible options to generate RDF with multiple language tags:
● split the input dataset horizontally by language and run RDF transformations for each data slice
● customise the literal node

The first option was applied to produce RDF of the Sicilian Protected Sites out of the NATURA2000SITES table of the NATURA2000 database. This table contains information about Protected Sites in all EU member states. Names, descriptions and the like are all given in the languages of the member states. The goal is to produce the following RDF triple for every Sicilian site:

<protectedsiteURI> rdfs:label <SITENAME>@it .

We created RDF mappings with the “it” language tag and ran them on a subset of the input table with Sicilian sites only. We used OpenRefine to create such a subset. Even though the subset could be generated by pre‐processing the input data before uploading it to Refine, doing it in Refine allows one to store all the pre‐processing steps in one place together with the RDF mappings.
The second option was used in the Portuguese‐Spanish pilot to generate RDF out of datasets with administrative units from both Portugal and Spain. For example, Figure 28 illustrates an excerpt from the municipalities dataset with the Spanish “A Baña” and the Portuguese “Soure” municipalities.

Figure 28: Excerpt from the aux_040400_municipality.csv

The goal is to generate a triple of the following form for every municipality of the dataset:

<municipalityURI> ramon:name <name>@<lang-tag> .

If we specify “sp” as the language tag (as shown in Figure 3), we will generate the following two triples:

<http://data.smartopendata.eu/sp-pt-pilot/so/Municipality/ES111010007> ramon:name "A Baña"@sp .
<http://data.smartopendata.eu/sp-pt-pilot/so/Municipality/PT162110615> ramon:name "Soure"@sp .

The second triple is obviously not correct.
In order to tell Refine to generate “sp” tagged literals out of the name column only when the municipality is Spanish, we can customise the value of the Literal node as follows:

Figure 29: RDF plugin of OpenRefine, literal node customisation

One can see in the preview that the condition fails on the Portuguese municipality and nothing is output. With this mapping, only one triple is produced out of the two records of the running example:

<http://data.smartopendata.eu/sp-pt-pilot/so/Municipality/ES111010007> ramon:name "A Baña"@sp .
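The effect of such conditional literal mappings can be reproduced outside OpenRefine with a few lines of Python. The sketch below is an illustration only, not part of the pilot tooling: the function names are hypothetical, the "idState" column and its "PT" value are taken from the GREL expression used for the Portuguese mapping in this annex, and the "ES" value for Spanish rows is an assumption. The language tags "sp" and "pt" are the ones used in the text.

```python
# One (state value, language tag) rule per literal mapping.
# "ES" is an assumed value of the idState column for Spanish rows.
LANGUAGE_BY_STATE = {"ES": "sp", "PT": "pt"}


def name_if_state(row, state):
    """Counterpart of the GREL expression
    if(cells["idState"].value == state, value, null) for the name column."""
    return row["name"] if row["idState"] == state else None


def name_triples(subject, row):
    """Emit one language-tagged ramon:name triple per matching rule."""
    triples = []
    for state, language in LANGUAGE_BY_STATE.items():
        name = name_if_state(row, state)
        if name is not None:
            triples.append(f'<{subject}> ramon:name "{name}"@{language} .')
    return triples


if __name__ == "__main__":
    base = "http://data.smartopendata.eu/sp-pt-pilot/so/Municipality/"
    rows = [
        ("ES111010007", {"name": "A Baña", "idState": "ES"}),
        ("PT162110615", {"name": "Soure", "idState": "PT"}),
    ]
    for code, row in rows:
        for triple in name_triples(base + code, row):
            print(triple)
```

Each rule plays the role of one customised Literal node: for any given row at most one condition holds, so every municipality gets exactly one name with the tag of its own language.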
In order to output a similar triple for the Portuguese municipality, we need to add one more mapping of the “name” column to the literal tagged with the language “pt” and customise the value of the “name” column as follows:

if(cells["idState"].value == "PT", value, null)

As a result, we will generate the second triple for the Portuguese municipality:

<http://data.smartopendata.eu/sp-pt-pilot/so/Municipality/PT162110615> ramon:name "Soure"@pt .

RDF out of a List of Values
In the Portuguese‐Spanish pilot we had to generate the following RDF triple for every ObservatoryTile:

<observatorytileURI> smod:supports <animalspeciesURI> .

Each observatory tile can support multiple animal species, as shown in the excerpt below from the input file with observatory tiles:

Figure 30: Excerpt from ObservationTiles.csv file

Column “speciesCode” contains a list of all the species supported by the tile with id “ES1110…”. For each value in the list we want to construct a URI of the animal. In order to do so, we applied the following custom expression to the value of the “speciesCode” column when mapping it to the animal species node:

forEach(value.split("~~~"), v, "AnimalSpecies/" + v)

More than one Root Nodes
With the RDF plugin for OpenRefine it is possible to generate more than one root node for the same input dataset. We tried this functionality of OpenRefine to transform data of the Portuguese‐Spanish pilot. For example, the “Work Unit Location” model (see Annex B) among other instances defines instances of two classes: smod:WorkUnit and
cp:CadastralParcel. The input dataset for this model was generated from d_060…04_workuni…on.wkt (see footnote 102), and contains two columns: "idWorkUnit" and "idParcel". We could generate work unit and parcel instances together with the gsp:sfWithin relationship, as illustrated in Figure 31.

Figure 31: Excerpt from ObservationTiles.csv file

The problem with this solution is that the input dataset contains duplicates of "idParcel", since the same parcel can contain more than one work unit. As a result, the RDF contains duplicates of parcel instances, as many as there are duplicates of "idParcel" in the input dataset. For example, there are two records with "idParcel" = "ES1133300010101400001". The RDF of these records will look as follows:

    <http://data.smartopendata.eu/sp-pt-pilot/so/WorkUnit/ES1133300010101400001001> a smod:WorkUnit ;
        gsp:sfWithin <http://data.smartopendata.eu/sp-pt-pilot/so/Parcel/ES1133300010101400001> .
    <http://data.smartopendata.eu/sp-pt-pilot/so/Parcel/ES1133300010101400001> a cp:CadastralParcel .
    <http://data.smartopendata.eu/sp-pt-pilot/so/WorkUnit/ES1133300010101400001002> a smod:WorkUnit ;
        gsp:sfWithin <http://data.smartopendata.eu/sp-pt-pilot/so/Parcel/ES1133300010101400001> .
    <http://data.smartopendata.eu/sp-pt-pilot/so/Parcel/ES1133300010101400001> a cp:CadastralParcel .

Currently, there is no way in the RDF plugin to generate such instances just once. Our solution was to map instances of parcels in a separate project.

102 Refer to Section 2.1.2, "Data Pre-Processing", for more information about data pre-processing.
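The duplication problem, and one way around it, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the pilot's workflow, which mapped the parcels in a separate project: the `workunit_triples` helper is hypothetical, the sample rows are modelled on the two records mentioned in the text, and duplicates are avoided here simply by remembering which parcel identifiers have already been typed.

```python
BASE = "http://data.smartopendata.eu/sp-pt-pilot/so/"

# Hypothetical sample rows: two work units that lie within the same parcel.
ROWS = [
    {"idWorkUnit": "ES1133300010101400001001", "idParcel": "ES1133300010101400001"},
    {"idWorkUnit": "ES1133300010101400001002", "idParcel": "ES1133300010101400001"},
]


def workunit_triples(rows):
    """Emit one statement per work unit and type each parcel only once."""
    triples = []
    seen_parcels = set()
    for row in rows:
        unit = f"<{BASE}WorkUnit/{row['idWorkUnit']}>"
        parcel = f"<{BASE}Parcel/{row['idParcel']}>"
        triples.append(f"{unit} a smod:WorkUnit ; gsp:sfWithin {parcel} .")
        if row["idParcel"] not in seen_parcels:
            # First time this parcel is seen: declare its class.
            seen_parcels.add(row["idParcel"])
            triples.append(f"{parcel} a cp:CadastralParcel .")
    return triples


if __name__ == "__main__":
    for triple in workunit_triples(ROWS):
        print(triple)
```

For the two sample rows this produces two work unit statements and a single parcel declaration, instead of one parcel declaration per input record.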
Annex B: Portuguese-Spanish pilot: ORM and RDF Models

The figures below illustrate the ORM models developed for the 3rd release of the Portuguese-Spanish pilot and their translation to RDF(S). Documentation of the ORM models is available online at http://smod-fp7.github.io/tragsa3/orm/ObjectTypeList.html, and the constraint validation report is available at http://smod-fp7.github.io/tragsa3/orm/ConstraintValidationReport.html

Chemical Characteristics

Figure 32: Chemical Characteristics: ORM Model

Figure 33: Chemical Characteristics: RDF Model
Climatology

Figure 34: Climatology: ORM Model

Figure 35: Climatology: RDF Model
Forestry Tile

Figure 36: Forestry Tile: ORM Model

Figure 37: Forestry Tile: RDF Model

Geometry

Figure 38: Geometry: ORM Model
Figure 39: Geometry: RDF Model

Work Unit Ecosystem

Figure 40: Work Unit Ecosystem: ORM Model

Figure 41: Work Unit Ecosystem: RDF Model
Work Unit Location

Figure 42: Work Unit Location: ORM Model

Figure 43: Work Unit Location: RDF Model