This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee.
Project Acronym: DataBio
Grant Agreement number: 732064 (H2020-ICT-2016-1 – Innovation Action)
Project Full Title: Data-Driven Bioeconomy
Project Coordinator: INTRASOFT International
DELIVERABLE
D4.3 – Data sets, formats and models (Public version)
Dissemination level PU -Public
Type of Document Report
Contractual date of delivery M20 – 31/8/2018
Deliverable Leader SINTEF
Status - version, date Final – v1.0-Public, 12/12/2018
WP / Task responsible WP4 (T4.5 and T4.6)
Keywords: data set, metadata, datastream
D4.3 – Data sets, formats and models (Public version) H2020 Contract No. 732064 Final – v1.0-Public, 12/12/2018
Dissemination level: PU -Public Page 2
Executive Summary
The D4.3 document starts with an introduction to the DataBio project and other documents
related to D4.3 followed by an introduction to data sharing and data economy in the context
of DataBio.
The FAIR principles are introduced as a foundation for data finding, access, interoperability and
reuse, and as a further motivation for metadata and discovery of datasets through data
registries, in particular the DataBio Hub. Options for further support for data sharing
and data exchange are also presented, in particular through the use of linked data and
industrial data platforms.
The context of datasets in DataBio is presented, including external drivers for data sharing
and data exchange, stakeholders and license models. Data interoperability through
ontologies, models, formats and standards, and data access through standard services and
APIs, are introduced in relation to the DataBio standardisation engagement, in particular in the
Geospatial and Earth Observation areas.
Furthermore, an overview of the requirements for datasets and datastreams in DataBio,
grouped by pilots and the platform itself, is presented. This is followed by a detailed
description of the datasets in DataBio, using the metadata template from the dataset
descriptions in the DataBio Hub, for existing, improved, new and other relevant datasets. The final
section gives an example of how a dataset can be used for application development, followed
by concluding remarks.
The deliverable also comprises contributions from WP5 on the EO Datasets and from the tasks
T4.5 Big Data Variety Management and T4.6 Data Acquisition with Security support in WP4.
The first phase of the DataBio project has focused on the usage and creation of datasets based
on the needs and requirements of the DataBio pilots. The next phase will continue with this,
but will also have increased focus on interoperability aspects of datasets through the use of
ontologies and potential standard data models and access mechanisms/services and APIs.
There will be an increased focus on secure data sharing and data exchange beyond the
individual pilots to support a growing data economy in the DataBio areas of agriculture,
forestry and fishery.
Relation with Other DataBio Platform Deliverables
The DataBio project includes three piloting work packages (WP1-3) and two related platform work packages (WP4 handling data in general, including IoT data, and WP5 focusing on Earth Observation and geospatial data) that support the pilots (Figure 1). The DataBio platform provides Big Data capabilities to the pilots by forming software pipelines of components through which data flows from the sources in agriculture, forestry and fishery through data management, analytics, and visualization stages in the pilots.
This is a public version of Deliverable D4.3 “Data sets, formats and models”.
Confidential information from the original document has been omitted.
Figure 1: Work packages and their roles in DataBio
The platform developed in DataBio is described in the Deliverables D4.1, D4.2, D4.3 (WP4) and D5.1, D5.2, D5.3 (WP5) (Figure 1). Deliverables D4.1-3 define the Milestone M7 “Service ready for Trial 1”, whereas Deliverables D5.1-3 define the Milestone M9 “EO Services ready for integration”. The platform services and pipelines have been in trials since April 2018 (M16). More specifically, the public deliverable D4.1 “Platforms and interfaces” describes the software components to be utilized by the pilots. Most of the components are already in use in the first pilot trials. In addition, D4.1 reports the outcome of a matchmaking process, in which the pilots selected which components to deploy.
Deliverable D4.2 “Services for tests” builds on D4.1 and provides an overview of the component pipelines as identified at month 16 (M16) of the project. It also provides guidelines for successful implementation and deployment of the pipelines. This deliverable, D4.3 “Data sets, formats and models”, is due at the end of August 2018. While the two earlier reports deal with software modules, this report focuses on the datasets and datastreams employed in DataBio. It covers data formats, standards and models that enable easy findability, access, interoperability and reusability of data (the FAIR principles), and thus addresses topics beyond the coverage of single pilots.
Deliverable D5.1 “EO component specification” includes an analysis of the EO dataset and component related requirements provided by the pilots. It was published at the end of 2017 and contains an overview of best practices of EO access and initial component and dataset requirements based on the DataBio pilot needs.
[Figure 1 diagram: the pilots in WP1-3 (Agro Pilots 1 to 13, Forest Pilots 1 to 7, Fishery Pilots 1 to 6) are supported by the DataBio platform with big data components and datasets. WP4 provides components and IoT datasets (Deliverables D4.1, D4.2, D4.3; Milestone M7); WP5 provides components and Earth Observation datasets (Deliverables D5.1, D5.2, D5.3; Milestone M9).]
Deliverable D5.2 “EO component and interfaces” builds on D5.1 and describes the Earth Observation component pipelines in the same way as D4.2 does for IoT components. It also includes examples of data experimentation with the pipelines. Deliverable D5.3 “EO services and tools” builds on D5.1 and D5.2 and describes how the technical components from DataBio can be scaled up to services and tools that are installed as Software as a Service (SaaS) or on premise. It further describes how and under which conditions these services and tools can be accessed externally.
Deliverable Leader: Arne-Jørgen Berre (SINTEF)
Contributors:
Ståle Walderhaug (SINTEF),
Pekka Siltanen (VTT), Caj Södergård (VTT),
Miguel Ángel Esbrí (ATOS), Javier Hitado Simarro (ATOS),
Ephrem Habyarimana (CREA),
Iason Kastanis (CSEM), Margus Freudenthal (CYBER),
Allan Aasbjerg Nielsen (DTU), Marco Corsi (e-geos),
Kostas Akasoglou (EXUS), Ioannis Komnios (EXUS),
Adamantios Maragkos (EXUS), Anuj Sharma (EXUS),
Charikleia Stefanou (EXUS), Dimitris Vassiliadis (EXUS),
Petr Lukes (FMI), Eva Klien (Fraunhofer),
Ivo Senner (Fraunhofer), Fabiana Fournier (IBM),
Inna Skarbovsky (IBM), Christian Zinke (InfAI),
George Bravos (INTRASOFT),
Vassilis Chatzigiannakis (INTRASOFT),
Karel Charvat (LESPRO), Karel Charvat, jr (LESPRO),
Tomas Reznik (LESPRO), Anu Kosunen (METSAK),
Virpi Stenman (METSAK), Seppo Huurinainen (MHGS),
Veli-Matti Plosila (MHGS), Panagiotis Elias (NP),
Kostas Karalas (NP), Stamatis Krommidas (NP),
Kostas Mastrogiannis (NP), Natassa Miliaraki (NP),
Ilias Panos (NP), Menelaos Perdikeas (NP),
Savvas Rogotis (NP), Pavlos Tsagkis (NP),
Marco Folegani (MEEO), Ingo Simonis (OGCE),
Soumya Brahma (PSNC), Raul Palma (PSNC),
Juliusz Pukacki (PSNC), Jarkko Vähäkangas (Senop),
Andrey Sadovykh (Softeam),
Marc Gilles (Spacebel), Yves Coene (Spacebel),
Anca Liana Costea (TerraS), Adrian Stoica (TerraS),
Delia Teleaga (TerraS), Jesus Estrada Villegas (TRAGSA),
Asuncion Roldan Zamarron (TRAGSA), Michal Kepka (UWB),
Karel Jedlička (UWB), Tomas Mildorf (UWB),
Erwin Goor (VITO), Jarmo Kalaoja (VTT),
Tuomas Paaso (VTT), Kari Rainio (VTT), Renne Tergujef (VTT)
Reviewers:
Per Gunnar Auran (SINTEF Fishery)
Tomas Mildorf (UWB)
Virpi Stenman (METSAK)
Approved by: Athanasios Poulakidas (INTRASOFT)
Document History
Version | Date | Contributor(s) | Description
0.1 | 05.06.2018 | Ståle Walderhaug | Initial ToC
0.2 | 21.06.2018 | Ståle Walderhaug / Arne J. Berre | ToC with section assignments
0.3 | 01.08.2018 | | Datasets included from partners
0.4 | 15.08.2018 | Adrian Stoica, Terrasigna | D5.i2 datasets included. Added FAIR data. Examples of use included.
0.5 | 20.08.2018 | Ståle Walderhaug | Updated with license policy information. Added concerns section
0.6 | 24.08.2018 | Caj Södergård, Ståle Walderhaug | Requirement in place. Datasets updates
0.7 | 28.08.2018 | Arne J. Berre, Ståle Walderhaug, Caj Södergård | Version for internal review
0.8 | 31.08.2018 | Ståle Walderhaug, Arne J Berre | Version updated after internal review
1.0 | 31.08.2018 | Athanasios Poulakidas | Final version for submission
1.0-Public | 12.12.2018 | Caj Södergård | Public version of the document
Table of Contents
EXECUTIVE SUMMARY
  RELATION WITH OTHER DATABIO PLATFORM DELIVERABLES
TABLE OF CONTENTS
TABLE OF FIGURES
LIST OF TABLES
DEFINITIONS, ACRONYMS AND ABBREVIATIONS
INTRODUCTION
  1.1 PROJECT SUMMARY
  1.2 DOCUMENT SCOPE
  1.3 DOCUMENT STRUCTURE
BACKGROUND
  2.1 DATA SHARING AND DATA ECONOMY IN DATABIO
  2.2 FAIR PRINCIPLES
  2.3 METADATA AND DISCOVERY OF DATASETS
  2.4 DATA REGISTRIES, DATA SHARING AND DATA EXCHANGE
    2.4.1 DataBioHub
    2.4.2 Linked Data and Open Micka
    2.4.3 Industrial data spaces
    2.4.4 Openness and payment
    2.4.5 UXP – Exchange Platform - Cybernetica
  2.5 INDUSTRIAL DATA SPACES AND CONNECTORS
    2.5.1 EU Data Portal
    2.5.2 GEOSS
    2.5.3 DCAT and GeoDCAT
    2.5.4 CKAN
  2.6 OTHERS
CONTEXT VIEW
  3.1 EXTERNAL DRIVERS FOR DATA SHARING AND DATA EXCHANGE
  3.2 DATA INTEROPERABILITY THROUGH ONTOLOGIES, MODELS, FORMATS AND STANDARDS
    3.2.1 Geospatial and Earth Observation ontologies and standards
    3.2.2 Agricultural ontologies and standards
    3.2.3 Forestry ontologies and standards
    3.2.4 Fishery ontologies and standards
  3.3 DATA ACCESS THROUGH STANDARD SERVICES AND APIS
    3.3.1 Geospatial Standards, Data Types and Services
    3.3.2 Sensor Standards, ontologies, data representations
    3.3.3 API approach
  3.4 STAKEHOLDERS AND CONCERNS
  3.5 LICENSE MODELS FOR DATA REUSE
REQUIREMENTS VIEW
  4.1 TYPES OF EO DATA AND SENSORS USED IN THE DATABIO PILOTS AND THEIR CHARACTERISTICS
  4.2 DATASETS AND DATASTREAM REQUIREMENTS FROM PLATFORM
  4.3 DATASETS AND DATASTREAM REQUIREMENTS FROM AGRICULTURE PILOTS
  4.4 DATASETS AND DATASTREAM REQUIREMENTS FROM FORESTRY PILOTS
  4.5 DATASETS AND DATASTREAM REQUIREMENTS FROM FISHERY PILOTS
DATASETS: EXISTING, IMPROVED, NEW AND OTHERS
  5.1 EXISTING DATASETS UTILIZED BY DATABIO PILOTS
    5.1.1 Open Transport Map (UWB - D03.02)
    5.1.2 Forest resource data (METSAK - D18.01)
    5.1.3 Landsat 8 OLI data
    5.1.4 Sentinel 3 OLCI (Ocean and Land Colour Instrument) data
    5.1.5 Sentinel 3 SLSTR (Sea and Land Surface Temperature Radiometer)
    5.1.6 MODIS data
    5.1.7 Proba-V data
    5.1.8 Global Precipitation Measurement (GPM) mission data
    5.1.9 KNMI (Koninklijk Nederlands Meteorologisch Instituut) precipitation data
    5.1.10 CMEMS (Copernicus Marine Environment Monitoring Service) data
    5.1.11 Sentinel 2A (ESA D11.01)
    5.1.12 Sentinel-2 Data
    5.1.13 Sentinel 3 SRAL (Synthetic Aperture Radar Altimeter) data
    5.1.14 Sentinel 3 MWR (Microwave Radiometer) data
  5.2 DATASETS IMPROVED BY DATABIO
    5.2.1 RPAS (Remotely Piloted Aircraft Systems) data
    5.2.2 Ortophotos
    5.2.3 gaiasense field (D13.01)
    5.2.4 Land use and properties - Greek agriculture pilots (NP - D13.02)
    5.2.5 Customer and forest estate data (METSAK - D18.02)
  5.3 NEW DATASETS CREATED DURING DATABIO
    5.3.1 Canopy height map (FMI - D14.05)
    5.3.2 Orthophotos - (IGN - D11.02)
    5.3.3 GEOSS sources (D11.03)
    5.3.4 RPAS data (Tragsa - D11.04)
    5.3.5 MFE Spanish Forest Map (D11.06)
    5.3.6 Field data - pilot B2 (Tragsa - D11.07)
    5.3.7 Forest damage (FMI - D14.07)
    5.3.8 Open Forest Data (METSAK - D18.01)
    5.3.9 Hyperspectral image orthomosaic (Senop - D44.02)
    5.3.10 Leaf area index (FMI - D14.06)
    5.3.11 NASA CMR Landsat Datasets via FedEO Gateway (SPACEBEL - D07.02)
    5.3.12 Ontology for (Precision) Agriculture (PSNC - D09.01)
    5.3.13 Open Land Use (Lespro - D02.01)
    5.3.14 Phenomics, metabolomics, genomics and environmental datasets (CERTH - DS40.01)
    5.3.15 Quality control data (METSAK - D18.04)
    5.3.16 Sentinels Scientific Hub Datasets via FedEO Gateway (SPACEBEL - D07.01)
    5.3.17 SigPAC (Tragsa - D11.05)
    5.3.18 Smart POI dataset (Lespro - D02.01)
    5.3.19 Stand age map (FMI - D14.04)
    5.3.20 Storm and forest damage observations and possible risk areas (METSAK - D18.03a)
    5.3.21 Forest road condition observations (METSAK - D18.03b)
    5.3.22 Tree species map (FMI - D14.03)
    5.3.23 Wuudis data (MHGS - D20.01)
  5.4 RECOMMENDED INTERACTION STRUCTURES: ATOS
CONCLUDING REMARKS
REFERENCES
APPENDIX A METADATA TEMPLATE TABLE
Table of Figures
FIGURE 1: WORK PACKAGES AND THEIR ROLES IN DATABIO
FIGURE 2: HOW DISTRIBUTED STORAGE AND PAYMENTS WORK
FIGURE 3: FUNCTIONAL ARCHITECTURE OF THE INDUSTRIAL DATA SPACE
FIGURE 4: OPENAIRE
FIGURE 5: DRYAD
FIGURE 6: THE FLUX STANDARDS AND STATUS (FROM UN ESCAP PRESENTATION OF DR HEINER LEHR) [REF-37]
FIGURE 7: ARCHIMATE STRATEGY DIAGRAM SHOWING HOW THE PILOT SYSTEM WILL REALIZE THE DEFINED GOALS
FIGURE 8: ARCHIMATE BUSINESS DIAGRAM SHOWING THE DATA PROCESSING, DATASETS AND ACTORS INVOLVED
FIGURE 9: ARCHIMATE DATA VIEW FOR ONE OF THE FISHERY PILOTS (B2)
FIGURE 10: THE B2 FISHERY PILOT LIFECYCLE VIEW SHOWING HOW DATA IS PROVIDED AS INPUT TO PROCESSING STEPS
FIGURE 11: THE B2 FISHERY PILOT PIPELINE VIEW SHOWING HOW DATASETS ARE INTERFACED
FIGURE 12: EO DATA COLLECTION CONTEXT
List of Tables
TABLE 1: THE DATABIO CONSORTIUM PARTNERS
TABLE 2: TYPES OF DATA USED IN DATABIO PILOT PROJECTS
Definitions, Acronyms and Abbreviations
Acronym/Abbreviation Title
ADES Application Deployment and Execution Service
AMS Application Management Client
API Application programming interface
ArchiMate ArchiMate® Specification, modelling language for Enterprise Architecture
ATOM ATOM (Syndication Format)
BDVA Big Data Value Association
CAP Common Agricultural Policy
CCSDS Consultative Committee for Space Data Systems
CEOS Committee on Earth Observation Satellites
CEP Complex Event Processing
CETL Connect Extract Transform and Load
CKAN Comprehensive Knowledge Archive Network
CMEMS Copernicus Marine Environment Monitoring Service
CMR Common Metadata Repository
CPS Cyber Physical Systems
CSW Catalogue Service for the Web
DCAT Data Catalog Vocabulary
DDS Data Distribution System
DEI Digitising European Industry
DIAS Data and Information Access Services
DSL Domain Specific Language
DWG Domain Working Group
ECMWF European Centre for Medium-Range Weather Forecasts
ECSS European Cooperation for Space Standardization
EO Earth Observation
ERS European Remote Sensing Satellite
ESA European Space Agency
FAD Fish Aggregating Devices
FTP File Transfer Protocol
GEMET GEneral Multilingual Environmental Thesaurus
GEO Group on Earth Observations
GSCDA GMES Space Component Data Access
GUI Graphical User Interface
HLA High Level Architecture
HPC High-Performance Computing
HTTP Hypertext Transfer Protocol
IAS Invasive Alien Species
IDP Industrial Data Platform
IETF Internet Engineering Task Force
INSPIRE Infrastructure for Spatial Information in Europe
IoT Internet of Things
ISO International Organisation for Standardisation
JSON JavaScript Object Notation
KMI Koninklijk Meteorologisch Instituut
KML Keyhole Markup Language
KNMI Koninklijk Nederlands Meteorologisch Instituut
LPIS Land Parcel Identification System
NASA National Aeronautics and Space Administration
NG Next Generation
NIST National Institute of Standards and Technology
NN Nearest Neighbors
OAIS Open Archival Information System
OASIS Organization for the Advancement of Structured Information Standards
ODBC Open Database Connectivity
OGC Open Geospatial Consortium
OLCI Ocean and Land Colour Instrument
OLU Open Land Use
OTM Open Transport Map
PaaS Platform as a Service
PDP Research Data Platform
PPP Public-Private Partnership
PROTON IBM PROactive Technology ONline
RDF Resource Description Framework
REST REpresentational State Transfer
RMSE Root Mean Square Error
RPAS Remotely Piloted Aircraft Systems
SaaS Software as a Service
SAFE Standard Archive Format for Europe
SIG Special Interest Groups
SLSTR Sea and Land Surface Temperature Radiometer
SRIA Strategic Research and Innovation Agenda
STIM Smart Transducer Interface Module (from IEEE standard)
SVM Support Vector Machines
SWG Standards Working Group
TCP Transmission Control Protocol
TEP Thematic Exploitation Platform
TMS Tile Map Service
UDP Urban/City Data Platform
UI User Interface
UMM Unified Metadata Model
URL Uniform Resource Locator
W3C World Wide Web Consortium
WCPS Web Coverage Processing Service
WCS Web Coverage Service
WFS Web Feature Service
WGISS Working Group on Information Systems and Services
WMS Web Map Service
WMTS Web Map Tile Service
WP Work Package
WPS Web Processing Service
WTZ Warning time horizon
XFDU XML Formatted Data Units
XML eXtensible Markup Language
Term Definition
Commercial
Mission
The products from high resolution and very high-resolution commercial
missions are purchased on the market. The term “commercial” is used
to denote both optical and radar missions.
Dataset Identifiable collection of data. In the EO Community, a dataset is
typically called “product”.
Dataset Series Collection of datasets sharing the same product specification. In the EO
Community, a dataset series is also called “collection” or “dataset” (in
GSCDA).
Exploitation
Platform
An Exploitation Platform is a virtual workspace, providing the user
community with access to (i) large volume of data (EO/non-space data),
(ii) algorithm development and integration environment, (iii) processing
software and services (e.g. toolboxes, retrieval baselines, visualization
routines), (iv) computing resources (e.g. hybrid cloud/grid), (v)
collaboration tools (e.g. forums, wiki, knowledge base, open
publications, social networking), (vi) general operation capabilities (e.g.
user management and access control, accounting, etc.).
SAFE Format The SAFE (Standard Archive Format for Europe) has been designed to
act as a common format for archiving and conveying data within ESA
Earth Observation archiving facilities.
D4.3 – Data sets, formats and models (Public version) H2020 Contract No. 732064 Final – v1.0-Public, 12/12/2018
Dissemination level: PU -Public Page 13
Special attention has been taken to ensure that SAFE conforms to the
ISO 14721:2003 OAIS (Open Archival Information System) reference
model and related standards such as the emerging CCSDS/ISO XFDU
(XML Formatted Data Units) packaging format.
Sentinel-1 The Copernicus Sentinel-1 earth observation mission developed by ESA
provides continuity of data from ERS and Envisat missions, with further
enhancements in terms of revisit, coverage, timeliness and reliability of
service. The SENTINEL-1 mission comprises a constellation of two polar-
orbiting satellites, operating day and night performing C-band synthetic
aperture radar imaging, enabling them to acquire imagery regardless of
the weather. The two-satellite constellation offers a 6 days revisit time.
A summary of mission objectives is:
● Monitoring sea ice zones and the Arctic environment, and
surveillance of marine environment;
● Monitoring land surface motion risks;
● Mapping of land surfaces: forest, water and soil;
● Mapping in support of humanitarian aid in crisis situations;
● Spatial Resolution: 5m, 20m, 40m.
Source: Wikipedia and Sentinel Online Web site
(https://sentinels.copernicus.eu).
Sentinel-2 The Copernicus Sentinel-2 earth observation mission developed by ESA
provides continuity to services relying on multi-spectral high-resolution
optical observations over global terrestrial surfaces. Sentinel-2 sustains
the operational supply of data for services such as forest monitoring,
land cover change detection or natural disaster management.
The Sentinel-2 mission offers an unprecedented combination of the
following capabilities:
● Multi-spectral information with 13 bands in the visible, near
infra-red and short wave infra-red part of the spectrum;
● Systematic global coverage of land surfaces from 56° South to
84° North, coastal waters and the entire Mediterranean Sea;
● High revisit: every 5 days at equator under the same viewing
conditions;
● High spatial resolution: 10m, 20m and 60m;
● Wide field of view: 290 km.
Source: Wikipedia and Sentinel Online Web site
(https://sentinels.copernicus.eu).
Sentinel-3 The main objective of the Copernicus Sentinel-3 earth observation
mission, developed by ESA, is to measure sea-surface topography, sea- and
land-surface temperature and ocean- and land-surface colour.
A pair of Sentinel-3 satellites will enable a short revisit time of less than
two days for the OLCI instrument and less than one day for SLSTR at the
equator.
Mission objectives are:
● Measure sea-surface topography, sea-surface height and
significant wave height;
● Measure ocean and land-surface temperature;
● Measure ocean and land-surface colour;
● Monitor sea and land ice topography;
● Sea-water quality and pollution monitoring;
● Inland water monitoring, including rivers and lakes;
● Aid marine weather forecasting with acquired data;
● Climate monitoring and modelling;
● Land-use change monitoring;
● Forest cover mapping;
● Fire detection;
● Weather forecasting;
● Measuring Earth's thermal radiation for atmospheric
applications.
The Sentinel-3A mission has now reached full operational capacity,
and preparations for the Sentinel-3B launch are ongoing (mission status on 6
December 2017).
Sources: Wikipedia and Sentinel Online Web site
(https://sentinels.copernicus.eu).
Third Party Mission ESA uses its multi-mission ground systems to acquire, process, archive
and distribute data from other satellites, the so-called Third Party Missions.
Source: http://earth.esa.int/missions/thirdpartymission/.
1 Introduction
1.1 Project Summary
The data intensive target sector selected for the
DataBio project is the Data-Driven Bioeconomy.
DataBio focuses on utilizing Big Data to
contribute to the production of the best possible
raw materials from agriculture, forestry and
fishery/aquaculture for the bioeconomy
industry, in order to output food, energy and
biomaterials, also taking into account various
responsibility and sustainability issues.
DataBio will deploy state-of-the-art big data technologies and existing partners’ infrastructure
and solutions, linked together through the DataBio Platform. These will aggregate Big Data
from the three identified sectors (agriculture, forestry and fishery), intelligently process them
and allow the three sectors to selectively utilize numerous platform components, according
to their requirements. The execution will be through continuous cooperation of end user and
technology provider companies, bioeconomy and technology research institutes, and
stakeholders from the big data value PPP programme.
DataBio is driven by the development, use and evaluation of a large number of pilots in the 3
identified sectors, where also associated partners and additional stakeholders are involved.
The selected pilot concepts will be transformed into pilot implementations using co-innovative
methods and tools. The pilots select and use the most suitable market-ready or
near-market-ready ICT, Big Data and Earth Observation methods, technologies, tools and
services to be integrated into the common DataBio Platform.
Based on the pilot results and the new DataBio Platform, new solutions and new business
opportunities are expected to emerge. DataBio will organize a series of trainings and
hackathons to support its take-up and to enable developers outside the consortium to design
and develop new tools, services and applications based on and for the DataBio Platform.
The DataBio consortium is listed in Table 1. For more information about the project see
www.databio.eu.
Table 1: The DataBio consortium partners
Number | Name | Short name | Country
1 (CO) | INTRASOFT INTERNATIONAL SA | INTRASOFT | Belgium
2 | LESPROJEKT SLUZBY SRO | LESPRO | Czech Republic
3 | ZAPADOCESKA UNIVERZITA V PLZNI | UWB | Czech Republic
4 | FRAUNHOFER GESELLSCHAFT ZUR FOERDERUNG DER ANGEWANDTEN FORSCHUNG E.V. | Fraunhofer | Germany
5 | ATOS SPAIN SA | ATOS | Spain
6 | STIFTELSEN SINTEF | SINTEF ICT | Norway
7 | SPACEBEL SA | SPACEBEL | Belgium
8 | VLAAMSE INSTELLING VOOR TECHNOLOGISCH ONDERZOEK N.V. | VITO | Belgium
9 | INSTYTUT CHEMII BIOORGANICZNEJ POLSKIEJ AKADEMII NAUK | PSNC | Poland
10 | CIAOTECH Srl | CiaoT | Italy
11 | EMPRESA DE TRANSFORMACION AGRARIA SA | TRAGSA | Spain
12 | INSTITUT FUR ANGEWANDTE INFORMATIK (INFAI) EV | INFAI | Germany
13 | NEUROPUBLIC AE PLIROFORIKIS & EPIKOINONION | NP | Greece
14 | Ústav pro hospodářskou úpravu lesů Brandýs nad Labem | UHUL FMI | Czech Republic
15 | INNOVATION ENGINEERING SRL | InnoE | Italy
16 | Teknologian tutkimuskeskus VTT Oy | VTT | Finland
17 | SINTEF FISKERI OG HAVBRUK AS | SINTEF Fishery | Norway
18 | SUOMEN METSAKESKUS-FINLANDS SKOGSCENTRAL | METSAK | Finland
19 | IBM ISRAEL - SCIENCE AND TECHNOLOGY LTD | IBM | Israel
20 | MHG SYSTEMS OY - MHGS | MHGS | Finland
21 | NB ADVIES BV | NB Advies | Netherlands
22 | CONSIGLIO PER LA RICERCA IN AGRICOLTURA E L'ANALISI DELL'ECONOMIA AGRARIA | CREA | Italy
23 | FUNDACION AZTI - AZTI FUNDAZIOA | AZTI | Spain
24 | KINGS BAY AS | KingsBay | Norway
25 | EROS AS | Eros | Norway
26 | ERVIK & SAEVIK AS | ESAS | Norway
27 | LIEGRUPPEN FISKERI AS | LiegFi | Norway
28 | E-GEOS SPA | e-geos | Italy
29 | DANMARKS TEKNISKE UNIVERSITET | DTU | Denmark
30 | FEDERUNACOMA SRL UNIPERSONALE | Federu | Italy
31 | CSEM CENTRE SUISSE D'ELECTRONIQUE ET DE MICROTECHNIQUE SA - RECHERCHE ET DEVELOPPEMENT | CSEM | Switzerland
32 | UNIVERSITAET ST. GALLEN | UStG | Switzerland
33 | NORGES SILDESALGSLAG SA | Sildes | Norway
34 | EXUS SOFTWARE LTD | EXUS | United Kingdom
35 | CYBERNETICA AS | CYBER | Estonia
36 | GAIA EPICHEIREIN ANONYMI ETAIREIA PSIFIAKON YPIRESION | GAIA | Greece
37 | SOFTEAM | Softeam | France
38 | FUNDACION CITOLIVA, CENTRO DE INNOVACION Y TECNOLOGIA DEL OLIVAR Y DEL ACEITE | CITOLIVA | Spain
39 | TERRASIGNA SRL | TerraS | Romania
40 | ETHNIKO KENTRO EREVNAS KAI TECHNOLOGIKIS ANAPTYXIS | CERTH | Greece
41 | METEOROLOGICAL AND ENVIRONMENTAL EARTH OBSERVATION SRL | MEEO | Italy
42 | ECHEBASTAR FLEET SOCIEDAD LIMITADA | ECHEBF | Spain
43 | NOVAMONT SPA | Novam | Italy
44 | SENOP OY | Senop | Finland
45 | UNIVERSIDAD DEL PAIS VASCO/ EUSKAL HERRIKO UNIBERTSITATEA | EHU/UPV | Spain
46 | OPEN GEOSPATIAL CONSORTIUM (EUROPE) LIMITED LBG | OGCE | United Kingdom
47 | ZETOR TRACTORS AS | ZETOR | Czech Republic
48 | COOPERATIVA AGRICOLA CESENATE SOCIETA COOPERATIVA AGRICOLA | CAC | Italy
49 | SINTEF AS | SINTEF | Norway
1.2 Document Scope
The main objective of this deliverable is to describe the datasets utilized, improved and
created in the DataBio project. A secondary objective is to show how the datasets are
identified through a model-driven design process based on ArchiMate, involving the 26 pilot
systems in the DataBio project.
In addition to this deliverable, the datasets will be provided through the DataBioHub,
including important ArchiMate design diagrams.
1.3 Document Structure
This document comprises the following chapters:
Chapter 1 presents an introduction to the project and the document.
Chapter 2 introduces data sharing and the data economy in the context of DataBio.
Chapter 3 presents the context view of datasets in DataBio, including external drivers,
stakeholders and license models.
Chapter 4 provides an overview of the requirements for datasets and datastreams in DataBio,
grouped by pilots and the platform itself.
This is a public version of Deliverable D4.3 “Data sets, formats and models”.
Confidential information from the original document has been omitted.
Chapter 5 presents the datasets in DataBio: existing, improved, new and other relevant
datasets. The final subsection gives an example of how a dataset can be used for application
development.
Chapter 6 presents the concluding remarks.
2 Background
2.1 Data sharing and data economy in DataBio
As part of the Digital Single Market strategy and building a European data economy, the
European Commission adopted the Communication ‘Towards a common European data
space’ in April 2018 [REF-01]. The document proposes a roadmap to “a common data space
in the EU - a seamless digital area with the scale that will enable the development of new
products and services based on data.” The DataBio domains, agriculture, forestry and fishery,
are key areas where the Commission expects that businesses can utilize the data sharing
through the data space to improve products and productivity. The Commission document
identifies reuse of public and publicly funded data as a cornerstone of the data space, and
the Commission has launched the “European Open Data Portal” to stimulate this development [REF-02].
An important factor in realizing a common data space is to encourage private businesses
and public agencies to share both private and public datasets. The guide to “Building a
European data economy” states that digital data “is an essential resource for economic
growth, competitiveness, innovation, job creation and societal progress in general” [REF-03].
Digital data should be shared in both business-to-business (B2B) and business-to-government
(B2G) contexts. The DataBio pilots involve many private stakeholders that produce, consume
and share datasets and datastreams. The pilots will demonstrate how data can be shared and
utilized in order to improve the quality and efficiency of pilot systems. All datasets and
datastreams involved in the realization of the pilot systems are identified and documented in the
platform and pilot ArchiMate models. These models relate the datasets to the pilots and
interfaces, providing traceability from pilot to data, components and pipelines.
The DataBio datasets and datastreams are examples of B2B and B2G data sharing, and are
documented here in terms of:
1) Rich metadata: each dataset is described with relevant metadata elements following
best practice and harmonized with e.g. Transforming Transport datasets [REF-04].
2) Data portal (the DataBioHub): each dataset and datastream is registered in the
DataBioHub, a data portal from DataBio.
3) Examples: relevant examples of how to utilize datasets from DataBio are provided in
this document.
2.2 FAIR Principles
Most datasets from publicly funded research are still inaccessible to the majority of scientists
in the same discipline, not to mention other potential users of the data, such as company R&D
departments. About 80% of research data is not in a trusted repository. However, even if the
data openly appears in repositories, this is not always enough. As a current example, only 18%
of the data in open repositories is reusable [REF-05]. This leads to inefficiencies and delays; in
recent surveys, the time reportedly spent by data scientists in collecting and cleaning data
sources made up 80% of their work [REF-06]. These figures can be assumed to be valid also
for the bioeconomy sector.
In response to these challenges, the Commission has launched a large effort with the
objective of creating “a European Open Science Cloud to make science more efficient and
productive and let millions of researchers share and analyse research data in a trusted
environment across technologies, disciplines and borders” [REF-07]. The initial outline for the
European Open Science Cloud (EOSC) was laid out in the report from the High Level Expert
Group (Moons et al., 2016). This report promotes the FAIR Data Principles, which are a set of
guiding principles in order to promote maximum use of research data (Wilkinson et al., 2016).
The FAIR principles were created in a workshop in 2014 and intend to give “a minimal set of
community-agreed guiding principles and practices” [REF-08]. Both humans and machines
should be enabled to find (F), access (A), interoperate (I) and re-use (R) research data and
metadata in an effortless but confined fashion. These principles provide guidance for
scientific data management and stewardship and are relevant to all stakeholders in the
current digital ecosystem. Since 2017, a data management plan based on FAIR has been mandatory
in all EU Horizon projects [REF-09]. The FAIR principles are advanced by the GO FAIR initiative
(https://www.go-fair.org/) [REF-10]. Currently, Germany, France and the Netherlands are
part of this initiative.
As for DataBio, the project implemented the Data Management Plan (DMP) that is
part of the project proposal. The plan, which constitutes Deliverable D6.2 (footnote 1), covers descriptions
of the DataBio datasets, data standards, data sharing and long-term preservation of data. The
DMP is also an important tool for the dissemination and exploitation activities. Data privacy
and ownership are essential elements, which are dealt with in T4.6.
The DataBioHub [REF-11], described below in Section 2.4.1, is a central tool for our project in
realising data management and data sharing. In addition to offering searchable public and
private dataset descriptions, it also contains descriptions of DataBio components, pipelines
and pilots as well as of their mutual relations. The hub clearly makes the DataBio data findable
by publishing the metadata according to best practices and standards (geospatial and others)
as well as applying search keywords (tags) to the digital objects. The data is also accessible
from the DataBioHub repository, although in some cases only indirectly by consulting the
dataset owner, when the Hub only contains the metadata. DataBioHub typically contains
information about the APIs, the data model and formats as well as about the access methods.
This hub also promotes interoperability, as the metadata and data often (but not always)
follow established standards, e.g. in the Earth Observation field. Finally, for reusability, the
licensing schemes are essential to permit the widest reuse possible. When will restricted data
be made available for reuse? Are the data produced and/or used in the project useable by
third parties, in particular after the end of the project? How long is it intended that the data
remains re-usable?
The DataBio data management plan related to FAIR principles is described in chapter 3 of
[REF-20].
1 https://www.databio.eu/wp-content/uploads/2017/05/DataBio_D6.2-Data-Management-Plan_v1.0_2017-06-30_CREA.pdf
2.3 Metadata and discovery of datasets
Data discoverability of (open) geo-information is vital to increase the use of geospatial data
within and outside the geospatial expert community. This may also be supported by
experience originating from Europe. In 2003, Directive 2003/98/EC (also known as PSI – Public
Sector Information) established a minimum set of rules governing both the re-use and the
practical means of facilitating the re-use of existing documents held by public sector bodies
in the European Union. In the end, Directive 2003/98/EC had only a partial impact in the field
of data re-use. It was hard even to discover which data could be re-used. In 2007,
Directive 2007/2/EC (also known as INSPIRE – INfrastructure for SPatial InfoRmation in
Europe) was established, chiefly to make it easier to discover available spatial data and
services. Moreover, discovery mechanisms represent one of the bridges between geospatial
and non-geospatial approaches for metadata management.
Metadata in the DataBio project are extraordinarily diverse in their structure, encodings,
the kinds of resources they describe, and the way they are handled and published. “Big
metadata” approaches need to be developed, since metadata also exhibit
three of the four Vs: variety, veracity and velocity. Volume is not an issue as metadata are
typically small, on the scale of kilobytes or at most megabytes. However, the traditional
metadata approaches assume static resources and long-lived
metadata records, which conflicts with variety and velocity. Veracity of metadata has
always been an issue, at least because data and metadata updates are only loosely integrated. The
DataBio approach therefore aims at the following goals for metadata and discovery:
1. Tie data and metadata together: ensure that metadata stay up to date despite updates
arriving at Big Data velocity.
2. Support metadata heterogeneity: enable discovery of static (e.g. datasets) as well as
mobile/other resources (e.g., sensors active during agricultural machinery fleet
tracking) in a unified platform.
3. Use efficient encodings: support XML-based formats for backwards compatibility, while
also using forward-looking lightweight and semantics-based formats.
4. Integrate metadata in other tools: the best metadata platform is the one where a user
does not notice that (s)he works with metadata.
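As an illustration of goals 1 and 3, the following minimal sketch keeps a metadata record in step with its data by regenerating the record on every write and exposing it as lightweight JSON. The class name, the metadata fields and the sample records are assumptions made for this example and are not part of the DataBio platform.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import hashlib
import json


@dataclass
class DescribedDataset:
    """A dataset that refreshes its own metadata on every write (hypothetical model)."""
    title: str
    records: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

    def append(self, record: dict) -> None:
        # Data and metadata change in the same call, so they cannot drift apart.
        self.records.append(record)
        self._refresh_metadata()

    def _refresh_metadata(self) -> None:
        payload = json.dumps(self.records, sort_keys=True).encode("utf-8")
        self.metadata = {
            "title": self.title,
            "record_count": len(self.records),
            "checksum_sha256": hashlib.sha256(payload).hexdigest(),
            "modified": datetime.now(timezone.utc).isoformat(),
        }

    def metadata_as_json(self) -> str:
        # Lightweight JSON encoding of the metadata record.
        return json.dumps(self.metadata, sort_keys=True)


# Hypothetical sample data: positions reported by two tractors.
dataset = DescribedDataset(title="Tractor fleet positions")
dataset.append({"tractor": "T1", "lat": 49.74, "lon": 13.37})
dataset.append({"tractor": "T2", "lat": 49.75, "lon": 13.38})
print(dataset.metadata_as_json())
```

Because the record count, checksum and modification time are derived from the data at write time, a stale metadata record cannot occur in this design; the cost is a recomputation per update, which a real system would likely batch.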
2.4 Data registries, data sharing and data exchange
The data sets of DataBio are registered in the DataBio Hub. It is also relevant to register
datasets in other data registries such as GEOSS.
Earth Observation (EO) data sets are of major importance for the DataBio project, and the
management of and access to these have been described in more detail in the deliverables D5.1
and D5.i2. As an example, Sentinel Products available on the Sentinels Scientific Data Hub
(Sentinel-1, Sentinel-2) can be discovered and accessed via the FedEO Gateway (C07.01) that
returns Sentinel collections and datasets metadata (including product download URL) via an
OGC 13-026r8 OpenSearch interface.
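To illustrate how a client might phrase such a discovery request, the sketch below assembles an OpenSearch-style query URL for one collection, a bounding box and a time window. The endpoint, the query parameter names and the collection identifier are assumptions made for this example; a real client should take them from the OpenSearch description document published by the service.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint, used only for illustration.
ENDPOINT = "https://opensearch.example.org/request"


def build_search_url(collection, bbox, start, end, max_records=10):
    """Return a search URL for products of one collection in a box and time window."""
    west, south, east, north = bbox
    params = {
        "parentIdentifier": collection,  # assumed name of the collection parameter
        "bbox": f"{west},{south},{east},{north}",  # assumed order: west,south,east,north
        "startDate": start,
        "endDate": end,
        "maximumRecords": max_records,
    }
    return f"{ENDPOINT}?{urlencode(params)}"


url = build_search_url(
    collection="SENTINEL-2",  # hypothetical collection identifier
    bbox=(12.0, 48.5, 18.9, 51.1),
    start="2018-06-01T00:00:00Z",
    end="2018-06-30T23:59:59Z",
)
print(url)

# Decoding the query string again shows that the values survive URL encoding.
print(parse_qs(urlsplit(url).query)["bbox"])
```

The response of such a request would typically be a feed of metadata entries, each carrying a product download URL, as described above.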
Industrial data platforms that support data sharing, data exchange and data access
are now also emerging, and the DataBio project aims to take advantage of these in
the next phase of the project. The DataBioHub is described below, followed by a description
of other relevant data registries and data platforms.
2.4.1 DataBioHub
DataBioHub [REF-11] provides a registry for the project components, pipelines, and pilots for
an easy search of the different project entities. The hub is dynamic and is being updated with
more functionalities and resources as the project evolves. The data sets applied in the project
will be added to this hub, so that they can be searched in combination with the other project
resources.
It is important to note that the DataBio Hub does not offer a repository or operating
environment for the service instances and datasets themselves, as those instances will be
running on the service providers’ servers or cloud infrastructure (or a DataBio-provided cloud).
Regardless of the running environment of the service instances, DataBioHub offers
descriptions and endpoints for all DataBio platform-compatible services and components (and
possibly applications) in a single location and with a coherent description.
Initially, two publicly available instances of the complete digital service registry existed: one as
a project deliverable at icare.erve.vtt.fi/ServiceRegistryWeb and one public and free for
non-commercial R&D usage at www.digitalserviceshub.com.
A new service registry instance has been provided for the DataBio project and can be found at
http://www.databiohub.eu/. The instance has been installed on a virtual machine on
Microsoft Azure’s cloud computing service. Infrastructure as a service (IaaS) allows easy
server management and makes it possible to increase computing power and resources if needed.
The virtual machine runs Ubuntu Linux, and the whole machine is backed up to a geo-redundant
recovery services vault.
The digital service registry has been tailored for DataBio use, which includes the following
developments:
• As the service registry was initially developed to register digital services with mainly machine-readable interface descriptions, vocabulary support had to be added for new categories of software components, such as applications with both human and technical interfaces.
• New interface technologies such as OpenSearch for satellite image services have been added to the service hub interface description vocabularies.
• DataBio pilot descriptions, data processing pipelines developed in the pilots and component descriptions are now also included in the registry. Dataset descriptions will be added while submitting deliverable D4.3.
• Specific rules for keyword use in DataBio have been enforced to link descriptions to the BDVA reference architecture and to help link component and service descriptions to the overall architecture of the DataBio platform and pilots.
• New fields for human-readable descriptions have been added to improve linking to pilot development, data models and DataBio deliverables, with the possibility to include the component diagrams exported from the DataBio architecture models as images.
• The Service Hub UI and its website have been tailored for DataBio and linked with the other websites of the DataBio project.
• Registration mechanisms for new users outside the DataBio consortium have been restricted during DataBio platform development.
2.4.2 Linked Data and Open Micka
The best practices for the publication of Linked data were described in previous deliverable
D4.i1, Section “Linked Data Publication Pipeline”. In this section, we summarize the most
relevant practices, which have been applied during the DataBio project.
Linked Data refers to a set of best practices for publishing and interlinking
structured data so that it can be accessed by both humans and machines. The data
interchange follows the RDF family of standards, and SPARQL is used for querying. The key
technologies that support Linked Data are:
• URIs, which identify any concept or entity.
• HTTP, which is used to retrieve resources or their descriptions.
• RDF, a generic graph-based data model used for structuring and linking data that describes concepts or entities in the real world.
• SPARQL, the standard RDF query language.
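The interplay of these building blocks can be illustrated with a minimal sketch in plain Python: URIs identify things, triples form the graph, and a pattern with wildcards plays the role of a very simple query. This is not an RDF library or a SPARQL engine; the `example.org` namespace and the two sample datasets are assumptions made for this example, while the `rdf:type`, `dcat:Dataset` and `dct:title` URIs are the standard ones.

```python
# Each triple is (subject, predicate, object); URIs are plain strings here.
EX = "http://example.org/databio/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"
DCT_TITLE = "http://purl.org/dc/terms/title"

triples = [
    (EX + "dataset/1", RDF_TYPE, DCAT_DATASET),
    (EX + "dataset/1", DCT_TITLE, '"Forest inventory plots"'),
    (EX + "dataset/2", RDF_TYPE, DCAT_DATASET),
    (EX + "dataset/2", DCT_TITLE, '"Sentinel-2 tiles"'),
]


def match(pattern, store):
    """Yield triples matching a pattern in which None acts as a wildcard."""
    for triple in store:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            yield triple


# Roughly equivalent to: SELECT ?s WHERE { ?s rdf:type dcat:Dataset }
datasets = [s for s, _, _ in match((None, RDF_TYPE, DCAT_DATASET), triples)]
print(datasets)

# Roughly equivalent to: SELECT ?title WHERE { <.../dataset/2> dct:title ?title }
titles = [o for _, _, o in match((EX + "dataset/2", DCT_TITLE, None), triples)]
print(titles)
```

In a real deployment the triples would live in an RDF store and the patterns would be expressed in SPARQL against its endpoint.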
Due to the growing popularity of Linked Data, more detailed guidelines for the development and delivery of open data as Linked Data were defined. For instance, for open government data, the recommended best practices include the following (more detailed information was given in D4.i1):
• To prepare the stakeholders.
• To select a reusable dataset.
• To model data objects and their relations to represent Linked Data.
• To specify an appropriate license to ease data reuse.
• To use a well-considered URI naming strategy and implementation plan.
• To describe the objects with previously defined vocabularies.
• To convert data into a Linked Data representation by scripting or other automated processes.
• To provide machine access to the Linked Data.
• To announce new datasets on authoritative domains to initiate an implicit social contract.
• To maintain the Linked Data once it is published.
Note that even though these best practices were conceived for open government data, they
apply generally in other domains.
Regarding the publication process, there are at least three well known life cycle models
(Hyland et al., Hausenblas et al., Villazón-Terrazas et al.) for publishing linked data. All of these
models identify common needs of specifying, modelling and publishing data in the standard
open Web format. Although all of the models cover broadly similar tasks
in the process of publishing linked data, they differ in how those tasks are
divided. A detailed description of these models is available in D4.i1. For our work, we are mainly
interested in the model proposed by Villazón-Terrazas et al., which includes the following
activities:
• Specification:
o Identification and analysis of the data sources to be published.
o Reusing or leveraging the data that had already been opened/published.
o Assigning meaningful URIs rather than opaque ones whenever possible.
o Definition of the license of the data sources and reusing existing ones
whenever possible.
• Modelling:
o Ontologies are to be expressed in either OWL or RDF(S).
o Reusing the existing and available vocabularies.
o Reusing the available non-ontological resources.
• Generation:
o Transformation of the specified data sources into RDF according to the
modelled vocabulary, using tools for CSV files and spreadsheets, RDB or XML.
o Pre-processing and/or post processing tasks for fixing accessibility issues,
reasoning issues etc.
o Linking with suitable datasets and discovering suitable relationships between
the other data items with valid properties.
• Publishing:
o Dataset publication by using tools for storing RDF (e.g. Openlink Virtuoso
Universal Server, Jena, Sesame, 4Store, YARS, OWLIM etc.) and using SPARQL
endpoint and Linked Data front end (e.g. Pubby, Talis Platform, Fuseki).
o Metadata publication by using VoID which allows expressing metadata about
RDF datasets and by OPM (Open Provenance Model).
o Dataset discovery by registering the datasets in the CKAN registry (footnote 2) and
generating sitemap files for the dataset by using sitemap4rdf.
• Exploitation:
o Application and exploitation of the Linked Data for various purposes and
applications across different Web platforms.
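The Generation activity listed above can be sketched with the Python standard library alone: each row of a CSV source becomes a subject URI, and each remaining column becomes one triple serialised as an N-Triples line. The URI base, the vocabulary namespace and the sample rows are assumptions made for this example; a production pipeline would use a dedicated mapping tool and typed literals.

```python
import csv
import io

BASE = "http://example.org/databio/parcel/"  # hypothetical URI base
VOCAB = "http://example.org/databio/vocab#"  # hypothetical vocabulary

# Hypothetical source data: one row per agricultural parcel.
CSV_TEXT = """id,crop,area_ha
p1,wheat,12.5
p2,barley,7.0
"""


def literal(value):
    """Format a plain string literal, escaping the characters N-Triples requires."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def csv_to_ntriples(text):
    """Turn each CSV row into one N-Triples line per non-id column."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}{row['id']}>"
        for column, value in row.items():
            if column == "id":
                continue  # the id is used to mint the subject URI
            lines.append(f"{subject} <{VOCAB}{column}> {literal(value)} .")
    return lines


for line in csv_to_ntriples(CSV_TEXT):
    print(line)
```

The resulting lines can be loaded into an RDF store for the Publishing activity; all values are emitted as plain string literals here, which keeps the sketch short at the cost of losing numeric datatypes.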
Open Micka [REF-14] is a web application for the management and discovery of geospatial metadata
(open source under the BSD license). It has been extended and applied in the DataBio project, in
particular in the Agriculture pilot 1 pipeline on "Metadata, linked data and graph data".
Features of the application:
• OGC Catalogue service (CSW 2.0.2)
2 https://ckan.org/
• Transactions and harvesting
• Metadata editor
• Multilingual user interface
• ISO AP 1.0 profile
• Feature catalogue (ISO 19110)
• Interactive metadata profiles management
• WFS/Gazetteer for defining metadata extent
• GEMET thesaurus built-in client
• INSPIRE registry built-in client
• OpenSearch
• INSPIRE ATOM download service - automatic creation from metadata
2.4.3 Industrial data spaces
The Industrial Data Space (IDS), renamed in April 2018 to International Data Space(s), is both
a research project and a non-profit user association (IDSA). IDS extends a data marketplace
with the ability to run services inside the IDS, e.g., data analysis and processing operations.
The core requirements for an IDS related to data access are described in the Industrial
Data Space whitepaper (footnote 3):
• Data sovereignty: It is always the data owner that specifies the terms and conditions of use of the data provided.
• Decentralised data management: Data management remains with the respective data owner, if desired.
• Data economy: Data is viewed as an economic asset. It can be distinguished into three categories: private data, so-called »club data« (i.e. data belonging to a specific value creation chain, which is available to selected companies only), and public data (weather information, traffic information, geo data etc.).
• Easy linkage of data: Linked-data concepts and common vocabularies facilitate the integration of data between participants.
• Trust: All participants, data sources, and data services of the Industrial Data Space are certified against commonly defined rules.
• Secure data supply chain: Data exchange is secure across the entire data supply chain, i.e. from data creation to data capture to data usage.
A “Data User” that wants to access data in an IDS must comply with a set of requirements
specified by the “Data Provider” and IDS. These requirements may include payment,
standards for data protection, use period, restrictions on aggregation levels and sharing with
other parties.
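The kind of check implied by such requirements can be sketched as follows. This is a simplified illustration and not the IDS usage control mechanism: the class names, fields and aggregation-level encoding are assumptions made for this example, and data protection standards are left out for brevity.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class UsagePolicy:
    """Terms set by a data provider (hypothetical, simplified model)."""
    price_eur: float
    valid_until: date
    min_aggregation_level: int  # assumed scale: 0 = raw records, 1 = per farm, 2 = per region
    allow_resharing: bool


@dataclass(frozen=True)
class AccessRequest:
    """What a data user offers and asks for."""
    payment_eur: float
    access_date: date
    aggregation_level: int
    wants_to_reshare: bool


def violations(policy, request):
    """Return the list of policy terms that the request fails to meet."""
    problems = []
    if request.payment_eur < policy.price_eur:
        problems.append("payment below price")
    if request.access_date > policy.valid_until:
        problems.append("use period expired")
    if request.aggregation_level < policy.min_aggregation_level:
        problems.append("aggregation level too fine")
    if request.wants_to_reshare and not policy.allow_resharing:
        problems.append("resharing not permitted")
    return problems


policy = UsagePolicy(price_eur=100.0, valid_until=date(2019, 12, 31),
                     min_aggregation_level=1, allow_resharing=False)
request = AccessRequest(payment_eur=100.0, access_date=date(2019, 6, 1),
                        aggregation_level=0, wants_to_reshare=True)
print(violations(policy, request))
```

Returning the full list of unmet terms, rather than a single yes/no answer, lets a data user see what would have to change before access is granted.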
3 http://www.industrialdataspace.org/wp-content/uploads/2016/09/whitepaper-industrial-data-space-eng.pdf
An example of an IDS solution is the Estonian Cybernetica platform (footnote 4). Cybernetica provides
solutions for sea surveillance, customs declaration management, data sharing, voting and a
number of other applications.
2.4.4 Openness and payment
Openness with respect to data is not a binary concept: there can be degrees of
openness when it comes to data access (eligible parties, conditions under which data can be
accessed). With the diffusion of IoT-enabled sensors and machines, blockchain technologies
have been adopted for storing and paying for data. In addition to secure storage, this approach allows data
consumer services to purchase data from providers using blockchain payment. Datum
(https://datum.org) is an example of a data marketplace following this approach, as illustrated
below.
Figure 2: How distributed storage and payments work
2.4.5 UXP – Exchange Platform - Cybernetica
Unified eXchange Platform (UXP) is a technology that enables peer-to-peer data exchange
over encrypted and mutually authenticated channels. It is based on a decentralised
architecture where each peer has an information system that will be connected with other
peers’ systems.
4 https://cyber.ee/en/
UXP was created by the authors of the world-renowned e-Government system of Estonia, the
X-Road, which according to the World Bank Development Report is what allowed Estonia to
become a truly digital society.
UXP-based solutions have been implemented across four continents to enable running online
government services for 35 million people from different countries and cultures. We make
this possible by fitting our technology naturally into your existing ecosystem, with full
integration support and minimal changes required.
Seamless Data Exchange: UXP connects any number of databases in an efficient and secure
way, helping you build a network of agreements that allows controlled exchange between
any members in your ecosystem.
UXP benefits:
• Less is More. UXP means less paperwork, less bureaucracy, less time spent on futility. In Estonia, digital services save every citizen one work-week per year. What would you do with your week?
• Affordable. UXP can be implemented into any ecosystem – be it a tiny country or a supranational association. With very low maintenance cost and marginal implementation investment, UXP is cost-effective and allows you to move ahead one step at a time.
• Reliable. UXP has been heavily tried and tested since its launch as Estonia’s X-Road in 2001. No downtime has been observed since and the system survived the world’s first cyber conflict in 2007.
• Secure. We use extensive security measures to guarantee the protection and integrity of your data. UXP is secure-by-design, as its decentralised architecture has no single point-of-failure. All traffic is encrypted with 2048-bit keys. These are minimal requirements of the system – cryptographic algorithms can be altered to provide even stronger encryption at the request of our customers.
• Scalable. UXP is scalable to any size of infrastructure. An unlimited number of security servers can be linked together, making it fit for local and international applications.
• Private. We use a distributed architecture, eliminating the creation of a superdatabase, which could be prone to exploitation. All transactions are signed and timestamped, making it possible to monitor all queries made by officials against private citizens.
2.5 Industrial data spaces and connectors
The “Industrial Data Space” is a virtual data space using standards and common governance
models to facilitate the secure exchange and easy linkage of data in business ecosystems. It
thereby provides a basis for creating and using smart services and innovative business
processes, while at the same time ensuring digital sovereignty of data owners.
The following section introduces the concept of the Industrial Data Space Connector by citing
from the Reference Architecture Model for the Industrial Data Space published by the
Fraunhofer-Gesellschaft in cooperation with the Industrial Data Space Association. We
introduce Connectors on the functional level5.
Figure 3 shows the Functional Architecture of the Industrial Data Space. It defines,
irrespective of existing technologies and applications, the functional requirements of the
Industrial Data Space, and the features to be implemented resulting thereof.
Figure 3: Functional Architecture of the Industrial Data Space
The Connector is the central functional entity of the Industrial Data Space. It facilitates the
exchange of data between participants. The Connector is basically a dedicated
communication server for sending and receiving data in compliance with the Connector
specification (see Section 3.5.1 in the Reference Architecture Model). A single Connector can
be understood as a node in the peer-to-peer architecture of the Industrial Data Space. This
means that a central authority for data management is not required. Connectors can be
installed, managed and maintained both by Data Providers and Data Consumers. Typically, a
Connector is operated in a secure environment (e.g., beyond a firewall). This means that
internal systems of an enterprise cannot be directly accessed. However, the Connector can,
for example, also be connected to a machine or a transportation vehicle. Each company
participating in the Industrial Data Space may operate several Connectors. As an option,
intermediaries (i.e., the Service Provider) may operate Connectors on behalf of one or several
participating organizations. The data exchange with the enterprise systems must be
established by the Data Provider or the Data Consumer.
Data Providers can offer data to other participants of the Industrial Data Space. The data
therefore has to be described by metadata. The metadata contains information about the
Data Provider, syntax and semantics of the data itself, and additional information (e.g., pricing
information or usage policies). To support the creation of metadata and the enrichment of
data with semantics, vocabularies can be created and stored for other participants in the
Vocabulary and Metadata Management component. If the Data Provider wants to offer data,
5 For further details related to the other layers of the Reference Architecture Model please refer to the official
document: https://www.fraunhofer.de/content/dam/zv/de/Forschungsfelder/industrial-data-space/Industrial-Data-Space_Reference-Architecture-Model-2017.pdf
the metadata will automatically be sent to one or more central metadata repositories hosted
by the Broker. Other participants can browse and search data in this repository. Connectors
can be extended with software components that help transform and/or process data. These
Data Apps constitute the App Ecosystem. Data Apps can either be purchased via the App Store
or developed by the participants themselves. App Providers may implement and provide Data
Apps using the AppStore. Every participant possesses identities required for authentication
when communicating with other participants. These identities are managed by the Identity
Management component. The Clearing House logs each data exchange between two
Connectors.
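The interplay of Connector, Broker and Clearing House described above can be pictured with a small sketch. It is only a toy in-memory model and not the Connector specification: the class names, fields, methods and the participants "FarmCo" and "MillCo" are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    """Self-description of a data offer (hypothetical, simplified fields)."""
    provider: str
    title: str
    usage_policy: str


@dataclass
class Broker:
    """Central metadata repository that other participants can search."""
    offers: list = field(default_factory=list)

    def register(self, metadata: Metadata) -> None:
        self.offers.append(metadata)

    def search(self, keyword: str) -> list:
        return [m for m in self.offers if keyword.lower() in m.title.lower()]


@dataclass
class ClearingHouse:
    """Logs each data exchange between two Connectors."""
    log: list = field(default_factory=list)

    def record(self, provider: str, consumer: str, title: str) -> None:
        self.log.append((provider, consumer, title))


@dataclass
class Connector:
    """A peer node that holds data and talks to Broker and Clearing House."""
    owner: str
    data: dict = field(default_factory=dict)

    def offer(self, broker: Broker, title: str, payload: str, policy: str) -> None:
        # Only the metadata goes to the Broker; the data stays with the provider.
        self.data[title] = payload
        broker.register(Metadata(self.owner, title, policy))

    def exchange(self, consumer: "Connector", title: str, clearing: ClearingHouse) -> str:
        # Data flows peer to peer; only the fact of the exchange is logged.
        clearing.record(self.owner, consumer.owner, title)
        return self.data[title]


broker, clearing = Broker(), ClearingHouse()
farm = Connector("FarmCo")
mill = Connector("MillCo")
farm.offer(broker, "Wheat yield 2018", "field,yield\nA,7.2", "research use only")
found = broker.search("wheat")
payload = farm.exchange(mill, found[0].title, clearing)
```

The sketch deliberately keeps the payload out of the Broker, which mirrors the point made above that no central authority for data management is required.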
2.5.1 EU Data Portal
The European Union Open Data Portal (EU ODP) [REF-02] gives access to open data published
by EU institutions and bodies. All the data available via this catalogue are free to use and reuse for
commercial or non-commercial purposes. They can be reused in databases, reports, or
projects. A variety of digital formats are available from the EU institutions and other EU
bodies. As of July 2018, a total of 12,418 datasets were available.
The goal of providing easy access to data, free of charge, is to help organizations use
the data in innovative ways and unlock their economic potential. The portal is also designed
to make the EU institutions and other bodies more open and accountable.
The data concerned include: geographic, geopolitical and financial data; statistics; election
results; legal acts; data on crime, health, the environment, transport and scientific research.
The portal provides:
• a standardised catalogue, giving easier access to EU open data;
• a list of apps and web tools reusing these data;
• a SPARQL endpoint query editor;
• REST API access;
• tips on how to make best use of the site (see the Search and SPARQL manuals).
2.5.2 GEOSS
The Group on Earth Observations (GEO) [REF-12] works to connect the demand for sound and
timely environmental information with the supply of data and information about the Earth
that is collected through observing systems and made available by the GEO community.
GEOSS (Global Earth Observation System of Systems) is a set of coordinated, independent
Earth Observation, information and processing systems that interact and provide access to
diverse information for a broad range of users in both public and private sectors. It facilitates
the sharing of environmental data and information collected from the large array of observing
systems contributed by countries and organizations within GEO.
The ‘GEOSS Portal’ offers a single Internet access point for users seeking data, imagery and
analytical software packages relevant to all parts of the globe. It connects users to existing
databases and portals and provides reliable, up-to-date and user-friendly information – vital
for the work of decision makers, planners and emergency managers.
It is an objective that DataBio datasets suitable for GEOSS will be added to the GEOSS portal.
2.5.3 DCAT and GeoDCAT
GeoDCAT is a Geospatial extension to DCAT-AP (DCAT application profile for data portals in
Europe). DCAT-AP is a metadata profile meant to provide an interchange format for data
portals operated by EU Member States. It is based on and compliant with the W3C Data
Catalog (DCAT) vocabulary. Data Catalog Vocabulary (DCAT) is an RDF vocabulary designed to
facilitate interoperability between data catalogues published on the Web. By using DCAT to
describe datasets in catalogues, publishers increase discoverability and enable applications to
consume metadata from multiple catalogues. It enables decentralized publishing of
catalogues and facilitates federated dataset search across them.
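As an illustration of what such a catalogue entry might look like, the following minimal sketch builds a DCAT dataset description as a JSON-LD structure using only the Python standard library. The two namespace IRIs are the published DCAT and Dublin Core Terms namespaces; the function name, the `example.org` IRIs and the dataset itself are made-up placeholders, and the media type is given as a plain string for simplicity.

```python
import json

# Published namespace IRIs of DCAT and Dublin Core Terms.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def dcat_dataset(iri, title, description, keywords, download_url, media_type):
    """Return a JSON-LD dict describing one dcat:Dataset with one distribution."""
    return {
        "@context": CONTEXT,
        "@id": iri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:keyword": list(keywords),
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": download_url},
            "dcat:mediaType": media_type,  # simplified: plain string
        },
    }


# Placeholder values, for illustration only.
record = dcat_dataset(
    iri="https://example.org/dataset/soil-moisture",
    title="Soil moisture measurements",
    description="Hourly soil moisture readings from field probes.",
    keywords=["agriculture", "soil"],
    download_url="https://example.org/files/soil-moisture.csv",
    media_type="text/csv",
)
serialized = json.dumps(record, indent=2)
```

A catalogue harvester could read many such records from different portals and index them together, which is the federated search scenario mentioned above.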
GeoDCAT was developed in the framework of the EU Programme “Interoperability Solutions
for European Public Administrations” (ISA). GeoDCAT-AP is meant to provide a DCAT-AP
compliant representation for the set of metadata elements included in INSPIRE metadata and
the core profile of ISO 19115:2003. GeoDCAT objectives:
• The GeoDCAT-AP specification does not replace the INSPIRE Metadata Regulation nor the INSPIRE Metadata Technical Guidelines based on ISO 19115:2003 and ISO 19119 [REF-13]
• Its purpose is to give owners of geospatial metadata the possibility to achieve more by providing an additional RDF syntax binding
• Its basic use case is to make spatial datasets, data series, and services searchable on general data portals, thereby making geospatial information better searchable across borders and sectors
2.5.4 CKAN
CKAN [REF-15] is one of the world’s leading open source data portal platforms. It is a data
management system that makes data accessible by providing tools to streamline publishing,
sharing, finding and using data. CKAN is aimed at data publishers (national and regional
governments, companies and organizations) wanting to make their data open and available.
Once the data is published, users can use its faceted search features to browse and find the
data they need, and preview it using maps, graphs and tables – whether they are developers,
journalists, researchers, NGOs, or citizens.
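Besides the web interface, CKAN exposes its search through the Action API. The sketch below builds a `package_search` request URL and extracts dataset titles from a decoded response. The portal address `ckan.example.org` and the sample response are placeholders written by hand for this example, and no network call is made.

```python
from urllib.parse import urlencode


def ckan_search_url(portal, query, rows=10):
    """Build a CKAN Action API `package_search` URL for a free-text query."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response):
    """Extract dataset titles from a decoded `package_search` JSON response."""
    if not response.get("success"):
        return []
    return [pkg["title"] for pkg in response["result"]["results"]]


url = ckan_search_url("https://ckan.example.org/", "forest inventory", rows=5)

# Hand-written fragment in the shape of a package_search response.
sample = {
    "success": True,
    "result": {"count": 1, "results": [{"title": "Forest inventory plots"}]},
}
titles = dataset_titles(sample)
```

In practice the URL would be fetched with an HTTP client and the body decoded as JSON before being passed to `dataset_titles`.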
2.6 Others
OpenAire (https://www.openaire.eu/) is a place to put open data and get a Digital Object
Identifier (DOI) for the dataset. EU funded projects are expected to add the open datasets
they create to this portal, and this is also the intention of DataBio. An example of an OpenAire
dataset is shown in Figure 4.
Figure 4: OpenAire
Dryad (https://datadryad.org), the Dryad Digital Repository, is a curated resource that makes
the data underlying scientific publications discoverable, freely reusable, and citable. Dryad
provides a general-purpose home for a wide diversity of datatypes.
Dryad’s vision is to promote a world where research data is openly available, integrated with
the scholarly literature, and routinely re-used to create knowledge.
The Dryad mission is to provide the infrastructure for, and promote the re-use of, data
underlying the scholarly literature.
Dryad is governed by a non-profit membership organization. Membership is open to any
stakeholder organization, including but not limited to journals, scientific societies, publishers,
research institutions, libraries, and funding organizations.
Publishers are encouraged to facilitate data archiving by coordinating the submission of
manuscripts with submission of data to Dryad.
Dryad originated from an initiative among a group of leading journals and scientific societies
in evolutionary biology and ecology to adopt a joint data archiving policy (JDAP) for their
publications, and the recognition that easy-to-use, sustainable, community-governed data
infrastructure was needed to support such a policy. An example from Dryad is shown in Figure
5.
Figure 5: DRYAD
3 Context view
The datasets, formats and models are identified, described and used within the context of the
DataBio project.
3.1 External drivers for data sharing and data exchange
Data sharing involves at least two stakeholders that provide and/or consume
mostly structured data about an entity (person, business, property or event). External
regulations may set the rules and conditions of data sharing, that is, how to provide or consume
data. These conditions can have a conducive or restrictive impact on data sharing
processes. The most prominent regulation is the GDPR, which sets the conditions for
processing personal data. Personal data is any information relating to an identified or
identifiable natural person. For example, to process personal data, the purpose of processing
has to be defined (and validated), and the data consumer has to make sure that the data is
only processed for the defined purpose. In a business context, rules are set not only by
legislation: contracts also define specific rulesets for processing data.
Such regulations and rules have two main impacts. Firstly, to enable data sharing, the
infrastructure (software) must ensure compliance with external requirements and rules,
such as the GDPR. Secondly, the data sharing process needs to be defined and specified
according to those regulations. While both imply costs and effort in the short run,
in the long run they enable trust and long-term collaboration within a community such as
the bioeconomy. Furthermore, regulations and activities of public bodies, such as Open Data
policies, can enable a trustworthy environment for data sharing.
Besides regulations and rules, data sharing also depends on how data is represented, stored
and published, which differs with the knowledge domain, application scenario and intended use.
Data may be intended for human users or for machine processing. Data comes in very diverse
formats, and its modality (text, image, video, audio) as well as its structural level
(unstructured, semi-structured and structured) can be geared to a specific purpose. Both have
an impact on data providing and consuming processes. Furthermore, data, datasets, knowledge
bases and knowledge building blocks are often not stable: they are successively expanded and
versioned, and increasingly developed collaboratively and in a decentralized manner. Depending on
the stability and size of the datasets, data is materialized or computed by processing routines
and made available via APIs. Insofar as data is made accessible, for example under Open Data
principles, the target group of data users must be identified, possible business models
defined, license requirements provided or used, and the provenance and trustworthiness of the
data disclosed.
Data that is untrustworthy and whose usability is in question is hardly usable, at least in
a professional environment. The heterogeneity described above presents data publishers, data owners and
data users with major problems. Various initiatives (W3C, Go-FAIR, DCMI) recommend the
use of specially designed metadata. These initiatives are important external drivers
that have an impact on data sharing in the data economy and especially in the bioeconomy. The
clearer the process and requirements of data sharing, the more users will succeed.
As a rough measure of the quality and sustainability of the published data, the 5-star scheme
according to Tim Berners-Lee can be used [REF-16]:
★ Make your data available on the web under an open license. The format does not matter
★★ Provide data in a structured format (e.g. Excel instead of a scanned image of a
spreadsheet)
★★★ Use open, non-proprietary formats (e.g. CSV instead of Excel)
★★★★ Use URIs to label things so your data can be linked
★★★★★ Link your data with other data to create contexts
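The scheme is cumulative: each star presupposes the ones before it. A minimal sketch of how a publisher might self-assess a dataset is given below; the `Publication` class and its field names are hypothetical and only paraphrase the five criteria listed above.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    """Yes/no answers to the five criteria (hypothetical field names)."""
    on_web_open_license: bool
    structured: bool
    open_format: bool
    uses_uris: bool
    linked_to_other_data: bool


def star_rating(p: Publication) -> int:
    """Count stars cumulatively: a level counts only if all lower levels hold."""
    levels = [
        p.on_web_open_license,
        p.structured,
        p.open_format,
        p.uses_uris,
        p.linked_to_other_data,
    ]
    stars = 0
    for fulfilled in levels:
        if not fulfilled:
            break
        stars += 1
    return stars


csv_on_portal = Publication(True, True, True, False, False)
scanned_image = Publication(True, False, False, False, False)
no_open_license = Publication(False, True, True, True, True)
```

Under these assumptions, `star_rating(csv_on_portal)` yields 3 and `star_rating(scanned_image)` yields 1, while a dataset without an open license earns no stars regardless of its format.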
For example, the W3C offers a large set of recommendations on which formats, languages,
and vocabularies to use to design and link data as well as metadata (RDF, RDFS, OWL, SPARQL,
SHACL). Furthermore, the W3C offers best practices for dealing with data to be published
(https://www.w3.org/TR/dwbp/). For example, these recommend providing metadata for both human users
and computer applications, describing the overall features of the dataset as well as the
schema and internal structure of the distributions.
Further, as described in Section 2.2, the Go-FAIR Initiative [REF-17] developed a structured
guideline for publishing data sustainably. It uses four categories: firstly, “To be Findable”, which
mainly sets out recommendations on identifiers and “rich metadata”; secondly, “To be
Accessible”, which refers to the usage of standards and well-designed protocols; thirdly, “To
be Interoperable”, which provides guidelines to ensure quality and transparent representations;
and fourthly, “To be Re-usable”, which makes sure the data can be accessed and provided
sustainably.
In addition to these guides, the Dublin Core MetaData Initiative [REF-18] offers a variety of
vocabularies in different formats for describing metadata related to raw data and data
aggregates. Particular emphasis is placed on the provision of:
• Authors and contributors
• Description of the data in text
• Categorization
• License information,
• Versioning and updating rules.
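To make this list more tangible, the sketch below checks a metadata record for the emphasised aspects. The term names (`creator`, `contributor`, `description`, `subject`, `license`, `hasVersion`, `modified`) are Dublin Core terms, but mapping them to the bullet points above is an assumption made for this example, as are the function name and the sample record.

```python
# Mapping from the aspects listed above to Dublin Core terms.
# The choice of terms per aspect is an assumption made for this sketch.
ASPECT_TERMS = {
    "authors and contributors": ("creator", "contributor"),
    "description": ("description",),
    "categorization": ("subject",),
    "license": ("license",),
    "versioning and updating": ("hasVersion", "modified"),
}


def missing_aspects(record):
    """Return the aspects for which the record has no non-empty term."""
    return [
        aspect
        for aspect, terms in ASPECT_TERMS.items()
        if not any(record.get(term) for term in terms)
    ]


# Made-up record for illustration; it carries no license statement.
record = {
    "title": "Soil moisture measurements",
    "creator": ["Example Farm Cooperative"],
    "description": "Hourly soil moisture readings from field probes.",
    "subject": ["agriculture", "soil"],
    "modified": "2018-07-01",
}
gaps = missing_aspects(record)
```

Such a check could run before a dataset is registered in a catalogue, so that missing license or versioning information is noticed early.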
How concrete license information has to be designed is currently not defined and is the subject of
different research approaches. One structured definition of a license can be found in [REF-19].
Due to the domain-specific complexity and heterogeneity of the data representation, there is
no one big truth that leads the data economy of an application scenario to success. Rather,
this is seen as a collection of recommendations that address dedicated aspects of the design
of data to ensure the sustainable usability of the data provided, and thus providing users with
the greatest possible support.
3.2 Data interoperability through ontologies, models, formats and standards
DataBio aims at supporting data interoperability through the use of suitable standard ontologies
and data models in the general domains of Geospatial and Earth Observation data and in the
specific domains of Agriculture, Forestry and Fishery, and also to impact future
standardisation where this is found feasible.
3.2.1 Geospatial and Earth Observation ontologies and standards
In the Geospatial domain the DataBio project will aim to use and extend the standards of
OGC, ISO/TC211 and INSPIRE in particular related to the requirements of Big Data. In the
Earth Observation domain, the objective is to use and extend the international Earth
Observation standards and services/APIs as described further in D5.1.
3.2.2 Agricultural ontologies and standards
In the spring of 2018 the DataBio project engaged in the newly established Agriculture
Working Group of the Open Geospatial Consortium (OGC).
The mission of the OGC Agriculture Working Group is to identify geospatial interoperability
issues and challenges within the agriculture domain, then examine ways in which those
challenges can be met through application of existing OGC standards, or through
development of new geospatial interoperability standards under the auspices of OGC.
• Examination of the possibilities for agricultural information exchange standard alignment and harmonization between UN/CEFACT, ISO TC 23, ISOBus, AgroXML, OGC, W3C, etc.
• Development of a reference architecture for use of OGC encoding and interface standards in common agricultural activities.
• Renewal of MOU with IUSS WGSIS for coordination on SoilML / ISO 28258 and related standards.
• Coordination with the agricultural interest groups within ESIP and RDA.
• Coordination and exchange with other related initiatives such as GEOSS, GODAN, CGIAR, GlobalGAP, Open Ag Data Alliance, etc.
• Organization of Agricultural Geoinformatics Summits at OGC Technical Committee meetings.
Through previous projects DataBio partners have been engaged in the creation of ontologies
and data models like FOODIE6 and SENSLOG7. The FOODIE ontology extends INSPIRE data
model for Agriculture and Aquaculture Facilities themes. These ontologies and data models
6 http://foodie-cloud.github.io/model/FOODIE.html
7 https://sdi4apps.eu/2016/11/opensensorsnetwork-pilot-senslog-api-for-farmtelemetry-module/
will be used in the DataBio project and related to the emerging standardisation interest in the
Agriculture area.
3.2.3 Forestry ontologies and standards
Forest information is standardised so that actors engaged in the forest sector could develop
and use harmonised information systems. There are several parallel, successive actors in the
forest sector value chain who have to exchange information when implementing measures.
Although the basic concepts and measurement units of forestry have already been quite
carefully defined for decades, almost every actor has implemented them differently in their
information systems until recent years. As a result, it has been difficult or almost impossible
to convert the information and transfer it from one system to another. Forest information
standards facilitate the use of open materials and data transfer between actors, which in turn
improves operational efficiency for the forest sector.
The forest information standards website is maintained by the Finnish Forest Centre and
Forestry Development Centre Tapio8.
The forest information standards used by the information systems have been published as
xml schema documentation. The schema defines the structure and content of information so
that different information systems can exchange standardized information.
Available Forest Information Standards include a standard forestry data model, a standard for
special features data, a standard for forestry and micro stand forestry information, a standard
set of wood and forestry trades trading, a standard for wood and timber statistics as well as
Forest Centre messages for official use. The new official standard messages published
recently in 2018 include a message mix for wood harvesting and forest management, as well
as self-monitoring messages.
The standardization forum is currently working on a forest data update message and the first
official version of the message is to be released during 2018. Additionally, in autumn 2018,
the forest information standard compliance with the Y platform developed by the Population
Register Center will be explored, a redesign of a wide-ranging special feature code will be
planned and the interface between the digitized forest management recommendations and
the forest information standardization will be considered.
3.2.4 Fishery ontologies and standards
There are fewer established ontologies and standards in the Fishery domain, but in particular
FAO, the Food and Agriculture Organization of the United Nations has established a Fisheries
Glossary.
The FAO Fisheries Glossary has been jointly upgraded by the Fisheries and Aquaculture
Department and the Meeting Programming and Documentation Service. This upgrade stems
from the need to have it become an integral part of the FAO Term Portal. It includes additional
8 Forestry oriented standards - https://www.metsatietostandardit.fi/en/.
features, languages, and access to alternative definitions for currently existing terms in the
FAO Term Portal. As at October 2014, the FAO Fisheries Glossary consists of approximately
1580 terms and definitions, grouped by subject areas, with relevant language equivalents
being developed when new terms are added
(http://www.fao.org/faoterm/collection/fisheries/en/).
In addition, there is a recent CEN CENELEC Workshop on Aquaculture that might also be relevant
for some of the DataBio activities (https://www.cen.eu/News/Workshops/Pages/WS-2016-14.aspx).
The UN/CEFACT FLUX (Fisheries Language for Universal eXchange) standards for information
exchange are designed to overcome the barrier of diverse national reporting standards.
Figure 6: The FLUX standards and status (from UN ESCAP presentation of Dr Heiner Lehr) [REF-37].
The types of data exchanged include:
• Information between stakeholders on stocks, quotas and catches
• Real time monitoring of vessel positions (VMS) and on-going fishing activities
• Reporting of fish landed and sales
• Vessel data and characteristics
• License and fishing authorisation requests
3.3 Data access through standard services and APIs
Besides taking advantage of existing standard services and interfaces in the Geospatial and
Earth Observation area, DataBio will also look into the usage and promotion of suitable APIs
for data access and other services.
3.3.1 Geospatial Standards, Data Types and Services
3.3.1.1 OGC View Services
View services make it possible to display, navigate, zoom to, or overlay spatial datasets and
to display legend information and any relevant content of metadata (EU Commission
DIRECTIVE 2007/2/EC, Art. 11.1 b).
A Web Map Service (WMS) provides geodata in the form of georeferenced image data in
raster or vector image formats, such as Portable Network Graphics (PNG), Graphics
Interchange Format (GIF) or Scalable Vector Graphics (SVG). Depending on the configuration of the
WMS, it is also possible to query attribute information stored at an image coordinate.
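A map image is requested from a WMS with a GetMap call encoded as URL parameters. The sketch below assembles such a request for WMS version 1.3.0. The endpoint `gis.example.org` and the layer name are placeholders; the coordinate reference system `CRS:84` (longitude, latitude order) is chosen here to avoid the axis-order question that arises with `EPSG:4326`.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def wms_getmap_url(endpoint, layer, bbox, width, height, fmt="image/png"):
    """Build a WMS 1.3.0 GetMap URL; bbox is (min_lon, min_lat, max_lon, max_lat)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",       # empty value requests the default style
        "CRS": "CRS:84",    # longitude/latitude axis order
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return f"{endpoint}?{urlencode(params)}"


# Placeholder endpoint and layer name, for illustration only.
url = wms_getmap_url(
    "https://gis.example.org/wms",
    layer="forest_stands",
    bbox=(24.0, 60.0, 25.0, 61.0),
    width=512,
    height=512,
)
query = parse_qs(urlsplit(url).query, keep_blank_values=True)
```

The response to such a request is the rendered image itself, which is why a WMS is classed as a view service rather than a download service.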
The Web Map Tile Service (WMTS) enables applications to serve map tiles of spatially
referenced data using tile images with predefined content, extent and resolution. It can be
used to develop scalable, high-performance services for web-based distribution of
cartographic maps.
3.3.1.2 OGC Download Services
Download services enable copies of spatial datasets, or parts of such sets, to be
downloaded and, where practicable, accessed directly (EU Commission DIRECTIVE 2007/2/EC,
Art. 11.1 c). A download service supports either the complete transfer of a geodataset or
access to individual objects. The downloaded data is available to users on their own IT systems
and can be further processed if appropriate rights have been granted.
A Web Feature Service (WFS) provides web-based access to vector-based objects or data.
New data models should be created exclusively on GML version 3.2. This service may be
limited to downloading predefined datasets, with no option for individual queries or selection
of the contents (see http://www.opengeospatial.org/standards/wfs).
The Web Coverage Service (WCS) provides georeferenced raster data, in particular
multi-dimensional datasets that represent phenomena with spatial or temporal variability.
Examples include Earth observation data, elevation models and temperature distributions (see
http://www.opengeospatial.org/standards/wcs).
3.3.1.3 Other Services
In addition to the OGC services and interfaces already mentioned, there are
services dealing with geospatial data that do not implement these standards. In particular,
when it comes to semi-structured or even unstructured data, different approaches might be more
feasible.
Representational State Transfer (REST) does not describe a specific standard but rather an
architectural style for distributed hypermedia systems. REST does not prescribe any specific
protocol or data format. Nevertheless, HTTP and JSON are widely used for such services.
Vector Data Formats
Geography Markup Language (GML) is a format based on the Extensible Markup Language (XML) that
focuses on, but is not limited to, describing vector data. Since version 3 it is possible to
use extensions, e.g. for coverages. GML does not limit the description of geospatial objects to 2D
and 3D data only, but allows the inclusion of other information such as temporal data. This is
the preferred data format to be served by an OGC Web Service.
GeoJSON makes it possible to describe and exchange geospatial information based on the JavaScript
Object Notation (JSON). While limited to 2D data, it provides support for a variety of different
geometry types such as Points, Lines and Polygons. Besides the geometric information, an
object can hold additional properties to describe features. These objects are called feature
objects. Furthermore, GeoJSON makes it possible to define so-called FeatureCollections containing a set
of different features.
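The structure of feature objects and FeatureCollections can be shown with a short sketch that uses only the standard `json` module. The helper function names, the coordinates and the property values are invented for this example.

```python
import json


def point_feature(lon, lat, **properties):
    """Return a GeoJSON Feature with a Point geometry (longitude, latitude)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


def polygon_feature(ring, **properties):
    """Return a Feature with a Polygon geometry; the ring must be closed."""
    if ring[0] != ring[-1]:
        raise ValueError("first and last position of a ring must be identical")
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [[list(p) for p in ring]]},
        "properties": properties,
    }


def feature_collection(*features):
    """Bundle any number of features into a FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


# Invented example features.
station = point_feature(24.94, 60.17, name="weather station", sensor_count=3)
plot = polygon_feature(
    [(24.9, 60.1), (25.0, 60.1), (25.0, 60.2), (24.9, 60.1)], crop="wheat"
)
collection = feature_collection(station, plot)
text = json.dumps(collection)
```

Because the result is plain JSON, it can be returned directly by a REST service of the kind described above.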
Well-known text (WKT) is a simple text-based markup language to describe geospatial
information. Originally described by the OGC, the current standard is specified by ISO/IEC
13249-3:2016 and ISO 19162:2015. Unlike GeoJSON, it is possible to describe not only 2D
features, but 3D features as well. This format is widely used to add geospatial information to
table-structured data such as SQL Databases or CSV files (comma-separated values).
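The use of WKT in table-structured data can be sketched as follows: geometries are written as WKT strings into a CSV column. The helper functions, the column names and the sample rows are made up for this illustration, and only points and simple polygons are covered.

```python
import csv
import io


def point_wkt(x, y, z=None):
    """Return a 2D point, or a 3D point when a z value is given."""
    if z is None:
        return f"POINT ({x} {y})"
    return f"POINT Z ({x} {y} {z})"


def polygon_wkt(ring):
    """Return a polygon with one outer ring; the ring is closed if needed."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON (({coords}))"


# Made-up rows: an identifier plus a geometry column holding WKT.
rows = [
    ("station-1", point_wkt(24.94, 60.17)),
    ("mast-1", point_wkt(24.94, 60.17, 35)),
    ("field-1", polygon_wkt([(0, 0), (1, 0), (1, 1)])),
]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "geometry_wkt"])
writer.writerows(rows)  # fields containing commas are quoted automatically
csv_text = buffer.getvalue()
```

The CSV writer quotes the polygon string because it contains commas, so the file stays readable by standard CSV tools.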
3.3.2 Sensor Standards, ontologies, data representations
3.3.2.1 OGC Sensor Observation Service
The Sensor Observation Service (SOS) is an OGC standard and describes web services to store
and to query real-time sensor data and sensor data time series. SOS is part of the Sensor Web
Enablement. The offered sensor data comprises descriptions of sensors themselves, which
are encoded in the Sensor Model Language (SensorML, see below), and the measured values
in the Observations and Measurements (O&M) encoding format. The web service as well as
both file formats are open standards and specifications of the same name defined by the
Open Geospatial Consortium (OGC). If the SOS supports the transactional profile (SOS-T), new
sensors can be registered on the service interface and measured values can be inserted. An SOS
implementation can be used for data from both in-situ and remote sensing sensors.
Furthermore, the sensors can be either mobile or stationary.
3.3.2.2 OGC Sensor Model Language
SensorML is an OGC standard and provides standard models and an XML encoding for
describing sensors and measurement processes. SensorML can be used to describe a wide
range of sensors, including both dynamic and stationary platforms and both in-situ and
remote sensors. It provides a provider-centric view of information in a sensor web, which is
complemented by Observations and Measurements (O&M) which provides a user-centric
view. Functions supported include:
• sensor discovery,
• sensor geolocation,
• processing of sensor observations,
• a sensor programming mechanism,
• subscription to sensor alerts.
Latest version of the standard is 2.0 published in the year 2012 (see
http://www.opengeospatial.org/standards/sensorml).
3.3.2.3 OGC SensorThings API
SensorThings API is an OGC standard providing an open and unified framework to
interconnect IoT sensing devices, data, and applications over the Web. It is an open standard
addressing the syntactic interoperability and semantic interoperability of the Internet of
Things. It complements existing IoT networking protocols such as CoAP, MQTT, HTTP and
6LoWPAN. While these protocols address the ability of different IoT systems to
exchange information, OGC SensorThings API is addressing the ability for different IoT
systems to use and understand the exchanged information. As an OGC standard,
SensorThings API also allows easy integration into existing Spatial Data Infrastructures or
Geographic Information Systems.
Latest version of the standard is 1.0 published in the year 2015.
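As a rough illustration of how a client might talk to a SensorThings API service, the sketch below builds the JSON body for a new Observation and a query URL for the most recent Observations of a Datastream. The service root `sta.example.org`, the Datastream identifier and the measured value are placeholders, and no request is actually sent.

```python
import json
from urllib.parse import quote


def observation_payload(datastream_id, time_iso, result):
    """Return the body for creating an Observation linked to a Datastream."""
    return {
        "phenomenonTime": time_iso,
        "result": result,
        "Datastream": {"@iot.id": datastream_id},
    }


def latest_observations_url(base, datastream_id, top=5):
    """Build a query for the newest Observations of one Datastream."""
    order = quote("phenomenonTime desc")  # the space becomes %20
    return (
        f"{base.rstrip('/')}/Datastreams({datastream_id})/Observations"
        f"?$top={top}&$orderby={order}"
    )


# Placeholder service root and values, for illustration only.
BASE = "https://sta.example.org/v1.0"
body = json.dumps(observation_payload(1, "2018-07-01T12:00:00Z", 21.4))
url = latest_observations_url(BASE, 1, top=5)
```

The body would be sent with an HTTP POST to the `Observations` collection of the service, and the query URL with an HTTP GET.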
3.3.2.4 ISO 19156:2011 Geographic information - Observations and measurements
O&M standard defines a conceptual schema for observations, and for features involved in
sampling. The standard provides models for the exchange of information describing
observation acts and their results, both within and between different scientific and technical
communities. Observations commonly involve sampling of a feature-of-interest. The standard
defines a common set of sampling feature types classified primarily by topological dimension,
as well as samples for ex-situ observations. The schema includes relationships between
sampling features (sub-sampling, derived samples). The standard concerns only externally
visible interfaces and places no restriction on the underlying implementations other than
what is needed to satisfy the interface specifications in the actual situation.
The latest version of the standard was published in 2011.
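The conceptual schema can be illustrated with a small data structure. The sketch below is an assumption-laden simplification, not the normative model: the class and field names are Python renderings of the O&M observation properties, and the sampling feature types are keyed by topological dimension as described above. All instance values are invented.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any

# Sampling feature types of the standard, keyed by topological dimension.
SAMPLING_TYPES = {0: "SF_SamplingPoint", 1: "SF_SamplingCurve",
                  2: "SF_SamplingSurface", 3: "SF_SamplingSolid"}


@dataclass(frozen=True)
class SamplingFeature:
    name: str
    dimension: int        # topological dimension, 0 to 3
    sampled_feature: str  # the domain feature the sample stands for

    @property
    def type_name(self) -> str:
        return SAMPLING_TYPES[self.dimension]


@dataclass(frozen=True)
class Observation:
    feature_of_interest: SamplingFeature
    observed_property: str
    procedure: str             # sensor or method that produced the result
    phenomenon_time: datetime  # when the value applies
    result_time: datetime      # when the result became available
    result: Any


# Invented example: a soil probe sampling one field at a point.
probe = SamplingFeature("probe-7", 0, "field 12")
obs = Observation(probe, "soil temperature", "in-situ thermometer",
                  datetime(2018, 6, 1, 12, 0), datetime(2018, 6, 1, 12, 5),
                  14.2)
print(obs.feature_of_interest.type_name, obs.result)
```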
3.3.2.5 W3C Semantic Sensor Network Ontology
This W3C ontology describes sensors and observations, and related concepts. It does not
describe domain concepts, time, locations, etc.; these are intended to be included from other
ontologies via OWL imports. The ontology was developed by the W3C Semantic Sensor
Networks Incubator Group (SSN-XG). The ontology is based around concepts of systems,
processes, and observations. It supports the description of the physical and processing
structure of sensors. Sensors are not constrained to physical sensing devices: rather a sensor
is anything that can estimate or calculate the value of a phenomenon, so a device or
computational process or combination could play the role of a sensor. The representation of
a sensor in the ontology links together what it measures (the domain phenomena), the
physical sensor (the device) and its functions and processing (the models).
The latest version of the SSN ontology was published in 2011.
3.3.2.6 NGSI-9/10
The FI-WARE version of the Open Mobile Alliance (OMA) NGSI-9 interface is a RESTful API via
HTTP. Its purpose is to exchange information about the availability of context information.
The three main interaction types are:
• one-time queries for discovering hosts (also called 'agents' here) where certain context information is available
• subscriptions for context availability information updates (and the corresponding notifications)
• registration of context information, i.e. announcements that certain context information is available (invoked by context providers).
The FI-WARE version of the OMA NGSI-10 interface is a RESTful API via HTTP. Its purpose is to
exchange context information. The three main interaction types are:
• one-time queries for context information
• subscriptions for context information updates (and the corresponding notifications)
• unsolicited updates (invoked by context providers).
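The interaction types above can be sketched as request payloads. The following Python code only builds the path and JSON body for one NGSI-9 registration, one NGSI-10 query and one NGSI-10 subscription; it sends nothing. The paths and field names follow the JSON binding of the FIWARE Orion Context Broker (v1) and should be checked against the broker version in use. The entity, the attribute type "float" and the URLs are hypothetical.

```python
import json


def _entity(entity_id, entity_type):
    return {"type": entity_type, "isPattern": "false", "id": entity_id}


def register_context(entity_id, entity_type, attributes, provider_url,
                     duration="P1M"):
    """NGSI-9: announce that a provider can supply these attributes."""
    registration = {
        "entities": [_entity(entity_id, entity_type)],
        # Attribute type is fixed to "float" for illustration only.
        "attributes": [{"name": a, "type": "float", "isDomain": "false"}
                       for a in attributes],
        "providingApplication": provider_url,
    }
    payload = {"contextRegistrations": [registration], "duration": duration}
    return "/v1/registry/registerContext", payload


def query_context(entity_id, entity_type, attributes):
    """NGSI-10: one-time query for context information."""
    payload = {"entities": [_entity(entity_id, entity_type)],
               "attributes": list(attributes)}
    return "/v1/queryContext", payload


def subscribe_context(entity_id, entity_type, attributes, callback_url,
                      duration="P1M"):
    """NGSI-10: ask to be notified when the attributes change."""
    payload = {"entities": [_entity(entity_id, entity_type)],
               "attributes": list(attributes),
               "reference": callback_url,
               "duration": duration,
               "notifyConditions": [{"type": "ONCHANGE",
                                     "condValues": list(attributes)}]}
    return "/v1/subscribeContext", payload


path, body = query_context("Station1", "WeatherStation", ["temperature"])
print("POST", path)
print(json.dumps(body, indent=2))
```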
3.3.2.7 IoT Architecture - Thing, Resource, Entity
See the IoT-Lite Ontology (http://iot.ee.surrey.ac.uk/fiware/ontologies/iot-lite). Surprisingly, there
are no standards with regard to events. As a result, each event processing tool has its own
programming model and semantics. The same goes for the data representation of events.
3.3.3 API approach
The API approach is extensively tested and relatively widely used. There are many categories of APIs:
web-based system (e.g. REST), operating system (e.g. Cocoa), database system (e.g. Django)
and hardware system. APIs typically include three elements: access control, request
(operation and parameters) and response (data/service).
Lately, businesses have changed their view of APIs from a technology to a business enabler.
Gartner [9] introduces the concept of the “API economy” together with “Digital business”. APIs
allow data to be provided and consumed across platforms, systems and services using
standards in a secure and reliable manner. However, there are some challenges with security
and efficiency related to the use of APIs as a data access mechanism.
A successful example of API-based data access is Transport for London, which shares 200 data
elements through an API [10]. The API is used by 600 different apps, which are used by 42% of London’s
population [11].
3.4 Stakeholders and concerns
Using ArchiMate as a specification tool in the DataBio project, each dataset/datastream is
related explicitly to a set of pilot systems, stakeholders, components and/or pipelines. The
ArchiMate motivation and strategy diagrams specify the goals, drivers and outcomes of each
pilot system, indicating the relevance and use of the datasets/streams. Figure 7 shows a
strategy diagram from the B2 fishery pilots where the goals and outcomes are realized
through extensive data collection and processing.
Figure 7: ArchiMate strategy diagram showing how the pilot system will realize the defined goals
Furthermore, ArchiMate is used to model pilot applications that realize outcomes. Figure 8
shows how “Provide decision support for pelagic fisheries planning” (shown in Figure 7)
is supported by a set of process steps, including datasets, stakeholders and interactions. The
application diagram identifies EO Data, Vessel Operation Data, Meteorological Forecast and
Catch reports as required datasets/streams.
[9] https://www.gartner.com/smarterwithgartner/
[10] https://api.tfl.gov.uk/
[11] https://tfl.gov.uk/info-for/open-data-users/open-data-policy
Figure 8: ArchiMate business diagram showing the data processing, datasets and actors involved
Each dataset can then be broken down into subsets (from ArchiMate Business Objects to
ArchiMate DataObject) as shown in Figure 9.
Figure 9: ArchiMate data view for one of the fishery pilots (B2)
The pilots are realized both from datasets and DataBio components. Each pilot system utilizes
a set of components to implement the required big data processing steps: collection,
preparation, analysis, visualization and access. Figure 10 shows how the B2 fishery pilot is
designed.
Figure 10: The B2 fishery pilot lifecycle view showing how data is provided as input to processing steps
To further specify the application architecture, a pipeline view is created for each pilot
system. The pipeline shows the component and dataset interfaces. Figure 11 shows the B2
fishery pilot pipeline with all its components and datasets.
Figure 11: The B2 fishery pilot pipeline view showing how datasets are interfaced
All pilots in DataBio are modelled in ArchiMate following this methodology. This allows for
traceability from stakeholder and goal to application realization:
• A stakeholder has a goal that will have an outcome
• An outcome is created from a set of actions
• An action requires a set of resources
• A resource can be a dataset or component (processing)
• Datasets and components are combined in an architecture through interfaces and responsibilities.
Using Softeam’s Modelio software, users can navigate through the DataBio ArchiMate models
for pilots and components to understand, compare and document the system/subsystems.
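The traceability chain listed above (stakeholder, goal, outcome, action, resource) can be sketched as a small data model. This is only an illustration of the idea and not an export of the ArchiMate models: the class names mirror the bullet list, and the example stakeholder, goal and action names are invented, while the dataset names are taken from the B2 fishery pilot description.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Resource:
    name: str
    kind: str  # "dataset" or "component"


@dataclass
class Action:
    name: str
    resources: List[Resource] = field(default_factory=list)


@dataclass
class Outcome:
    name: str
    actions: List[Action] = field(default_factory=list)


@dataclass
class Goal:
    name: str
    outcome: Outcome


@dataclass
class Stakeholder:
    name: str
    goals: List[Goal] = field(default_factory=list)


def trace(stakeholder: Stakeholder) -> Iterator[str]:
    """Yield one line per path from the stakeholder down to a resource."""
    for goal in stakeholder.goals:
        for action in goal.outcome.actions:
            for res in action.resources:
                yield " -> ".join([stakeholder.name, goal.name,
                                   goal.outcome.name, action.name,
                                   f"{res.kind}:{res.name}"])


# Invented example loosely based on the B2 fishery pilot.
collect = Action("Collect data", [Resource("EO Data", "dataset"),
                                  Resource("Catch reports", "dataset")])
outcome = Outcome("Decision support", [collect])
skipper = Stakeholder("Skipper", [Goal("Plan fishing trips", outcome)])
for line in trace(skipper):
    print(line)
```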
3.5 License models for data reuse
There exists a wide range of licensing schemes for publishing datasets. For example, data.world lists
13 common schemes ranging from the most open to the most restrictive [REF-21]. These
licences are typically Creative Commons (CC) licences, which originate from the open source
domain.
In addition to the more or less open models, there are several models for commercial
licensing of closed datasets for business-to-business (B2B) and business-to-government (B2G) purposes, including International Data Spaces
(IDS), Unified eXchange Platform (UXP) and Sharemind from Cybernetica.
4 Requirements view
4.1 Types of EO data and sensors used in the DataBio pilots and their characteristics
Remote sensing is one of the most common ways to extract relevant information about the
Earth and our environment. Remote sensing acquisitions, done through both active (synthetic
aperture radar, LiDAR) and passive (optical and thermal range, multispectral and
hyperspectral) sensors, provide a variety of information about the land and ocean processes.
Different types of Earth Observation data have been developed over the last forty years
bringing significant changes in the context of the Big Data concept.
A typical Big Data application chain may require EO input data in addition to other sensor data
as depicted below.
Figure 12: EO Data Collection Context
A significant part of the 26 DataBio pilots use EO (Earth Observation) data as input for their
specific purposes, in the context of efficient resource use and increasing productivity in
agriculture, forestry and fishery. The general data types, including EO data, used in DataBio
pilot projects are listed in the table below.
Table 2: Types of data used in DataBio pilot projects
Columns: No. of pilot | Category | Leader | Name of pilot | Partners | AOI | Data used in the pilot
AGRICULTURE, A. Precision Horticulture including vine and olives (Leader: NP)

A1. Precision agriculture in olives, fruits, grapes and vegetables

Pilot 1: A1.1 Precision agriculture in olives, fruits, grapes
  Partners: NP, GAIA Epicheirein
  AOI: Greece (Pilot Site A: Chalkidiki - 600 ha, Pilot Site B: Stimagka - 3 000 ha, Pilot Site C: Veria - 10 000 ha)
  Data used in the pilot: data directly from the field, collected from a network of telemetric IoT stations called GAIAtrons; remotely with image sensors on in-orbit platforms; and by monitoring the application of inputs and outputs in the farm (e.g. in-situ measurements, farm logs, farm profile)

Pilot 2: A1.2 Precision agriculture in vegetable seed crops
  Partners: C.A.C., VITO
  AOI: Eastern Italy. Location: 5 farms, Emilia Romagna Region, for the total acreage of 14,79 hectares in the first year. To be expanded to other crops in the same Region and in Region Marche.
  Data used in the pilot: satellite imagery, weather and soil data and yield/seed maturity predictions

Pilot 3: A1.3 Precision agriculture in vegetables - 2 (Potatoes)
  Partners: NB Advies, VITO
  AOI: Veenkoloniën region in the Netherlands
  Data used in the pilot: historical yield data - field characteristics (sample data yield data, potato varieties, planting data etc.), historical earth observation data

A2. Big Data management in greenhouse eco-systems

Pilot 4: A2.1 Big Data management in greenhouse eco-systems
  Partners: CREA, CERTH
  AOI: greenhouse horticulture in the Thessaly Region, Greece
  Data used in the pilot: experimental data: whole genome genotypic data, metabolomics and phenomic (lab) data; observational data: phenomics (field), sensor data, environmental indoor and outdoor
AGRICULTURE, B. Arable Precision Farming (Leader: VITO)

B1. Cereals and biomass crops

Pilot 5: B1.1 Cereals and biomass crops 1
  Partners: TRAGSA
  AOI: Cabreros del Río, Castile - Leon, Spain, “Ribera del Porma” Farmers Community: 24.270 ha
  Data used in the pilot: high resolution (Sentinel-2 type) satellite images, complemented with sensor data and, in some specific cases, with RPAS (Remotely Piloted Aircraft Systems) data and external data

Pilot 6: B1.2 Cereals and biomass crops 2
  Partners: NP, GAIA Epicheirein
  AOI: Elassona, Greece - 2500 ha of maize as targeted crops
  Data used in the pilot: data directly from the field, collected from a network of telemetric IoT stations called GAIAtrons; remotely with image sensors on in-orbit platforms; and by monitoring the application of inputs and outputs in the farm (e.g. in-situ measurements, farm logs, farm profile)

Pilot 7: B1.3 Cereals, biomass crops 3 (Biomass crops monitoring and performance predictions)
  Partners: CREA, VITO, NOVAMONT
  AOI: 24 sites in Emilia Romagna, Italy (120 ha) - CREA sorghum pilot, 3 sites in Emilia Romagna and Veneto, Italy (6 ha) - CREA fiber hemp pilot, 4 sites in North and South-Western Sardinia, Italy (65 ha) - NOVAMONT cardoon pilot
  Data used in the pilot: satellite imagery, telemetry IoT data (air temperature, air moisture, solar radiation, leaf wetness, rainfall, wind speed and direction, soil moisture, soil temperature, soil EC / salinity, PAR, barometric pressure), phenotypic data collected for each cropping season

Pilot 8: B1.4 Cereals, biomass crops 4 (Cereal crop monitoring)
  Partners: LESPRO
  AOI: 8300 ha - Rostenice (Vyskov, Czech Republic); target crops: cereals - winter wheat, spring barley, grain maize
  Data used in the pilot: EO data: Landsat 8 from the Landsat data repository (https://espa.cr.usgs.gov), Sentinel 2A/B (https://scihub.copernicus.eu/), Google Earth Engine platform for fast viewing of EO data (https://earthengine.google.com/); field boundaries from the Czech LPIS database as shp or xml (http://eagri.cz/public/app/eagriapp/lpisdata/); orthophotos, topography maps and cadastral maps as WMS service; farm data: crop rotation, crop treatment records, yield maps, soil maps

B2. Machinery management and environmental issue

Pilot 9: B2.1 Machinery management
  Partners: LESPRO, ZETOR, Federu
  AOI: AOI in Czech Republic
  Data used in the pilot: telemetry data from machinery, other farm data
AGRICULTURE, C. Subsidies and Insurance (Leader: e-GEOS)

C1. Insurance

Pilot 10: C1.1 Insurance
  Partners: NP
  AOI: 12000 ha in North Greece - targeted crops: 7 types (wheat, stone fruits etc.)
  Data used in the pilot: EO data, field data (soil temperature, humidity - multi-depth, ambient temperature, humidity, barometric pressure, solar radiation, leaf wetness, rainfall volume, wind speed and direction), historical and current weather data, via the IoT stations network, enriched with yield data information extracted from the work calendar and stored in the NP’s cloud infrastructure

Pilot 11: C1.2 Farm Weather Insurance Assessment
  Partners: e-GEOS
  AOI: AOI in Italy
  Data used in the pilot: Copernicus satellite data series, meteorological data, other ground available data

C2. CAP Support

Pilot 12: C2.1 CAP Support
  Partners: e-GEOS, TerraS, Tragsa
  AOI: AOI in Northern Italy (50.000 ha) - 2 targeted crop types, AOI in Southeastern Romania (10.000 sqkm.) - 3 - 10 crop types
  Data used in the pilot: data related to parcel information and provided by the users, satellite optical and SAR data, in-situ / field data

Pilot 13: C2.2 CAP Support - Greece
  Partners: NP, GAIA Epicheirein
  AOI: AOI in Northern Greece (50.000 ha) - 2 targeted crops: dry beans, peaches
  Data used in the pilot: data directly from the field, collected from a network of telemetric IoT stations called GAIAtrons; remotely with image sensors on in-orbit platforms (EO data), anonymized IACS data
FORESTRY, A. Multisource and data crowdsourcing / e-services (Leader: MHGS)

A1. Easy data sharing and networking

Pilot 14: 2.2.1 Easy data sharing and networking
  Partners: MHGS, VTT, SPACEBEL, METSAK, FMI
  AOI: two estates, called “Rangunkorven yhteismetsä” and “Taipale”, both located in Central Finland; follows up the implementation according to the defined specifications in Czech Republic and Belgium by WP2 partners
  Data used in the pilot: forestry data transferred via the Finnish forestry standard XML format, real time updates from field measurements, forest owners’ forest management plans and other notifications from forest owners, forestry operators and other stakeholders; processed data: forest estate, geometry of compartments, type of the forest work, sample plot locations, measured data per sample plot, measurement averages per compartment, measurement date and user information; control significant vegetation changes, such as clear-cuts and forest damage areas to act in time

A2. Monitoring and control tools for forest owners

Pilot 15: 2.2.2 Monitoring and control tools for forest owners
  Partners: MHGS, FMI, TRAGSA, METSAK
  AOI: AOI in Finland
  Data used in the pilot: forestry data, real time updates from field measurements, forest owners’ forest management plans and other notifications from forest owners, forestry operators and other stakeholders; processed data: forest estate, geometry of compartments, type of the forest work, sample plot locations, measured data per sample plot, measurement averages per compartment, measurement date and user information; control significant vegetation changes, such as clear-cuts and forest damage areas to act in time
FORESTRY, B. Forest Health / Remote / Crowdsensing, Invasive species / damage (Leader: TRAGSA)

B1. Forest damage remote sensing

Pilot 16: 2.3.1 Forest damage remote sensing
  Partners: MHGS, VTT, SENOP, METSAK, SPACEBEL
  AOI: the main demonstration areas are the Hippala and Rangunkorpi forest plots in South-Eastern Finland; in Wallonia, FMI, with the support of Spacebel, aims to develop a remote sensing service to provide a spatial distribution of the vulnerability and risk exposure to diseases and other potential hazards based on Sentinel-2 or Sentinel-1+Sentinel-2
  Data used in the pilot: EO data (in particular optical Sentinel-2 satellite data), precise data from airborne and field measurements, used to train and validate the method

Pilot 17: 2.3.2-FH Monitoring of forest health
  Partners: TRAGSA, SENOP, CSEM, CiaoT, FMI, VTT
  AOI: large areas in the Iberian Peninsula - Spain: Extremadura, Andalucia, Castilla y León, Castilla La Mancha, Madrid
  Data used in the pilot: remote sensing images (satellite + aerial + UAV), field data

B2. Invasive alien species control – plagues – forest management

Pilot 18: 2.3.2 IAS - Invasive alien species control and monitoring
  Partners: TRAGSA, SENOP, CSEM, CiaoT, FMI, VTT
  AOI: Spain - the Iberian Peninsula, the Canary Islands and the Balearic Islands
  Data used in the pilot: EO data (Sentinel 2, Landsat 8), several alphanumeric Big Data databases - centralized data - WORLDCLIM dataset (provided by the International Journal of Climatology - 19 bioclimatic raster layers with a resolution of 1 km), foreign trade database from Spanish Finance Ministry, Immigration Database by Spanish Statistical Institute, tourism dataset from Ministry of Energy, Tourism and Digital Agenda, GHS - population grid (developed by JRC), Spanish terrestrial transport network (ESRI shp, provided by the National Geographic Institute), NUTS-2, NUTS-3, Municipalities maps from GADM - Global Administrative Areas
FORESTRY, C. Forest data management services (Leader: METSAK)

C1. Web-mapping service for government decision making

Pilot 19: 2.4.1 Web-mapping service for government decision making
  Partners: FMI, VTT, SPACEBEL
  AOI: Czech Republic, Wallonia (Belgium)
  Data used in the pilot: Sentinel-2 satellite data, distributed by European Space Agency, forest management maps, in-situ LAI (Leaf-area index) observations - in-situ data from total 189 forest plots with varying species composition and structures

C2. Shared multiuser data environment

Pilot 20: 2.4.2 Shared multiuser data environment
  Partners: METSAK, VTT
  AOI: Finland
  Data used in the pilot: centralized forest resource data - original data source for forest resource data can be laser scanning, field measurement, growth modelling or notification from forest owner or forestry operator. Other data sources for Kemera financing data, forest use declarations, access and authorization.
FISHERY, A. Fishing vessels immediate operational choices (Leader: SINTEF Fishery)

A1. Oceanic tuna fisheries immediate operational choices

Pilot 21: A1. Oceanic tuna fisheries immediate operational choices
  Partners: EHU-UPV
  AOI: South Atlantic, Indian Ocean
  Data used in the pilot: EO data (Sentinel 3, CMEMS products), data from on board monitoring systems / fleet sensor observations (vessel engines sensors - velocity and heading, position of the vessel, fish catches - species, weight), weather and sea condition information.

A2. Small pelagic fisheries immediate operational choices

Pilot 22: A2. Small pelagic fisheries immediate operational choices
  Partners: SINTEF Ocean
  AOI: small pelagic fishing fleet, covering the North Atlantic Ocean
  Data used in the pilot: time series measurements collected from a variety of sources (power system, navigation system, weather sensors, deck machinery), sonar / hydroacoustic data; EO data evaluated for inclusion in the pilot
FISHERY, B. Fishing vessel trip and fisheries planning (Leader: AZTI)

B1. Oceanic tuna fisheries planning

Pilot 23: B1. Oceanic tuna fisheries planning
  Partners: AZTI
  AOI: South Atlantic, Indian Ocean
  Data used in the pilot: EO data, large datasets of historical data (logbooks, VMS, GPS, Buoys, Observers), fuel consumption data, captures data, weather forecast

B2. Small pelagic fisheries planning

Pilot 24: B2. Small pelagic fisheries planning
  Partners: SINTEF Ocean
  AOI: small pelagic fishing fleet, typically covering the North Atlantic Ocean
  Data used in the pilot: extensive datasets within fisheries activity and catch statistics, combined with information from that time and history of the same such as meteorological and oceanographic data (meteorological and oceanographic hindcasts and forecasts), moon phase, time of day, time of year, sonar data
FISHERY, C. Fisheries sustainability and value (Leader: SINTEF Fisheries)

C1. Pelagic fish stock assessments

Pilot 25: C1. Pelagic fish stock assessments
  Partners: SINTEF Fisheries
  AOI: northeast Atlantic; Norwegian coast
  Data used in the pilot: hydroacoustics, oceanographic and meteorological data (ocean surface currents, temperatures etc.), collected in-situ or through remote sensing, estimates of fish species and densities, catch reports, oceanographic simulations, stock simulations

C2. Small pelagic market predictions and traceability

Pilot 26: C2. Small pelagic market predictions and traceability
  Partners: SINTEF Fisheries
  AOI: the small pelagic fisheries in the North Atlantic Ocean
  Data used in the pilot: centralized data: market trends by the World bank and Norwegian Seafood Council (market insight data, statistics, trade information, consumption and consumer insight), pelagic auction data (a database containing information about all pelagic catches landed in Norway in the last decades), provided by Norges Sildesalgslaget; distributed/local data: fish stock observations (hydroacoustic and sonar instruments), quality measurements, vessel operations data (motion and cost of operation)
As datasets, they can also be grouped as:
1. Existing datasets utilized by DataBio pilots: datasets that are available and have
relevance for the pilot systems in DataBio. The DataBio project demonstrates their
usefulness and provides recommendations for use.
2. Existing datasets that the DataBio project has improved in terms of easier or better
findability, accessibility, interoperability or reusability.
3. New datasets created by the DataBio project by combining or processing existing data
sources.
Subsequently, the types of EO data and sensors (classified into optical and SAR data) used in
the DataBio pilots are presented in terms of their main features: objectives of the mission,
spatial, temporal and radiometric resolution, coverage, data access etc., with special regard
to the aim of using these EO data in pilots, including derived EO products/results.
4.2 Datasets and datastream requirements from Platform
This section describes the platform requirements that are related to EO datasets and
datastreams. Each requirement (EO-xxxxxx) has a textual description, zero or more
implementations in DataBio, and one or more relationships to requirements specified in the
pilots. Full details and navigation are provided in the ArchiMate models.
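The structure of a requirement entry described above can be sketched as a record with a simple consistency check. This is an illustrative assumption about how such entries might be held in code, not the format of the ArchiMate models; the class, its field names and the linked pilot requirement id "R-hypothetical" are invented, while the EO-441032 id and text come from the table in this section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlatformRequirement:
    req_id: str
    description: str
    implementations: List[str] = field(default_factory=list)  # zero or more
    derived_from: List[str] = field(default_factory=list)     # one or more

    def problems(self) -> List[str]:
        """Return a list of rule violations; empty means consistent."""
        issues = []
        if not self.req_id.startswith("EO-"):
            issues.append("id does not start with EO-")
        if not self.description.strip():
            issues.append("missing textual description")
        if not self.derived_from:
            issues.append("no pilot requirement linked")
        return issues


req = PlatformRequirement("EO-441032", "Discover extreme weather data")
print(req.problems())

# "R-hypothetical" is a placeholder, not a real DataBio requirement id.
req.derived_from.append("R-hypothetical")
print(req.problems())
```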
ID Requirement
EO-441020 The DataBio Platform shall discover EO metadata through interfaces compliant with the OGC 13-026r8 specification.
Implementations N/A
Derived from
EO-441031 Discover available historical EO products
Implementations
Derived from
EO-441032 Discover extreme weather data
Implementations N/A
Derived from
EO-441040 The Proba-V data shall be discoverable using an OpenSearch interface which can be integrated in FedEO.
Implementations N/A
Derived from
EO-442020 The interface to access the catalog where the Sentinel-2 data is stored (if stored remotely) shall be granted to the pilots.
Implementations N/A
Derived from
EO-442030 The Proba-V data shall be accessible using the product URLs as returned in the OpenSearch responses (discovery step).
Implementations N/A
Derived from
EO-444010 The Proba-V MEP platform should provide a processing cluster allowing parallel computing and data analytics on Proba-V data and selected Sentinel-2 derived vegetation indices at country/region range.
Implementations N/A
Derived from
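Requirements EO-441040 and EO-442030 rely on OpenSearch discovery. As a rough sketch of the discovery step, the Python code below fills an OpenSearch URL template, in which optional parameters are marked with a trailing question mark and are replaced by an empty string when no value is given. The endpoint, the template and the collection name are hypothetical; a real service publishes its own template in its OpenSearch description document.

```python
import re
from urllib.parse import quote

# Matches {name} and {name?}; the "?" marks an optional parameter.
_PLACEHOLDER = re.compile(r"\{([^{}?]+)(\?)?\}")


def fill_template(template: str, values: dict) -> str:
    """Substitute search parameters into an OpenSearch URL template."""
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe=",:")
        if optional:
            return ""  # optional and not supplied: leave empty
        raise KeyError(f"missing required parameter: {name}")
    return _PLACEHOLDER.sub(substitute, template)


# Hypothetical template using geo and time extension parameter names.
TEMPLATE = ("https://example.org/opensearch/products"
            "?collection={collection}&bbox={geo:box?}"
            "&start={time:start?}&end={time:end?}&count={count?}")

url = fill_template(TEMPLATE, {
    "collection": "example-collection",  # invented collection name
    "geo:box": "4.0,50.0,6.0,52.0",      # west,south,east,north
    "time:start": "2018-05-01",
})
print(url)
```

The product URLs returned in the search response would then be used for the access step.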
4.3 Datasets and datastream requirements from Agriculture pilots
This section describes the Agriculture pilots’ requirements that are related to datasets and
datastreams. Each requirement (R1.x.y_z) has a textual description, zero or more
implementations in DataBio, and one or more relationships to requirements specified in the
pilots. Full details and navigation are provided in the ArchiMate models.
ID
R1.2.1_6 A pilot needs the growth model
Implementations
Derived platform requirements
R1.2.1_7 A pilot needs EO data (historical and current).
Implementations
Derived platform requirements
R1.2.1_8 A pilot needs weather data (historical and current)
Implementations
Derived platform requirements
R1.3.1_4 A pilot needs the current solution to be improved, developed and scaled from 34 kha to several municipalities and NUTS-2 level
Implementations
Derived platform requirements
R1.3.1_6 A pilot needs availability of historical and actual EO data (including vegetation indices, e.g. NDVI, EVI, NDRE, NDMI)
Implementations
Derived platform requirements
R1.3.1_7 A pilot needs analysis on EO, DEM, soil and crop data by applying machine learning algorithms to identify management zones within the fields and its export in vector format (shp, isoxml)
Implementations
Derived platform requirements
R1.3.1_8 A pilot needs analysis of spatial variability of crop status and alerting service
Implementations
Derived platform requirements
R.1.3.1_9 A pilot needs reporting - by field or aggregated for crop type
Implementations
Derived platform requirements
R.1.3.1_11 A pilot needs: (1) Components enabling to harness satellite data for applications in farm telemetry, with particular interest in Crop Monitoring and Predictions. (2) Components for crop monitoring and real-time analytics using real-time streaming data from wireless sensor networks; capability to trigger alarm/notifications/recommendations in order to improve farm operations and productivity
Implementations
Derived platform requirements
R.1.4.1_3 A pilot needs availability of current and historical EO data (including for example vegetation indices such as NDVI, LAI)
Implementations
Derived platform requirements
R.1.4.1_4 A pilot needs availability of weather data (integrated together with weather stations data). Parameters will be temperature, rainfall and humidity.
Implementations
Derived platform requirements
R.1.4.1_5 A pilot needs analysis on historical EO and weather data by applying machine learning algorithms to assess the impact of the bad weather conditions
Implementations
Derived platform requirements
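Several of the agriculture requirements refer to vegetation indices such as NDVI, NDRE and NDMI. These three are normalized differences of two reflectance bands, for example NDVI = (NIR - Red) / (NIR + Red). The following sketch shows the calculation for single pixel values; the reflectance numbers are invented, and a pilot would apply the same formula per pixel to whole raster bands.

```python
def normalized_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b), or NaN when both bands are zero."""
    total = a + b
    if total == 0:
        return float("nan")
    return (a - b) / total


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index."""
    return normalized_difference(nir, red)


def ndre(nir: float, red_edge: float) -> float:
    """Normalized Difference Red Edge index."""
    return normalized_difference(nir, red_edge)


def ndmi(nir: float, swir: float) -> float:
    """Normalized Difference Moisture Index."""
    return normalized_difference(nir, swir)


# Invented reflectance values for one pixel.
print(round(ndvi(nir=0.5, red=0.1), 3))
print(round(ndmi(nir=0.5, swir=0.3), 3))
```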
4.4 Datasets and datastream requirements from Forestry pilots
This section describes the Forestry pilot requirements that are related to datasets and
datastreams. Each requirement (R2.x.y_z) has a textual description, zero or more
implementations in DataBio, and one or more relationships to requirements specified in the
pilots. Full details and navigation are provided in the ArchiMate models.
ID
R2.2.2_1 A pilot needs damage & quality reporting features to the Wuudis mobile app (MHG), Needs standard development (METSAK), Integrations
Implementations
Derived platform requirements
R2.3.1_1 A pilot needs new satellite and RS map layers provided via WMS/WMTS interface, Customizable map layers development to the Wuudis (MHG), Real-time forest management service development based on multiple forest big data sources
Implementations
Derived platform requirements
R2.3.2_1 A pilot needs learning about methodologies to assess and monitor forest health status
Implementations
Derived platform requirements
R2.4.1_1 A pilot needs shared repository of Sentinel-1 and Sentinel-2 satellite images.
Implementations
Derived platform requirements
R.2.4.1_2 A pilot needs cloud environment with components for satellite data pre-processing (components FMI 1-4)
Implementations
Derived platform requirements
R2.4.2_2 A pilot needs XML standard development (METSAK) for the forest damages and forest stand information update, Integrations and X-road approach for data transfer services as well as development of the data visualization/ map service for the forest damage information
Implementations
Derived platform requirements
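Requirement R2.3.1_1 asks for map layers served through a WMS/WMTS interface. As a minimal sketch of what a client request could look like, the code below builds a WMS 1.3.0 GetMap URL. The service URL and layer name are hypothetical; the query parameters are the standard GetMap parameters. Note that in WMS 1.3.0 the axis order of BBOX depends on the CRS, so the example uses EPSG:3857, where the order is x, y.

```python
from urllib.parse import urlencode


def getmap_url(base_url, layer, bbox, width, height,
               crs="EPSG:3857", image_format="image/png"):
    """Build a WMS 1.3.0 GetMap request URL for one layer."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value selects the default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical service and layer name.
url = getmap_url("https://example.org/wms", "forest_damage",
                 (0, 0, 10, 10), 256, 256)
print(url)
```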
4.5 Datasets and datastream requirements from Fishery Pilots
This section describes the Fishery pilots’ requirements that are related to datasets and
datastreams. Each requirement (R3.x.y_z) has a textual description, zero or more
implementations in DataBio, and one or more relationships to requirements specified in the
pilots. Full details and navigation are provided in the ArchiMate models.
ID Requirement Implementations
R3.3.1_1 A pilot needs satellite data streams of sea surface temperature, sea surface salinity, sea level anomalies, ice concentrations, chlorophyll-a concentrations.
Implementations
Derived platform requirements
R3.3.1_2 A pilot needs ocean current simulation data streams.
Implementations
Derived platform requirements
R3.3.1_3 A pilot needs buoys data and position of the vessel
Implementations
Derived platform requirements
R3.3.2_1 A pilot needs meteorological data to be available in the vessel power system
Implementations
Derived platform requirements
R3.3.2_2 A pilot needs meteorological data to be collected by interfacing with existing sensors, or new sensors provided
Implementations
Derived platform requirements
R3.3.2_3 A pilot needs meteorological data to be collected by interfacing with existing sensors, or new sensors provided
Implementations
Derived platform requirements
R3.3.2_4 A pilot needs satellite data streams of sea surface temperature, sea
surface salinity, sea level anomalies, ice concentrations, chlorophyll-a concentrations.
Implementations
Derived platform requirements
R3.4.1_1 A pilot needs missing data sources including fishery-dependent data, fishery-independent data, oceanography.
Implementations
Derived platform requirements
R3.4.1_2 A pilot needs fishery-dependent data: landed catch (Sildes, ICES), scientific surveys (IMR), ERS (Norwegian directorate of fisheries)
Implementations
Derived platform requirements
R3.4.1_3 A pilot needs fishery-independent data: publicly available scientific survey data, hydro acoustics from fishing vessels (perhaps through ratatosk C17.01)
Implementations
Derived platform requirements
R3.4.1_4 A pilot needs oceanographic data: Satellite streams of sea surface temperature, sea surface salinity, sea level anomalies, ice concentrations, chlorophyll-a concentrations. (ICES, met.no, SPACEBEL, ..)
Implementations
Derived platform requirements
R3.4.2_2 A pilot needs machine learning & data analysis components for finding covariations (multivariate/PCA analysis) and estimating price prediction models.
Implementations
Derived platform requirements
5 Datasets: existing, improved, new and others
This section presents the datasets identified by the DataBio project as relevant for the
selected domains: agriculture, forestry and fishery. The datasets are grouped into four
sections based on their availability:
1. Existing datasets utilized by DataBio pilots: datasets that are available and have
relevance for the pilot systems in DataBio. The DataBio project demonstrates their
usefulness and provides recommendations for use.
2. Existing datasets that the DataBio project has improved in terms of easier or better
findability, accessibility, interoperability or reusability.
3. New datasets created during the DataBio project by collecting new data or combining
or processing existing data sources.
4. Other datasets that might be of (future) relevance to DataBio pilots or similar systems.
Please note that many datasets are missing some parameters in the description. The datasets
are continuously being added to the DataBioHub and most of the parameters will be included
as they are harvested automatically from the data source.
The datasets are presented with the available metadata. The full metadata template structure
is provided in Appendix A.
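As an illustration of how a dataset description with missing parameters could be handled, the following minimal Python sketch models a subset of the metadata fields used in the tables of this section. The class name, the attribute names and the selection of fields are assumptions made for this example only; the authoritative template is the one in Appendix A. The sample values are taken from the Open Transport Map entry (D03.02).

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DatasetMetadata:
    """Illustrative subset of the dataset metadata fields (not the full template)."""

    internal_name: str
    provider: Optional[str] = None
    short_description: Optional[str] = None
    version: Optional[str] = None
    data_type: Optional[str] = None
    rightsholder: Optional[str] = None
    update_frequency: Optional[str] = None
    geographical_coverage: Optional[str] = None
    timespan: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Return the names of all fields that have not been filled in yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Values copied from the Open Transport Map table; the short description
# is deliberately left out to show how a gap is reported.
otm = DatasetMetadata(
    internal_name="D03.02",
    provider="Open Transport Map",
    version="1.0",
    data_type="geographic data",
    rightsholder="Plan4all",
    update_frequency="irregularly",
    geographical_coverage="European Union",
    timespan="2015-present",
)

print(otm.missing_fields())
```

A check of this kind could be run whenever a description is harvested, so that incomplete entries are flagged rather than silently accepted.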
5.1 Existing datasets utilized by DataBio Pilots
5.1.1 Open Transport Map (UWB - D03.02)
Field Value
Internal Name of the Dataset D03.02
Name of the Dataset/API Provider Open Transport Map
Short Description The Open Transport Map displays a road network which
– is suitable for routing
– visualizes average daily Traffic Volumes for the whole EU
– visualizes time related Traffic Volumes (in OTN Pilot Cities - Antwerp,
Birmingham, Issy-le-Moulineaux, Liberec region)
Technically, the Open Transport Map
– can serve as a map itself as well as a layer embedded in your map
– is derived from the most popular open dataset - OpenStreetMap
– is accessible via both GUI and API
– covers the whole European Union
Version 1.0
Initial Availability Date 07.03.2017
Data Type geographic data
Personal Data no
Rightsholder Plan4all
Other Rights Information Open Data Commons Open Database License (ODbL)
Dataset/API Owner/Responsible UWB
Dataset/API Owner/Responsible Contacts
Technology
Name of the System Open Transport Map
Dataset Data Model/API Interface GUI, WMS, WFS, shapefile, all described at http://opentransportmap.info
Data Model: Standards, Glossaries and metadata standards WMS, WFS, shapefile, PostGIS
Data Identifier - Standard used
Data Model - Specific Data Model http://opentransportmap.info/img/OTM_physicalModelAndCodelists.svg
Data Volume 20 Gb
Update Frequency irregularly
Data Archiving and preservation
Geographical Coverage European Union
Timespan 2015-present
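Since the dataset is exposed through WMS, a map image can be requested with a standard GetMap call. The sketch below only builds the request URL with the Python standard library. The endpoint and the layer name are placeholders assumed for illustration (the actual values are documented at http://opentransportmap.info and are not given in this section), and the bounding box is an arbitrary example.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and layer name, used here for illustration only.
WMS_ENDPOINT = "https://example.org/otm/wms"
LAYER = "roads"


def build_getmap_url(bbox, width=800, height=600, crs="EPSG:4326"):
    """Build a WMS 1.3.0 GetMap URL.

    In WMS 1.3.0 with EPSG:4326 the axis order is latitude, longitude,
    so bbox is (min_lat, min_lon, max_lat, max_lon).
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": LAYER,
        "STYLES": "",
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return f"{WMS_ENDPOINT}?{urlencode(params)}"


# Example bounding box (illustrative values only).
example_url = build_getmap_url((50.6, 14.8, 50.9, 15.3))
print(example_url)
```

The resulting URL can be passed to any HTTP client or used directly as a layer source in a web map, which corresponds to the "layer embedded in your map" use mentioned in the short description.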
5.1.2 Forest resource data (METSAK - D18.01)
Field Value
Internal Name of the Dataset D18.01
Name of the Dataset/API Provider Forest resource data / METSAK
Short Description The pilot uses METSAK’s forest resource data concerning
privately owned Finnish forests from METSAK’s forest
resource data system. The forest resource data consists of
basic data of tree stands (development class, dominant tree
species, scanned height, scanned intensity, stand
measurement date), strata of tree stands (mean age, basal
area, number of stems, mean diameter, mean height, total
volume, volume of logwood, volume of pulpwood), growth
place data (classification, fertility class, soil type, drainage
state, ditching year, accessibility, growth place data source,
growth place data measurement date), geometry and
compartment numbering. The forest resource data is
available in a standard format for external use with consent
of a forest owner.
Extended Description The forest resources are inventoried once per decade in each
area using remote sensing (airborne laser scanning) and aerial
photographs. The new data is analysed and in some parts
measured in the field. Other updates to the forest resource
data are yearly growth calculations, possible notifications of
forest use or other forestry operations or so-called Kemera
financing operations, and possible new aerial photographs to
be interpreted.
Version Oracle database and data model version 2.5.2.
Initial Availability Date from year 2010 onwards
Data Type Oracle database model
Personal Data User information
Rightsholder METSAK
Dataset/API Owner/Responsible Forest resource data / METSAK / Aapo Lindberg
Dataset/API Owner/Responsible Contacts Oracle database model/ [email protected]
Technology Oracle database
Name of the System Aarni
Data Model: Standards, Glossaries and metadata standards Oracle database model for forest resource data
Data Identifier - Standard used N/A
Data Model - Specific Data Model Oracle database model
Data Volume 1984 GB
Update Frequency Online
Data Archiving and preservation Real time backup procedures as well as database copy once a month
Geographical Coverage Finland
Timespan Data available from year 2010 onwards
Access Level METSAK users
Access Mechanism Active directory user management
5.1.3 Landsat 8 OLI data
Field Value
Internal Name of the Dataset Landsat 8 OLI
Name of the Dataset/API Provider NASA and the U.S. Geological Survey
Extended Description Landsat 8 (formerly the Landsat Data Continuity Mission,
LDCM), a collaboration between NASA and the U.S. Geological
Survey, provides moderate-resolution measurements of the
Earth’s terrestrial and polar regions in the visible, near-
infrared, short wave infrared, and thermal infrared. Landsat 8
provides continuity with the more than 40-year long Landsat
land imaging dataset. Landsat 8 carries two push-broom
instruments: The Operational Land Imager (OLI) and the
Thermal Infrared Sensor (TIRS).
The spectral bands of the OLI sensor provide enhancement
over prior Landsat instruments, with the addition of two
spectral bands: a deep blue visible channel (band 1)
specifically designed for water resources and coastal zone
investigation, and a shortwave infrared channel (band 9) for
the detection of cirrus clouds.
The TIRS instrument collects two spectral bands for the
wavelength covered by a single band on the previous TM and
ETM+ sensors.
Landsat 8 mission’s objectives are:
· to provide data continuity with Landsats 4, 5, and 7;
· to offer 16-day repetitive Earth coverage, an 8-day repeat with a Landsat 7 offset;
· to build and periodically refresh a global archive
of sun-lit, substantially cloud-free, land images.
Data Type Level 0 (L0) Data Products
Description of the products: L0 data products are image data
with all data transmission and formatting artefacts removed.
These products are time provided, spatial, and band-
sequentially ordered multispectral digital image data.
Level 1 Radiometric (L1R) Data Products
Description of the products: L1R data products consist of
radiometrically corrected image data derived from L0 data
scaled to at-aperture spectral radiance or reflectance.
Level 1 Systematic (L1G) Data Products
Description of the products: L1G data products consist of L1R
data products with systematic geometric corrections applied
and resampled for registration to a cartographic projection,
referenced to the World Geodetic System 1984 (WGS84).
Level 1 Gt (L1Gt) Data Products
Description of the products: L1Gt data products consist of L1R
data products with systematic geometric and terrain
corrections applied and resampled for registration to a
cartographic projection, referenced to the WGS84.
Level 1 Terrain (L1T) Data Products
Description of the products: L1T data products consist of L1R
data products with systematic geometric corrections applied,
using Ground Control Points (GCPs) or onboard positional
information to resample the image data for registration to a
cartographic projection, referenced to the WGS84. The data
are also terrain corrected for relief displacement.
Level-2 Data Products
Description of the products: Surface Reflectance products are available
on demand, courtesy of the USGS (U.S. Geological Survey).
They provide an estimate of the surface spectral reflectance as
it would be measured at ground level in the absence of
atmospheric scattering or absorption.
Landsat 8 Tier 1 data
Description of the products: They are the Landsat scenes with
the highest available data quality and are considered suitable
for time-series analysis. Tier 1 includes Level-1 Precision and
Terrain (L1TP) corrected data that have well-characterized
radiometry and are inter-calibrated across the different
Landsat instruments.
Landsat 8 Tier 2 data
Description of the products: Landsat 8 Tier 2 products are the
ones that do not meet the Tier 1 criteria during processing. Tier
2 includes Systematic Terrain (L1GT) and Systematic (L1GS)
processed data, as well as any L1TP data that do not meet the
Tier 1 specifications due to significant cloud cover, insufficient
ground control, and other factors.
Landsat 8 Real-Time data
Description of the products: The Real-Time Tier contains data
immediately after acquisitions that use estimated parameters.
Real-Time data are reprocessed and assessed for inclusion into
Tier 1 or Tier 2 as soon as final parameters are available.
Access Mechanism Landsat Level-1 data products are available for immediate
download.
There are several ways of accessing Landsat-8 Level 1
products:
· EarthExplorer (https://earthexplorer.usgs.gov/) –
provides a graphical user interface to define areas of interest
(AOI) by place name, address, zip code or creating an AOI on
the interactive map. Queries can be applied to multiple
collections simultaneously. The Bulk Download Application is
an easy-to-use tool for downloading large quantities of
satellite imagery and geospatial data on Earth Explorer. Once
scenes are added to a Bulk Order via Earth Explorer, the Bulk
Download Application can be used to automatically retrieve
them with little to no user interaction and the application will
automatically iterate through the scene list and download
each until all have been processed.
· GloVis (https://glovis.usgs.gov/);
· LandsatLook Viewer (https://landsatlook.usgs.gov/).
Surface Reflectance and other Level-2 science
products are available on request through:
· USGS Earth Resources Observation and Science
(EROS) Center Science Processing Architecture (ESPA)
On Demand Interface;
· ESPA Application Programming Interface (API);
· EarthExplorer – allows ordering of only surface
reflectance (SR) data products.
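The relationship between processing levels and tiers described in the Data Type field can be summarised as a small decision rule. The following Python sketch is an illustration only: the function name, the boolean inputs and the short labels "T1", "T2" and "RT" are assumptions chosen for this example, and the actual tier assignment is performed by the data provider during processing.

```python
def assign_tier(processing_level: str,
                meets_tier1_spec: bool,
                final_parameters: bool = True) -> str:
    """Return an illustrative tier label for a Landsat 8 Level-1 scene.

    processing_level: "L1TP", "L1GT" or "L1GS".
    meets_tier1_spec: whether the scene meets the Tier 1 quality criteria.
    final_parameters: False while only estimated parameters are available.
    """
    if not final_parameters:
        # Data processed with estimated parameters stay in the Real-Time
        # tier until they can be reprocessed and reassessed.
        return "RT"
    level = processing_level.upper()
    if level == "L1TP" and meets_tier1_spec:
        return "T1"
    if level in {"L1TP", "L1GT", "L1GS"}:
        # L1GT and L1GS scenes, and L1TP scenes that miss the Tier 1
        # criteria (e.g. significant cloud cover), fall into Tier 2.
        return "T2"
    raise ValueError(f"unknown processing level: {processing_level}")


print(assign_tier("L1TP", True))
print(assign_tier("L1GT", True))
print(assign_tier("L1TP", True, final_parameters=False))
```

For time-series analysis, a pilot would typically keep only the scenes for which this rule yields Tier 1.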
5.1.4 Sentinel 3 OLCI (Ocean and Land Colour Instrument) data
Field Value
Internal Name of the Dataset Sentinel 3 OLCI
Name of the Dataset/API Provider ESA
Short Description The Sentinel-3 mission carries multiple instruments to
measure sea-surface topography, sea and land-surface
temperature, ocean- and land-surface colour, contributing to
the Copernicus marine, land, atmosphere, emergency, security
and cryosphere applications. It is based on a constellation of
two identical satellites, Sentinel-3A and Sentinel-3B, launched
separately.
Extended Description Primary geophysical products provided by the Sentinel-3
mission are:
· global coverage Sea Surface Height (SSH) for ocean
and coastal areas;
· enhanced resolution SSH products in coastal zones
and sea-ice regions;
· global coverage Sea Surface Temperature (SST) and
sea-Ice Surface Temperature (IST);
· global coverage ocean colour and water quality
products;
· global coverage ocean surface wind speed
measurements;
· global coverage significant wave height
measurement;
· global coverage atmospheric aerosol consistent over
land and ocean;
· global coverage total column water vapour over land
and ocean;
· global coverage vegetation products;
· global coverage land ice/snow surface temperature
products;
· ice products (e.g., ice surface topography, extent,
concentration).
Secondary geophysical products provided by the Sentinel-3
mission are:
· global coverage fire monitoring products (e.g. fire
radiated power, burned area, risk maps);
· inland water (lakes and rivers) surface height data.
Geographical Coverage One Sentinel-3 satellite provides a revisit time of 27 days (385
orbits). OLCI’s field of view and its swath width of 1270 km
allow global coverage at the equator to be provided in 2–4
days with one satellite and in less than two days with two
satellites.
Access Mechanism On 9 March 2018, Level-1 and Level-2 Sentinel-3 OLCI PDUs,
full and reduced resolution, began to be released through the
Sentinel-3 Pre-Operational Data Hub.
5.1.5 Sentinel 3 SLSTR (Sea and Land Surface Temperature Radiometer)
Field Value
Internal Name of the Dataset Sentinel 3 SLSTR
Name of the Dataset/API Provider ESA
Short Description The Sentinel-3 mission carries multiple instruments to
measure sea-surface topography, sea and land-surface
temperature, ocean- and land-surface colour, contributing to
the Copernicus marine, land, atmosphere, emergency, security
and cryosphere applications.
Extended Description The sensors / main instruments of the Sentinel-3 mission are:
· Ocean and Land Colour Instrument (OLCI);
· Sea and Land Surface Temperature Radiometer (SLSTR);
· SAR Radar Altimeter (SRAL);
· MicroWave Radiometer (MWR);
· Precise Orbit Determination (POD), which consists of 3
instruments: DORIS: a Doppler Orbit Radio positioning system;
GNSS: a GPS receiver, providing precise orbit determination
and tracking multiple satellites simultaneously; LRR: to
accurately locate the satellite in orbit using a Laser Retro-
Reflector system.
The Sea and Land Surface Temperature Radiometer (SLSTR) is
a dual scan temperature radiometer, which has been selected
for the low Earth orbit (800 - 830 km altitude) ESA Sentinel-3
operational mission as a part of the Copernicus (Global
Monitoring for Environment and Security) programme. SLSTR
is the successor of the (A)ATSR series (aboard the ERS and
ENVISAT missions).
The main objective of SLSTR products is to provide global and
regional Sea and Land Surface Temperature (SST, LST) to a very
high level of accuracy (better than 0.3 K for SST) for both
climatological and meteorological applications.
SLSTR is mostly known for its marine applications (SST – Sea
Surface Temperature), but it also provides information related
to biomass burning (fire detection and classification). SLSTR
also contributes to climate studies by bringing several of the
required Essential Climate Variables (ECVs) to the scientific
community.
Geographical Coverage The mean global coverage revisit time for dual view SLSTR
observations is 1.9 days at the equator (one operational
spacecraft) or 0.9 days (in constellation with a 180° in-plane
separation between the two spacecraft) with these values
increasing at higher latitudes due to orbital convergence.
Timespan
Access Level Sentinel-3 SLSTR products are made available systematically
and free of charge to all data users including the general public,
scientific and commercial users.
Access Mechanism Sentinel-3A SLSTR data products are available via the
Copernicus Open Access Hub.
5.1.6 MODIS data
Field Value
Internal Name of the Dataset MODIS
Name of the Dataset/API Provider NASA
Short Description The Moderate-resolution Imaging Spectroradiometer (MODIS)
is a scientific instrument (radiometer) on board the NASA Terra
and Aqua satellite platforms, launched in 1999 and 2002
respectively to study global dynamics of the Earth atmosphere,
land, ice and oceans.
Extended Description MODIS captures data in 36 spectral bands ranging in
wavelength from 0.4 µm to 14.4 µm and at varying spatial
resolutions (2 bands at 250 m, 5 bands at 500 m and 29 bands
at 1 km), providing complete global coverage of the Earth
every 1 to 2 days. Both Terra and Aqua platforms are in sun
synchronous, near polar (98 degree) orbits at 705 km altitude
but with a descending local equatorial crossing time of
10:30am in the case of Terra and a 1:30pm ascending crossing
time for Aqua.
MODIS Terra Global Level 3 Mapped Thermal SST products
consist of sea surface temperature (SST) data derived from
the 11 and 12 µm thermal infrared (IR) bands (MODIS
channels 31 and 32). Daily, weekly (8 day), monthly and annual
MODIS SST products are available at both 4.63 and 9.26 km
spatial resolution and for both daytime and night-time passes.
Rightsholder MODIS products are available courtesy of GSFC – NASA.
Geographical Coverage The orbit of the Terra satellite goes from north to south across
the equator in the morning and Aqua passes south to north
over the equator in the afternoon resulting in global coverage
every 1 to 2 days
5.1.7 Proba-V data
Field Value
Internal Name of the Dataset Proba-V
Name of the Dataset/API Provider Vito
Short Description The Proba-V mission provides multispectral images to study
the evolution of the vegetation cover on a daily and global
basis. The 'V' stands for Vegetation. This mission is extending
the dataset of the long-established Vegetation instrument,
flown as a secondary payload aboard France's SPOT-4 and
SPOT-5 satellites launched in 1998 and 2002 respectively. The
Proba-V mission has been developed in the frame of the ESA
General Support Technology Program (GSTP). The
Contributors to the Proba-V mission are Belgium, Luxembourg
and Canada.
Extended Description Proba-V’s main applications are related to monitoring plant
and forest growth, as well as inland water bodies. The
Vegetation instrument can distinguish between different land
cover types and plant species, including crops, to reveal their
health, as well as detect water bodies and vegetation burn
scars.
The VEGETATION instrument is pre-programmed with an
indefinite repeated sequence of acquisitions. This nominal
acquisition scenario allows a continuous series of identical
products to be generated, aiming to map land cover and
vegetation growth across the entire planet every two days.
Geographical Coverage The mission, developed as part of ESA's Proba Programme, is
an ESA EO mission providing global coverage every two days,
with latitudes 35-75°N and 35-56°S covered daily, and
between 35°N and 35°S every 2 days
Timespan
Access Level
Access Mechanism PROBA-V products can be ordered and downloaded from the
PROBA-V Product Distribution Portal (PDP) at
http://www.vito-eodata.be/. Products are usually available
within 24 hours after sensing time (max 48 hours). Figure 8
shows the portal’s main page.
URI https://www.vito-eodata.be/PDF/portal/Application.html#Home
5.1.8 Global Precipitation Measurement (GPM) mission data
Field Value
Internal Name of the Dataset
Name of the Dataset/API Provider
Short Description Global Precipitation Measurement (GPM) is an international
satellite mission to provide next-generation observations of
rain and snow worldwide every three hours.
Extended Description NASA and the Japan Aerospace Exploration Agency (JAXA) launched the GPM Core Observatory satellite on February 27th, 2014, carrying advanced instruments that set a new standard for precipitation measurements from space.
The foundation of the GPM mission is the Core Observatory satellite provided by NASA and JAXA. Data collected from the Core satellite serves as a reference standard that unifies precipitation measurements from research and operational satellites launched by a consortium of GPM partners in the United States, Japan, France, India, and Europe.
The Core satellite measures rain and snow using two science instruments: the GPM Microwave Imager (GMI) and the Dual-frequency Precipitation Radar (DPR). The GMI captures precipitation intensities and horizontal patterns, while the DPR provides insights into the three dimensional structure of precipitating particles. Together these two instruments provide a database of measurements against which other partner satellites’ microwave observations can be meaningfully compared and combined to make a global precipitation dataset.
Rightsholder NASA
Update Frequency The GPM constellation of satellites can observe precipitation
over the entire globe every 2-3 hours
Geographical Coverage The GPM constellation of satellites can observe precipitation
over the entire globe every 2-3 hours
Access Mechanism https://pmm.nasa.gov/data-access/downloads/gpm
URI https://pmm.nasa.gov/GPM
5.1.9 KNMI (Koninklijk Nederlands Meteorologisch Instituut) precipitation data
Field Value
Internal Name of the Dataset KNMI
Name of the Dataset/API Provider KNMI Data Centre (KDC)
Short Description The KNMI Data Centre (KDC) provides access to weather,
climate and seismological datasets of KNMI (Koninklijk
Nederlands Meteorologisch Instituut).
Extended Description The primary tasks of KNMI are weather forecasting,
monitoring of climate changes and monitoring seismic activity.
KNMI is also the national research and information centre for
climate, climate change and seismology.
Rightsholder KDC
Geographical Coverage KNMI Products cover the Netherlands and surrounding areas.
Access Mechanism Access to most datasets is unrestricted and provided under the
'OpenData' policy of the Dutch government. Access to the
specific KNMI precipitation dataset described in
this document is free, but registration is required.
The Multisensor Evolution Analysis (MEA) technology (DataBio
component C41.01) provides access to the above-mentioned
KNMI precipitation data.
5.1.10 CMEMS (Copernicus Marine Environment Monitoring Service) data
Field Value
Internal Name of the Dataset CMEMS
Name of the Dataset/API Provider Copernicus
Short Description The CMEMS (Copernicus Marine Environment Monitoring
Service) provides regular and systematic core reference
information on the state of the physical oceans and regional
seas. The observations and forecasts produced by the service
support all marine applications.
Extended Description The Copernicus Marine Environment Monitoring
Service (CMEMS) has been running in operational mode since May 2015. It follows
the MyOcean demonstration phase, during which the
service ran in pre-operational mode for 6 years.
The service is meant for any user requesting generic
information on the ocean, and especially downstream service
providers who use this information as an input to their own
value-added services to end-users. The CMEMS can be defined
as:
• An integrated Service;
• An Open and Free service;
• Providing access to a single Catalogue of products;
• A reliable service;
• A sustainable service.
Data Type Copernicus Marine products are delivered in netCDF format
(.nc). They can easily be downloaded through the CMEMS
interface. Data are directly available through services like CSW
catalog (Catalog Services for Web), WMS (Web Map Service),
Subsetter, Direct Get File, FTP.
Access Mechanism In order to provide standard access to the CMEMS products,
the FedEO Gateway (C07.01) has been extended with an
additional connector for the CMEMS web service interface. In
this way, CMEMS products can be retrieved and downloaded
through the same components, the FedEO Gateway (C07.01) and
the Data Manager (C07.04), via the same standard OpenSearch
interface compliant with OGC 13-026r8 as other EO
products such as Sentinel products.
URI
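An OpenSearch interface publishes URL templates in which search parameters appear as placeholders such as {time:start} or {geo:box}, with a trailing question mark marking optional parameters. The Python sketch below shows how a client could fill in such a template. The endpoint, the query parameter names on the left-hand side of each "=" and the collection identifier are hypothetical values assumed for this example; a real client would read the template from the OpenSearch Description Document of the service.

```python
import re
from urllib.parse import quote

# Hypothetical URL template, as it could appear in an OpenSearch
# Description Document. Placeholders ending in "?" are optional.
TEMPLATE = (
    "https://example.org/opensearch/request?"
    "parentIdentifier={eo:parentIdentifier}"
    "&startDate={time:start?}&endDate={time:end?}"
    "&bbox={geo:box?}&maximumRecords={count?}"
)


def fill_template(template: str, values: dict) -> str:
    """Replace each {name} or {name?} placeholder with a URL-encoded value.

    Optional placeholders without a value become empty strings;
    a missing required placeholder raises KeyError.
    """
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{([^}?]+)(\?)?\}", substitute, template)


query_url = fill_template(
    TEMPLATE,
    {
        "eo:parentIdentifier": "HYPOTHETICAL_SST_PRODUCT",
        "time:start": "2018-06-01T00:00:00Z",
        "time:end": "2018-06-02T00:00:00Z",
    },
)
print(query_url)
```

Because the same template mechanism applies to every collection behind the gateway, a client written this way could query CMEMS and Sentinel products without product-specific code.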
5.1.11 Sentinel 2A (ESA D11.01)
Field Value
Internal Name of the Dataset D11.01
Name of the Dataset/API Provider Sentinel 2A
Short Description Sentinel 2B data provided by ESA. Multiple geographical
areas and various times.
Extended Description https://sentinel.esa.int/web/sentinel/sentinel-data-access
https://scihub.copernicus.eu/twiki/do/view/SciHubWebPortal/APIHubDescription
Data Type EO data
Rightsholder ESA. License CC-BY.
Dataset/API Owner/Responsible ESA
Dataset/API Owner/Responsible Contacts
Name of the System Sentinel
Dataset Data Model/API Interface REST
Data Model: Standards, Glossaries and metadata standards SENTINEL SAFE
Data Volume ~GB
Update Frequency Every 5 days
Data Archiving and preservation Locally on TRAGSATEC Premises
Geographical Coverage Extremadura, Galicia
Timespan 2016 - End of the Project
5.1.12 Sentinel-2 Data
Field Value
Internal Name of the Dataset D14.01, D14.02
Name of the Dataset/API Provider Sentinel-2 Data
Short Description · Sentinel-2 L1 data (C14.01). Sentinel-2 L1 data
archive. ESA. Czech Republic
· Sentinel-1 IWS data (C14.02). Sentinel-1 L1 data
archive. EO data. Czech Republic
· Sentinel-2 HR Optical data (C14.03) Sentinel-2
archive. European Space Agency (ESA). Global
coverage
Extended Description NP has the data for its pilot areas (T1.2.1, T1.4.1, T1.4.2)
corresponding to 6 tiles. Thematic Exploitation Platforms,
such as the Forestry TEP (C16.10), are available for online
analytics.
Rightsholder ESA
Data Model: Standards, Glossaries and metadata standards SENTINEL-SAFE format
Data Volume L1 data: Approximately 6Gb per scene
Update Frequency L1: 10 days revisit time, up to 5 days in Q2 2017
Geographical Coverage · Sentinel-2 L1 data (C14.01). Sentinel-2 L1 data
archive. ESA. Czech Republic
· Sentinel-1 IWS data (C14.02). Sentinel-1 L1 data
archive. EO data. Czech Republic
· Sentinel-2 HR Optical data (C14.03) Sentinel-2
archive. European Space Agency (ESA). Global
coverage.
Timespan L1: June 2015 - now
5.1.13 Sentinel 3 SRAL (Synthetic Aperture Radar Altimeter) data
5.1.14 Sentinel 3 MWR (Microwave Radiometer) data
5.2 Datasets improved by DataBio
This section presents datasets that are improved by DataBio through processing or other data
management mechanisms.
5.2.1 RPAS (Remotely Piloted Aircraft Systems) data
Field Value
Internal Name of the Dataset RPAS
Name of the Dataset/API Provider TRAGSA
Short Description RPAS data, property of TRAGSA, are provided according to the
pilot needs. The images acquired are provided in 6 spectral
bands: RGB, Red Edge, NIR, Thermal, as well as point-cloud
Extended Description The delivery of RPAS imagery started in October 2017 and the
areas covered represent small parcels (hectares) within pilot
areas in the Iberian Peninsula - Spain
(Extremadura, Andalucia, Castilla y León, Castilla La Mancha,
Madrid). RPAS imagery is stored on TRAGSA premises.
Timespan From 2017
5.2.2 Orthophotos
Field Value
Internal Name of the Dataset Ortophotos
Name of the Dataset/API Provider TRAGSA
Short Description The National Geographic Information Centre of Spain provides
a mosaic of the latest orthophotos of the National Plan for
Aerial Orthophotography.
Extended Description They are delivered in ETRS89 - The European Terrestrial
Reference System 1989 datum for the Iberian Peninsula,
Balearic Islands, Ceuta and Melilla, and WGS84 for the Canary
Islands and UTM projection in the corresponding zone.
Each unit (mosaic) covers a MTN50 sheet (National
Topographic Map at 1:50 000 scale).
All datasets are processed by TRAGSA to produce improved
images. Specifically, orthophotos will be transformed by an
orthorectification method developed under WP5. Component
C11.03 – Radiometric Corrections is a tool that provides colour
correction and homogenization of orthophotos from
different areas and/or dates. This tool increases orthophoto
homogeneity and improves their subsequent usability,
both for agrarian and environmental purposes, using
automated image analysis processes.
Access Mechanism Orthophotos are provided by the Spanish National Geographic
Institute at
http://centrodedescargas.cnig.es/CentroDescargas/catalogo.do
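As a rough illustration of the reference systems named in the description above, the following Python sketch derives the UTM zone of a location from its longitude and maps it to an EPSG code. It assumes the standard 6-degree UTM zone rule and the EPSG series 258xx (ETRS89 / UTM zone N) and 326xx (WGS 84 / UTM zone N, northern hemisphere). The function names and the `canary_islands` flag are hypothetical and are not part of the orthophoto delivery.

```python
import math


def utm_zone(longitude_deg: float) -> int:
    """Return the standard UTM zone number (1-60) for a longitude in degrees."""
    if not -180.0 <= longitude_deg <= 180.0:
        raise ValueError("longitude must be within [-180, 180]")
    # Zones are 6 degrees wide and zone 1 starts at 180 W.
    # Longitude 180 E is clamped into zone 60.
    return min(int(math.floor((longitude_deg + 180.0) / 6.0)) + 1, 60)


def orthophoto_epsg(longitude_deg: float, canary_islands: bool = False) -> int:
    """Hypothetical helper: EPSG code of the UTM grid for a location in Spain.

    Assumption: ETRS89 / UTM uses EPSG 258xx and WGS 84 / UTM (north)
    uses EPSG 326xx, where xx is the UTM zone number.
    """
    base = 32600 if canary_islands else 25800
    return base + utm_zone(longitude_deg)


print(orthophoto_epsg(-3.70))         # a point near Madrid
print(orthophoto_epsg(-15.43, True))  # a point near Las Palmas de Gran Canaria
```

The sketch only selects a grid code. Actual coordinate transformation would need a projection library.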
5.2.3 gaiasense field (D13.01)
Field Value
Internal Name of the Dataset D13.01
Name of the Dataset/API
Provider
gaiasense field
Short Description Dataset composed of measurements from NP’s telemetric IoT
agro-climate stations, called GAIAtrons.
Extended Description Dataset composed of field-sensing measurements from NP’s
network of telemetric IoT stations, called GAIAtrons. GAIAtrons
offer configurable data collection and transmission rates and
come in two variants. The GAIAtron Atmo stations measure
atmospheric parameters (e.g. ambient temperature, humidity,
wind speed and direction, solar irradiance), whereas the GAIAtron
Soil stations measure soil parameters (e.g. multi-depth soil
temperature and humidity). The coverage area of each station
varies, and the spatial distribution of the stations is influenced by the
microclimatic variability of the monitored area.
Version 1.0
Initial Availability Date Beginning of 2016
Data Type Sensor measurements (numerical data) and metadata
(timestamps, sensor id, etc.)
Personal Data No personal data is recorded and/or stored
Rightsholder NP
Dataset/API Owner/Responsible NP
Dataset/API Owner/Responsible
Contacts
Technology NODEJS, Python, Apache, Linux, MySQL, JSON
Name of the System GAIAtrons (IoT telemetry stations for in-field measurements
collection)
GAIABus DataSmart RealTime Subcomponent (for cloud-based
monitoring, validation, parsing and cross-checking of the incoming
data streams)
Dataset Data Model/API
Interface
Data Model: Standards,
Glossaries and metadata
standards
No standards are being used in glossaries and metadata
Data Identifier - Standard used No standards are being used
Data Model - Specific Data
Model
Custom data model that is designed to optimally address the needs
of the offered smart farming applications
Data Volume several GBs/year
Update Frequency The update frequency depends on the velocity of the incoming
data streams. GAIAtrons offer configurable data collection and
transmission rates, per station and monitored parameter, based
on the needs of the application
Data Archiving and preservation Data is preserved in local warehouses
Geographical Coverage Greek Pilot Areas (DataBio Pilots A1.1, B1.2, C1.1, C2.2)
Timespan 2016 until now
Access Level Restricted
Access Mechanism Query
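The table above describes the data as numerical sensor measurements with metadata (timestamps, sensor id) and lists JSON and Python among the technologies, but the custom data model itself is not published. The sketch below therefore only illustrates how one such measurement might be parsed. All field names (`sensor_id`, `parameter`, `value`, `timestamp`) and the sample message are assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Measurement:
    """One sensor reading; the field names are hypothetical."""
    sensor_id: str
    parameter: str
    value: float
    timestamp: datetime


def parse_measurement(payload: str) -> Measurement:
    """Parse one JSON message into a typed Measurement.

    A missing field raises KeyError, so malformed messages fail loudly.
    """
    raw = json.loads(payload)
    return Measurement(
        sensor_id=str(raw["sensor_id"]),
        parameter=str(raw["parameter"]),
        value=float(raw["value"]),
        timestamp=datetime.fromisoformat(raw["timestamp"]),
    )


# Invented sample message, not taken from the real data streams.
example = (
    '{"sensor_id": "atmo-001", "parameter": "ambient_temperature", '
    '"value": 21.4, "timestamp": "2018-06-01T12:00:00+00:00"}'
)
measurement = parse_measurement(example)
print(measurement.sensor_id, measurement.parameter, measurement.value)
```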
5.2.4 Land use and properties - Greek agriculture pilots (NP - D13.02)
Anonymised IACS data
Field Value
Internal Name of the Dataset D13.02
Name of the Dataset/API
Provider Land use and properties - Greek agriculture pilots
Short Description Dataset comprised of agricultural parcel positions expressed in
vectors along with several attributes and extracted multi-
temporal vegetation indices associated with them.
Extended Description Dataset comprised of thousands of agricultural parcel positions
expressed in vectors along with several attributes including
cultivated crop types, variety codes, and description. Further,
each object/parcel has been assigned several extracted
statistical descriptors of different vegetation indices, such as
NDVI, NDWI and SAVI, that capture its status at various points
in time.
Version 1.0
Initial Availability Date Beginning of 2016
Data Type Parcel Geometries (WKT), alphanumeric parcel-related data and
metadata (e.g. timestamps)
Personal Data The dataset has been pseudonymized: the most revealing
fields within a data record (farmers’ identifiers) have been
replaced by artificial identifiers (parcel id).
Pseudonymization allows the data to be tracked to
its origins, as the goal is to provide smart farming services to the
farmers. However, after this process personal data can
no longer be attributed to a specific data subject without the
use of additional information. In line with the GDPR,
NP keeps the additional information separately, and technical
and organizational measures have been established to ensure
that the personal data are not directly attributed to an identified
or identifiable natural person.
Rightsholder NP
Dataset/API Owner/Responsible NP
Dataset/API Owner/Responsible
Contacts
Technology PostgreSQL, Python
Name of the System
Dataset Data Model/API
Interface
Data Model: Standards,
Glossaries and metadata
standards
No standards are being used in glossaries and metadata
Data Identifier - Standard used No standards are being used
Data Model - Specific Data
Model
Custom data model that is designed to optimally address the needs
of the offered smart farming applications
Data Volume several GBs/year
Update Frequency Periodically. The update frequency depends on the velocity of
the incoming EO data streams and the assignment of vegetation
indices statistics to each parcel. Currently, new Sentinel-2
products are available every 5 days approximately and the
dataset is updated in regular intervals
Data Archiving and preservation
Geographical Coverage Several areas within the Greek territory, including DataBio Pilots
A1.1, B1.2, C1.1 and C2.2.
Timespan 2016 until now
Access Level Restricted
Access Mechanism Query
URI
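The description above mentions statistical descriptors of NDVI, NDWI and SAVI per parcel. A minimal sketch of how such values could be computed is given below. It uses the commonly cited index formulas. The green/NIR form of NDWI, the soil factor of 0.5 for SAVI, the choice of min/mean/max as descriptors and the sample reflectances are all assumptions, since the source does not state which variants or statistics NP uses.

```python
from statistics import mean
from typing import Dict, Iterable, List


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index (assumes nir + red != 0)."""
    return (nir - red) / (nir + red)


def ndwi(green: float, nir: float) -> float:
    """NDWI in its green/NIR form (assumed variant; assumes green + nir != 0)."""
    return (green - nir) / (green + nir)


def savi(nir: float, red: float, soil_factor: float = 0.5) -> float:
    """Soil Adjusted Vegetation Index; 0.5 is a common default soil factor."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def descriptors(values: Iterable[float]) -> Dict[str, float]:
    """Summarise per-pixel index values of one parcel (hypothetical choice of statistics)."""
    data: List[float] = list(values)
    return {"min": min(data), "mean": mean(data), "max": max(data)}


# Invented (NIR, red) surface reflectances for the pixels of one parcel.
parcel_pixels = [(0.50, 0.10), (0.40, 0.10), (0.30, 0.10)]
parcel_ndvi = descriptors(ndvi(nir, red) for nir, red in parcel_pixels)
print(parcel_ndvi)
```

Repeating the summary for every new acquisition date would give the multi-temporal series attached to each parcel.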
5.2.5 Customer and forest estate data (METSAK - D18.02)
Field Value
Internal Name of the Dataset D18.02
Name of the Dataset/API
Provider
Customer and forest estate data / Metsään.fi
Short Description The forest resource data is connected with the customer
and forest estate data of METSAK. An essential part of
using the Metsään.fi eService is the information on who owns
certain forest estates and who has the rights to read and to
use the forest resource data of a certain forest owner. The
pilot uses METSAK’s customer information system, which
contains all this data.
Version XML-file versions 1.4, 1.5, 1.6, 1.7
Initial Availability Date Year 2012 onwards
Data Type Relational database
Personal Data Private Forest Owners and Forest Service Providers
Rightsholder METSAK
Dataset/API Owner/Responsible Metsään.fi data / Anu Kosunen
Dataset/API Owner/Responsible
Contacts
XML data / Anu Kosunen/[email protected]
Technology XML writer provides the standardized XML data from the Forest
Resource Database
Name of the System Metsään.fi
Dataset Data Model/API
Interface
Metsään.fi user interface, with Web Service and SOAP interfaces in the
background.
Data Model: Standards,
Glossaries and metadata
standards
XML standards
https://www.metsatietostandardit.fi/en/
Data Identifier - Standard used XML
Data Model - Specific Data
Model
https://www.bitcomp.fi/metsatietostandardit/
Data Volume 450 GB
Update Frequency Constant updates when needed.
Data Archiving and preservation N/A
Geographical Coverage Finland
Timespan from 2012 onwards
Access Level Registered users: Private Forest Owners and Forest Service
Providers
Access Mechanism https://tunnistaminen.suomi.fi
URI https://www.metsaan.fi/
5.3 New Datasets created during DataBio
5.3.1 Canopy height map (FMI - D14.05)
Field Value
Internal Name of the
Dataset
D14.05
Name of the Dataset/API
Provider
Canopy height map
Short Description Stand age (growth stages) according to a canopy height model
derived from aerial stereo-orthophoto interpretation of the Czech
Land Survey (data available countrywide every second year).
Spatial resolution 5 m. Four growth stages
and absolute canopy height are distinguished.
Extended Description Canopy height map, 20 m resolution; the pixel value corresponds to the
height of the dominant tree species
Initial Availability Date Will be prepared in Q3 2017
Data Type GeoTiff
Rightsholder Property of FMI
Dataset/API
Owner/Responsible
Raster dataset
Dataset/API
Owner/Responsible
Contacts
Data Volume 4 GB
Update Frequency Fixed
Geographical Coverage Czech Republic
Timespan 2017
5.3.2 Orthophotos - (IGN - D11.02)
Field Value
Internal Name of the Dataset D11.02
Name of the Dataset/API
Provider
IGN Orthophotos
Short Description Orthophotos provided by the Spanish National Geographic
Institute.
Extended Description Multiple geographical areas and various dates.
Orthophotos from PNOA (Spanish coverage). RGB and NIR
bands. GSD = 25 cm. RMSE < 0.5 m
Initial Availability Date 2006
Data Type Images (WMS, PNG…)
Personal Data No
Rightsholder IGN. License CC-BY.
Dataset/API Owner/Responsible IGN
Dataset/API Owner/Responsible
Contacts
Name of the System Sentinel
Dataset Data Model/API
Interface
REST
Data Volume TB
Update Frequency Yearly
Geographical Coverage Whole Spanish Surface
Timespan 2016 - End of the Project
Access Level Free
URI http://centrodedescargas.cnig.es/CentroDescargas/catalogo.do#selectedSerie
5.3.3 GEOSS sources (D11.03)
Field Value
Internal Name of the Dataset D11.03
Name of the Dataset/API
Provider
GEOSS Sources
Data Type EO data
Dataset/API Owner/Responsible TRAGSA-TRAGSATEC
Dataset/API Owner/Responsible
Contacts
Name of the System GEOSS
5.3.4 RPAS data (Tragsa - D11.04)
Field Value
Internal Name of the Dataset D11.04
Name of the Dataset/API
Provider
RPAS
Short Description RPAS data and Images
Extended Description RGB & Multispectral (6 bands: RGB+Red Edge + NIR) &
Thermal & point-cloud. Spatial features TBD according to
the pilot needs
Version
Initial Availability Date October 2017
Data Type Images: TIFF and JPEG
Personal Data No
Rightsholder Under agreement. Property of TRAGSA Group
Other Rights Information N/A
Dataset/API Owner/Responsible TRAGSA Group
Dataset/API Owner/Responsible
Contacts
Dataset Data Model/API
Interface
No API interface. Files in local folders.
Data Identifier - Standard used TIFF. JPEG.
Data Model - Specific Data
Model
.TIFF, .JPG, .LAS
Data Volume 60 GB
Update Frequency 1-2 times per year.
Data Archiving and preservation Stored in TRAGSA Premises
Geographical Coverage Small parcels (hectares) within pilot areas.
Timespan Meeting pilot needs.
Access Level Private.
Access Mechanism Under Request.
URI No.
5.3.5 MFE Spanish Forest Map (D11.06)
Field Value
Internal Name of the Dataset D11.06
Name of the Dataset/API
Provider
MFE50
Short Description Mapa Forestal de España (MFE) - Spanish Forest Map
Extended Description MAPAMA (Spanish Ministry of Agriculture, Fisheries and
Environment)
Initial Availability Date From 1997
Data Type ESRI Shapefile
Personal Data No
Rightsholder Free
Other Rights Information MAPAMA (Spanish Ministry of Agriculture, Fisheries and
Environment)
Dataset/API Owner/Responsible TRAGSA-TRAGSATEC
Dataset/API Owner/Responsible
Contacts
Name of the System MFE (Spanish Forest Map)
Dataset Data Model/API
Interface
http://www.mapama.gob.es/es/biodiversidad/servicios/banco-datos-naturaleza/informacion-disponible/mfe50_descargas_comunidad_madrid.aspx
Data Model: Standards,
Glossaries and metadata
standards
Cartography, vectors
Data Model - Specific Data
Model
ESRI Shape File
Data Volume ~Mb
Update Frequency Every 10 years
Geographical Coverage Spain
Timespan From 1997, updated every 10 years
Access Level Open Access, Specific license not defined
5.3.6 Field data - pilot B2 (Tragsa - D11.07)
Field Value
Internal Name of the Dataset D11.07
Name of the Dataset/API
Provider
Field data - pilot B2
Short Description Data acquired by IoT Sensors. Scientific data from field
samples.
Extended Description Direct observations and Direct & Lab measurements:
Chlorophyll content, morphology, green & dry weight,
hydric potential, Leaf Area Index (LAI), visual classification
of damages. Features TBD according to the pilot needs
Rightsholder Under agreement. Property of TRAGSA Group
Dataset/API Owner/Responsible
Contacts
Data Identifier - Standard used CSV
Data Volume ~Mb
Update Frequency Daily
Data Archiving and preservation TRAGSA Premises
Geographical Coverage Study sites TBD in: Extremadura, Galicia
Timespan Specific dates TBD according to the pilot needs: 2017-2019
Access Level TRAGSA-TRAGSATEC
5.3.7 Forest damage (FMI - D14.07)
Field Value
Internal Name of the Dataset D14.07
Name of the Dataset/API
Provider
Forest damage
Short Description In-situ observations of forest damage. FMI. Czech Republic.
Forestry statistics for selected plots - information about
the amount of salvage cutting.
Extended Description Derived from Wuudis mobile application
Initial Availability Date 2017
Data Type Photography, numeric values
Rightsholder FMI
Dataset/API Owner/Responsible
Contacts
Name of the System
Dataset Data Model/API
Interface
SQL, REST
Data Model: Standards,
Glossaries and metadata
standards
GeoTiff, CSV
Data Volume Gigabytes
Update Frequency Based on field campaigns
Geographical Coverage Czech Republic
5.3.8 Open Forest Data (METSAK - D18.01)
Field Value
Internal Name of the Dataset D18.01
Name of the Dataset/API
Provider
Open Forest Data / METSAK
Short Description The pilot uses METSAK’s forest resource data concerning
privately owned Finnish forests from METSAK’s forest
resource data system. The forest resource data consists of
basic data of tree stands (development class, dominant
tree species, scanned height, scanned intensity, stand
measurement date), strata of tree stands (mean age, basal
area, number of stems, mean diameter, mean height, total
volume, volume of logwood, volume of pulpwood), growth
place data (classification, fertility class, soil type, drainage
state, ditching year, accessibility, growth place data source,
growth place data measurement date), geometry and
compartment numbering. The forest resource data is
available in a standard format for external use with
consent of a forest owner.
Extended Description The forest resources are inventoried once a decade per
area using remote sensing (airborne laser scanning)
and aerial photographs. The new data is analysed and, in
part, measured in the field. Other updates to the
forest resource data are yearly growth calculations, possible
notifications of forest use or other forestry operations or
so-called Kemera financing operations, and possible new aerial
photographs to be interpreted.
Version OGC GeoPackage with 1.2 RTree
XML version 1.7
Initial Availability Date 1.3.2018 Download services, Q2/2018 API’s
Data Type Open forest data including forest resource data as well as GIS data
Personal Data N/A
Rightsholder METSAK
Dataset/API Owner/Responsible METSAK Open forest data/ Juha Inkilä
Dataset/API Owner/Responsible
Contacts
METSAK Open forest data (Avoin metsätieto)/METSAK /
Technology WMS, WFS and REST
Name of the System Open forest data (Avoin metsätieto)
Dataset Data Model/API
Interface
XML standard/REST,
OGC GeoPackage standard / WFS, WMS from Oracle
database
Data Model: Standards,
Glossaries and metadata
standards
https://www.metsatietostandardit.fi/en/
Data Identifier - Standard used XML, OGC, WFS, WMS, REST
Data Model - Specific Data
Model
https://www.bitcomp.fi/metsatietostandardit/
Data Volume 276.8 GB in June 2018
Update Frequency Daily
Geographical Coverage Finland
Timespan From 1.3.2018 onwards
Access Level Open
URI https://www.metsaan.fi/rajapinnat
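Since the table above lists WFS among the interfaces, the following Python sketch shows how a WFS 2.0.0 GetFeature request could be assembled from the standard key-value parameters (`service`, `version`, `request`, `typeNames`, `bbox`, `count`). The base URL, the layer name `stand` and the bounding box are hypothetical placeholders. The actual endpoint and layer names would have to be taken from the service's GetCapabilities response.

```python
from typing import Sequence
from urllib.parse import urlencode


def wfs_getfeature_url(
    base_url: str,
    type_name: str,
    bbox: Sequence[float],
    count: int = 10,
) -> str:
    """Build a WFS 2.0.0 GetFeature URL using key-value-pair encoding."""
    if len(bbox) != 4:
        raise ValueError("bbox must hold four values: minx, miny, maxx, maxy")
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,
        "bbox": ",".join(str(v) for v in bbox),
        "count": count,  # limits the number of returned features
    }
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint and layer; not the real Open forest data service.
url = wfs_getfeature_url(
    "https://example.org/wfs", "stand", (350000, 6900000, 351000, 6901000)
)
print(url)
```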
5.3.9 Hyperspectral image orthomosaic (Senop - D44.02)
Field Value
Internal Name of the Dataset D44.02
Name of the Dataset/API
Provider
Hyperspectral image orthomosaic
Short Description Orthorectified hyperspectral mosaic, n-bands, band-
matched.
Data Type ENVI /multipage TIF/single band TIF
5.3.10 Leaf area index (FMI - D14.06)
Field Value
Internal Name of the Dataset D14.06
Name of the Dataset/API
Provider
Leaf area index
Short Description Leaf area index and canopy closure for selected National
forest inventory sites in Czech Republic. Based on
interpretation of digital hemispherical photos (in total 2457
images collected for 189 sites). Provided as input
hemispherical photos and vector point layer with centroid
of forest plot and LAI values in attribute table.
Extended Description In-situ sampling of digital hemispherical photographs (DHP) was based on a scheme that
takes into account the Sentinel-2 satellite spatial resolution
(20 m pixel size), while the number of photos and their
spatial layout were selected according to Majasalmi et al.
(2012) as a star shape with 13 sampled points in four principal
azimuths: north, south, east and west. The sampled points
were positioned 3 meters apart.
Figure: Sampling scheme for digital hemispherical photography.
The images were taken with a Nikon D5500 digital SLR
camera with a Sigma 4.5 mm circular fisheye lens. The
camera was placed on a Vanguard Espod CX203 AP tripod
and aligned horizontally with a two-axis level. All photos
were shot with the lens facing north and taken as uncompressed RAW
images. In total 189 forest plots were
sampled, of which 79 stands were dominated by
coniferous trees (42% of the samples) and 110 stands by
deciduous trees (58% of all
samples). All field plots were visited during the period of
maximum vegetation foliage, from June to August in 2016 and 2017.
2015 was a test period in which photos
were taken only for evergreen coniferous plots, mostly in
October.
All DHP photos were analysed in Hemisfer software (WSL,
Switzerland). The software derives the LAI value by inversion
of the angular distribution of canopy gaps for a
statistically representative set of images.
Version 1.0
Initial Availability Date 1.1.2018
Data Type Image, numeric values
Rightsholder Property of FMI
Dataset/API Owner/Responsible FMI
Dataset/API Owner/Responsible
Contacts
Technology Digital hemispherical photography
Data Model: Standards,
Glossaries and metadata
standards
GeoTiff, CSV
Data Volume Approx 10 GB
Update Frequency Based on field campaigns, three dedicated field campaigns
conducted in 2015, 2016 and 2017
Data Archiving and preservation Local file storage
Geographical Coverage Czech Republic
Timespan 2015-2017
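The star-shaped sampling layout described for this dataset can be sketched in a few lines of Python. The sketch assumes one reading of the description: one central point plus three points on each of the four arms, spaced 3 m apart, which gives 13 points. The function name and the local east/north coordinate frame are hypothetical. The last lines recompute the reported shares of coniferous and deciduous plots.

```python
import math
from typing import List, Tuple


def star_sampling_points(
    spacing_m: float = 3.0, points_per_arm: int = 3
) -> List[Tuple[float, float]]:
    """Return (east, north) offsets in metres from the plot centre.

    Assumed layout: the centre point plus `points_per_arm` points along
    each principal azimuth (north, east, south, west).
    """
    points = [(0.0, 0.0)]
    for azimuth_deg in (0.0, 90.0, 180.0, 270.0):
        azimuth = math.radians(azimuth_deg)
        for step in range(1, points_per_arm + 1):
            distance = step * spacing_m
            # Adding 0.0 turns a rounded -0.0 into 0.0 for cleaner output.
            east = round(distance * math.sin(azimuth), 6) + 0.0
            north = round(distance * math.cos(azimuth), 6) + 0.0
            points.append((east, north))
    return points


layout = star_sampling_points()
print(len(layout), "points, farthest", max(math.hypot(e, n) for e, n in layout), "m from centre")

# Shares of plot types reported in the description (189 plots in total).
conifer_share = round(100 * 79 / 189)
deciduous_share = round(100 * 110 / 189)
print(conifer_share, deciduous_share)
```

Under this reading the layout spans 18 m from end to end, which would fit inside a single 20 m Sentinel-2 pixel.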
5.3.11 NASA CMR Landsat Datasets via FedEO Gateway (SPACEBEL - D07.02)
Field Value
Internal Name of the Dataset D07.02
Name of the Dataset/API
Provider
NASA CMR Landsat Datasets via FedEO Gateway
Short Description All datasets and collections metadata (including Landsat-8
collections) provided by the NASA Common Metadata
Repository (CMR), around 32000 collections, are accessible
through an OGC 13-026r8 OpenSearch interface via the
FedEO Gateway
Extended Description All datasets and collections metadata (including Landsat-8
collections) provided by the NASA Common Metadata
Repository (CMR), around 32000 collections, are accessible
through an OGC 13-026r8 OpenSearch interface via the
FedEO Gateway (C07.01). The available geographical area
and the temporal coverage of the datasets/products are
specified in the metadata of each collection. In the case of Landsat-8,
coverage is global, starting in April 2013. To
download Landsat-8 products, an account is needed on the
EROS Registration System (ERS) at the following URL:
https://ers.cr.usgs.gov/register/. The download URL is
included in the catalog search response.
Collection metadata and product metadata, including the
product download URL, can be accessed via the
component C07.05 FedEO Portlet, which acts as a client of the
FedEO Gateway (C07.01). The following picture illustrates
the retrieval of Landsat-8 datasets through the FedEO
Portlet (C07.05).
Initial Availability Date April 2013
Dataset/API Owner/Responsible NASA/USGS - access point via Spacebel/ESA FedEO Gateway to
collections including Landsat-8
Dataset/API Owner/Responsible
Contacts
[email protected], [email protected]
Dataset Data Model/API
Interface
Mission/collection specific. Product metadata returned is OGC 10-157r4
compliant. Metadata contains the download URL. OGC 13-026r8
OpenSearch.
Geographical Coverage Global
Timespan Landsat-8 Starts 2013-04, other collections have other temporal
extents which can be found in the metadata and Atom dc:date
elements.
Access Mechanism Accessible through an OGC 13-026r8 OpenSearch interface
via the FedEO Gateway. To download Landsat-8 products,
an account is needed on EROS Registration System (ERS) at
the following URL https://ers.cr.usgs.gov/register/. The
download URL is included in the catalog search response.
Requires having a username and password at Sentinels
Scientific Data Hub which is to be used inside the
OpenSearch request to the FedEO Gateway
(geo.spacebel.be).
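An OpenSearch interface such as the one described above publishes URL templates in its description document, and a client fills in the `{name}` and optional `{name?}` placeholders. The Python sketch below illustrates that substitution step only. The template, the query parameter names and the collection identifier `LANDSAT_8` are hypothetical. The real template should be read from the OpenSearch description document served by the gateway.

```python
import re
from typing import Mapping
from urllib.parse import quote

# Matches {name}, {prefix:name} and their optional forms ending in "?".
_PLACEHOLDER = re.compile(r"\{([A-Za-z]\w*(?::\w+)?)(\?)?\}")


def fill_template(template: str, values: Mapping[str, str]) -> str:
    """Fill an OpenSearch URL template.

    Optional placeholders without a value become empty strings;
    a missing mandatory placeholder raises KeyError.
    """
    def substitute(match: "re.Match[str]") -> str:
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe=",:")
        if optional:
            return ""
        raise KeyError(f"missing mandatory parameter: {name}")

    return _PLACEHOLDER.sub(substitute, template)


# Hypothetical template, shaped like an OpenSearch Geo/Time/EO URL template.
TEMPLATE = (
    "https://example.org/search?parentIdentifier={eo:parentIdentifier}"
    "&startDate={time:start?}&endDate={time:end?}&bbox={geo:box?}"
)

request_url = fill_template(
    TEMPLATE,
    {
        "eo:parentIdentifier": "LANDSAT_8",  # invented collection identifier
        "time:start": "2018-01-01T00:00:00Z",
        "geo:box": "-10,35,5,44",
    },
)
print(request_url)
```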
5.3.12 Ontology for (Precision) Agriculture (PSNC -D09.01)
Field Value
Internal Name of the Dataset D09.01
Name of the Dataset/API
Provider
Gateway Ontology for (Precision) Agriculture
Short Description The (FOODIE) ontology enables the representation of data
compliant with FOODIE data model in semantic format and
their interlinking with established vocabularies and
ontologies (e.g., AGROVOC).
Extended Description Thus, in line with FOODIE data model, different
agricultural-related concepts can be described and
represented, including agricultural facilities, crop and soil
data, treatments, interventions, agriculture machinery, etc.
Also, in line with FOODIE data model, the ontology is based
on the INSPIRE directive, ISO standards (e.g. 19156, 19157)
and OGC standards. The ontology can be used for different
semantic tasks, such as data semantization for the
transformation of (semi-)structured data (e.g., tabular,
relational) to semantic format; ontology-based data
access, e.g., accessing relational databases as virtual, read-
only RDF graphs; publication of linked data, including the
discovery of links with relevant datasets in the Linked Open
Data cloud.
Rightsholder Creative Commons Attribution 3.0
Dataset/API Owner/Responsible
Contacts
Dataset Data Model/API
Interface
SPARQL
Data Model: Standards,
Glossaries and metadata
standards
OWL
Data Volume Dataset 100Kb
Geographical Coverage Agnostic
Timespan Agnostic
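The table above names SPARQL as the interface and OWL as the standard. As a hedged illustration, the Python sketch below builds a SPARQL 1.1 Protocol GET request that lists OWL classes. The endpoint URL is a placeholder and the query deliberately uses only the standard `owl:Class` term, because the concrete class names of the ontology are not given in this section.

```python
from urllib.parse import urlencode

# Generic query: list up to ten classes declared in the ontology.
QUERY = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?cls WHERE { ?cls a owl:Class } LIMIT 10
""".strip()


def sparql_get_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as a GET request using the `query` parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


# Placeholder endpoint; the real SPARQL endpoint is not stated here.
sparql_url = sparql_get_url("https://example.org/sparql", QUERY)
print(sparql_url)
```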
5.3.13 Open Land Use (Lespro - D02.01)
Field Value
Internal Name of the Dataset D02.01
Name of the Dataset/API
Provider
Open Land Use
Short Description Open Land Use Map is a composite map that is intended to
provide detailed land-use maps of various regions based on
certain pan-European datasets, such as CORINE Land Cover and
Urban Atlas, enriched by available regional data.
The dataset is derived from available open data sources at
different levels of detail and coverage. These data sources
include:
1) Digital cadastral maps, if available
2) Land Parcel Identification System, if available
3) Urban Atlas (European Environment Agency)
4) CORINE Land Cover 2006 (European Environment
Agency)
5) Open Street Map
The order of the data sources is according to the level of
detail and, therefore, the priority for data integration.
Extended Description
The Open Land Use (OLU) data model joins two basic data
models of the INSPIRE Land Use specification: existing
land use and planned land use. The main difference between the
INSPIRE data models and the OLU model is
that the OLU data model connects planned and
existing land use data. In OLU, the different attributes
are used for both types of land use data.
Land use involves the management and modification of the natural
environment or wilderness into a built environment such as
fields, pastures, and settlements. It has also been defined as
"the arrangements, activities and inputs people undertake
in a certain land cover type to produce, change or maintain
it" (FAO, 1997a; FAO/UNEP, 1999). Land use practices vary
considerably across the world. The United Nations' Food and
Agriculture Organization Water Development Division
explains that "Land use concerns the products and/or
benefits obtained from use of the land as well as the land
management actions (activities) carried out by humans to
produce those products and benefits." The OLU model also
follows the INSPIRE land use specification (it uses the same data
attributes; the set of attributes used is larger than in the case
of the Land Use Database Schema), but it works with a
simpler view of the data. The two models are transformable into each
other, and it is also possible to migrate data from these
models to or from other datasets that are in harmony with
the INSPIRE specification. The main reason for the above-mentioned
differences is the different usage of the
data and data models. OLU will be used for any land use (and
land cover) data, whereas the Land Use Database Schema serves only
spatial planning data as a special part of land use data. There
are several datasets that could be used for creating a
harmonised land use dataset. Land use is a dataset that is
used in many specialisms, including agriculture, spatial or
urban planning, environment protection, and maintenance
and restoration of environmental functions.
Currently Open Land Use covers the whole EU with different levels of accuracy:
Europe
The base European dataset is derived from the set of
available data sources that help identify the land use
in a particular locality. The list of the sources used so far at
the Pan-European level includes:
1. Urban Atlas
2. CORINE Land Cover 2012
The sources are listed in the order in which they were combined
to create the map (1 has the highest geometrical and semantic
precedence, and so on).
Czech Republic
The dataset is derived from the set of available data
sources that help identify the land use in a particular
locality. The list of the sources used so far includes:
1) Digital Cadastre
2) LPIS (Land Parcel Identification System)
3) Urban Atlas
4) CORINE Land Cover
Austria
The dataset is derived from the set of available data
sources that help identify the land use in a particular
locality. The list of the sources used so far includes:
1) LPIS (Land Parcel Identification System)
2) Urban Atlas
3) CORINE Land Cover
Flanders
The dataset is derived from the set of available data sources
that help identify the land use in a particular locality.
The list of the sources used so far includes:
1) GRBGis Large Scale Reference Database
2) Urban Atlas
3) CORINE Land Cover
Version
Initial Availability Date 2015
Rightsholder Plan4all
Dataset/API
Owner/Responsible
LESP
Dataset/API
Owner/Responsible Contacts
Technology GML
Name of the System Open Land Use
Dataset Data Model/API
Interface
REST, OGC WMS, WFS
Data Model: Standards,
Glossaries and metadata
standards
GML
Data Volume Hundreds of GB
Update Frequency Semi annually
Geographical Coverage Europe
Timespan 2015 - present
URI Open Land Use is available on
http://sdi4apps.eu/open_land_use/
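The precedence rule stated for Open Land Use, where sources are combined in order of detail and the first listed source wins, can be sketched as a simple priority lookup. The Python example below uses the Czech source order from this section. It is a deliberately simplified illustration that ignores geometry and works on one location at a time. The function name and the land-use class labels are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

# Source order for the Czech Republic as listed in this section
# (first entry has the highest precedence).
CZECH_PRIORITY = ("Digital Cadastre", "LPIS", "Urban Atlas", "CORINE Land Cover")


def pick_land_use(
    candidates: Dict[str, str],
    priority: Sequence[str] = CZECH_PRIORITY,
) -> Optional[Tuple[str, str]]:
    """Return (source, land_use) from the highest-priority source that
    has a value for the location, or None if no source covers it."""
    for source in priority:
        if source in candidates:
            return source, candidates[source]
    return None


# Invented values for one location covered by two of the sources.
location = {"CORINE Land Cover": "agricultural areas", "LPIS": "arable land"}
print(pick_land_use(location))
```

A production workflow would apply the same precedence to overlapping polygons rather than to single locations.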
5.3.14 Phenomics, metabolomics, genomics and environmental datasets (CERTH - DS40.01)
Field Value
Internal Name of the Dataset DS40.01
Name of the Dataset/API
Provider
Phenomics, metabolomics, genomics and environmental
datasets
Short Description This dataset includes phenomics, metabolomics, genomics
as well as environmental data. Genomic predictions and
selection data are also included.
Data Type Raw text, CSV data
Dataset/API Owner/Responsible
Contacts
[email protected], [email protected]
Data Volume 1-12 MB
Geographical Coverage Regions of Thessalia
5.3.15 Quality control data (METSAK - D18.04)
Field Value
Internal Name of the Dataset D18.04
Name of the Dataset/API
Provider
Quality control data
Short Description The quality control data consists of forest estate, number of
the financing conclusion, geometry of compartments, type
of the forest work, sample plot locations, measured data per
sample plot, measurement averages per compartment,
measurement date and user information. The quality
control data will be added to the existing forest data
standard during 2017.
Extended Description Quality control data on work done in the forest is part of
the Best Practice Guidelines for Forest Management. The
data is already being collected and saved in METSAK’s
information systems, but the amount of data needs to
be increased. It is also planned to collect the data
through a mobile application.
This pilot is about presenting the quality control data in the
Metsään.fi eService for forest owners and forestry
operators, and supporting the requirement specification of
a new mobile application and its interfaces. In Metsään.fi,
forest owners should be able to follow the quality of
work done in their forests and compare it to the national
average. Forestry operators can see the quality data of
their own forest work in Metsään.fi and can also
compare it to the national average.
Version v1.0.0
Initial Availability Date Q3/2018
Data Type Quality control data for young stand improvement and tending of
seedling stands.
Personal Data End user information and personal data.
Rightsholder METSAK
Dataset/API Owner/Responsible Mobile app. dataset owner MHGS/ Seppo Huurinainen
Dataset/API Owner/Responsible Contacts METSAK forest resource database (KantoRiihi) / Aki Hostikka /
Technology Mobile app. in JSON, Quality control data in XML, SOAP, REST
Name of the System Laatumetsä mobile app., METSAK Forest Resource Database (KantoRiihi)
Dataset Data Model/API Interface Laatumetsä Mobile app. user interface, REST, SOAP, METSAK Forest Resource Database (KantoRiihi)
Data Model: Standards, Glossaries and metadata standards REST, SOAP, JSON, XML, https://www.metsatietostandardit.fi/en/
Data Identifier - Standard used XML - https://www.metsatietostandardit.fi/en/
Data Model - Specific Data Model https://www.bitcomp.fi/metsatietostandardit/
Data Volume Expected to be 200 GB together with Storm and Forest Damages
dataset
Update Frequency Online
Data Archiving and preservation METSAK Forest Resource Database (KantoRiihi)
Geographical Coverage Finland
Timespan From Q3/2018 onwards
Access Level Available for registered users.
Access Mechanism https://tunnistaminen.suomi.fi
URI https://www.wuudis.com/fi/laatumetsa/
5.3.16 Sentinels Scientific Hub Datasets via FedEO Gateway (SPACEBEL - D07.01)
Sentinel products available on the Sentinels Scientific Data Hub (Sentinel-1, Sentinel-2) can
be discovered and accessed via the FedEO Gateway (C07.01), which returns Sentinel collection
and dataset metadata (including the product download URL) via an OGC 13-026r8 OpenSearch
interface. Geographical coverage is global, and temporal coverage starts
in April 2014 for Sentinel-1 and June 2015 for Sentinel-2. Access to the dataset metadata
and the products requires an account (user/password) that can be obtained at
https://scihub.copernicus.eu/dhus/#/self-registration. Sentinel products and
metadata can also be accessed via the user interface of the FedEO Portlet (C07.05).
Field Value
Internal Name of the Dataset D07.01
Name of the Dataset/API Provider Sentinels Scientific Hub Datasets via FedEO Gateway
Short Description Sentinel products available on the Sentinels Scientific Data
Hub (Sentinel-1, Sentinel-2) can be discovered and accessed
via the FedEO Gateway (C07.01), which returns Sentinel
collection and dataset metadata (including the product
download URL) via an OGC 13-026r8 OpenSearch interface.
Geographical coverage is global, and
temporal coverage starts in April 2014 for Sentinel-1 and
June 2015 for Sentinel-2. Access to the dataset
metadata and the products requires an account
(user/password) that can be obtained at
https://scihub.copernicus.eu/dhus/#/self-registration.
Sentinel products and metadata can also
be accessed via the user interface of the FedEO Portlet (C07.05).
Extended Description All datasets (collections) available through the Sentinels Scientific
Hub are accessible through standard protocols via the Spacebel
component C07.01 FedEO Gateway. These collections include:
Sentinel-1, Sentinel 2, … Detailed collection information is
published by ESA/Spacebel in the FedEO Collection Catalog and can
be made available in various metadata flavours including ISO19139,
ISO19139-2, ISO MENDS, DIF-10 or visualised on a user interface.
Examples are shown below.
http://geo.spacebel.be/opensearch/request?uid=EOP:ESA:SENTINEL_1,
http://geo.spacebel.be/opensearch/request?uid=EOP:ESA:S2MSI1C
Dataset/API Owner/Responsible Spacebel
Dataset/API Owner/Responsible Contacts
Dataset Data Model/API Interface OGC 13-026r8 OpenSearch.
Geographical Coverage Global
Timespan Sentinel-1 Starts 2014-04 (see below), Sentinel-2 starts 2015-06
(See below), other collections have other temporal extents.
Access Mechanism Requires having a user name and password at Sentinels Scientific
Data Hub which is to be used inside the OpenSearch request to
the FedEO Gateway (geo.spacebel.be).
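As an illustration of the access pattern described above, the following minimal Python sketch builds a FedEO OpenSearch request URL for a single collection. The host and the `uid` parameter are taken from the example URLs in the table; the optional `startDate` parameter used in the usage note is an assumption and should be checked against the gateway's OpenSearch description document. How the Sentinels Scientific Data Hub credentials are passed is not shown here.

```python
from urllib.parse import urlencode

# End-point taken from the example URLs in the table above.
FEDEO_ENDPOINT = "http://geo.spacebel.be/opensearch/request"


def collection_url(collection_id, **extra):
    """Build a FedEO OpenSearch request URL for one collection.

    `uid` is the parameter used in the document's examples; any keyword
    arguments are passed through unchanged and are assumptions of this sketch.
    """
    params = {"uid": collection_id}
    # Skip optional parameters that were not set.
    params.update({key: value for key, value in extra.items() if value is not None})
    # Keep ":" and "," readable, as in the document's example URLs.
    return FEDEO_ENDPOINT + "?" + urlencode(params, safe=":,")


if __name__ == "__main__":
    print(collection_url("EOP:ESA:S2MSI1C"))
```

For example, `collection_url("EOP:ESA:SENTINEL_1", startDate="2014-04-01T00:00:00Z")` would append a hypothetical start-date filter to the Sentinel-1 collection request.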
5.3.17 SigPAC (Tragsa - D11.05)
The CAP Information System is a land parcel identification system (LPIS). It is provided by the Junta de
Castilla y Leon (the autonomous regional government).
Field Value
Internal Name of the Dataset D11.05
Name of the Dataset/API Provider SigPAC
Short Description LPIS - Land parcel identification system.
Extended Description A land-parcel identification system (LPIS) is a system to identify
land use for a given country. It utilises orthophotos – basically
aerial photographs and high precision satellite images that are
digitally rendered to extract as much meaningful spatial
information as possible. A unique number is given to each land
parcel to provide a unique identification in space and time. This
information is updated regularly to monitor the evolution of the
land cover and the management of the crops.
Initial Availability Date Starting date of the project
Data Type ESRI Shape, SQLITE Databases
Personal Data No
Rightsholder FEGA - CAP payment Agency in Spain
Dataset/API Owner/Responsible Contacts www.mapama.gob.es
Data Model: Standards, Glossaries and metadata standards More information at: https://ec.europa.eu/jrc/en/research-topic/agricultural-monitoring
Data Model - Specific Data Model There are some commonalities among the European countries, but
the LPIS model is different in each member state.
Data Volume Less than 1 GB
Update Frequency Yearly
Geographical Coverage Spain
Access Level Free in some regions. Private in others.
5.3.18 Smart POI dataset (Lespro - D02.01)
Field Value
Internal Name of the Dataset D02.01
Name of the Dataset/API Provider Smart POI dataset
Extended Description The Smart Points of Interest dataset (SPOI) is a seamless
and open resource of POIs that is available for all users to
download, search or reuse in applications and services.
SPOI’s principal target is to provide information as Linked
Data together with other datasets containing the road network.
The added value of the Smart approach in comparison to
other similar solutions lies in the implementation of linked
data, the use of standardized and respected datatype
properties, and the development of a completely harmonized
dataset with a uniform data model and common
classification.
The SPOI dataset is created as a combination of global data
(selected points from OpenStreetMap) and local data
provided by the SDI4Apps partners or data available on the
web. The dataset can be reached through a SPARQL endpoint
(http://data.plan4all.eu/sparql); for detailed information
see http://sdi4apps.eu/spoi.
Rightsholder It is available under the Open Data Commons Open Database License (ODbL, http://opendatacommons.org/licenses/odbl/)
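The SPARQL endpoint mentioned in the description can be queried with the standard SPARQL protocol. The Python sketch below only prepares such a request; the generic triple pattern is a placeholder, since the SPOI vocabulary is not described in this section, and the JSON result format requested in the `Accept` header is an assumption about what the endpoint supports.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint URL taken from the dataset description above.
SPOI_ENDPOINT = "http://data.plan4all.eu/sparql"

# Placeholder query: it does not rely on any SPOI-specific vocabulary.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def sparql_request(query, endpoint=SPOI_ENDPOINT):
    """Prepare (but do not send) a SPARQL protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for SPARQL JSON results; support by the endpoint is assumed.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    request = sparql_request(SAMPLE_QUERY)
    print(request.full_url)
    # To execute: urllib.request.urlopen(request), then json.load() the response.
```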
5.3.19 Stand age map (FMI - D14.04)
Field Value
Internal Name of the Dataset D14.04
Name of the Dataset/API Provider Stand age map
Short Description Vector layer based on Czech forest management plans and
stand age based on detailed forest inventory. It is
countrywide, with a 10-year update interval.
5.3.20 Storm and forest damage observations and possible risk areas (METSAK - D18.03a)
Field Value
Internal Name of the Dataset D18.03a
Name of the Dataset/API Provider Storm and forest damage observations and possible risk areas
Short Description One of the new datasets in this pilot is storm and
forest damage observations, which are planned to be
crowdsourced. The storm damage observations consist of
location, type of the damage, evaluation of the extent of
the damage, tree species and distance from the road. The
storm and forest damage data supplements forest
resource data. Possible storm and forest damage areas are
evaluated based on the damage observations collected.
The possible risk areas are presented to the users on a map
layer.
Extended Description This information is currently gathered by METSAK through the
forest use declaration process. To improve the overall
management of storm damage and to prevent
possible further damage, it is extremely important to get
field data and information as soon as possible. One
way to gather this type of information is to provide a
mobile app that allows anyone to report their
observations to the forestry experts at the Finnish Forest
Centre. Based on the crowdsourced information, forestry
experts are able to react faster than before, which can
prevent further damage, for instance damage caused by
pest attacks. The damaged wood material could also be
routed faster to the most suitable place for further
processing.
Version v1.0.0
Initial Availability Date Q4/2018
Data Type XML
Personal Data No personal data gathered
Rightsholder METSAK
Other Rights Information MHGS provides the mobile app for data collection
Dataset/API Owner/Responsible Mobile app. dataset owner MHGS/ Seppo Huurinainen
METSAK / Virpi Stenman
Dataset/API Owner/Responsible Contacts METSAK forest damages database / Mikko Kesälä /
Technology Mobile app. in JSON, Storm and forest damages data in XML, SOAP, REST
Name of the System Laatumetsä mobile app, Metsäkeskus map service (https://metsakeskus.maps.arcgis.com/home/index.html)
Dataset Data Model/API Interface WMS-maps (XML standardization is ongoing), METSAK user interface, Laatumetsä Mobile app. user interface, REST, SOAP
Data Model: Standards, Glossaries and metadata standards REST, SOAP, JSON, XML, https://www.metsatietostandardit.fi/en/
Data Identifier - Standard used XML, OGC and WMS-maps; XML - https://www.metsatietostandardit.fi/en/
Data Model - Specific Data Model https://www.bitcomp.fi/metsatietostandardit/
Data Volume Expected to be 200 GB together with Quality Control dataset
(Laatumetsä)
Update Frequency Online
Data Archiving and preservation The data is stored and backed up in the METSAK map service database
Geographical Coverage Finland
Timespan Q4/2018 onwards
Access Level open
Access Mechanism open
URI Mobile app: https://www.wuudis.com/fi/laatumetsa/
METSAK map service:
https://metsakeskus.maps.arcgis.com/home/index.html
5.3.21 Forest road condition observations (METSAK - D18.03b)
Field Value
Internal Name of the Dataset D18.03b
Name of the Dataset/API Provider Forest road condition observations / Roads.ML
Short Description One of the new datasets in this pilot is forest road
condition observations, which are planned to be
crowdsourced. The forest road condition observations
consist of location, type of the road based on the Digiroad
map, evaluation of the condition of the road, possible road
limitations or obstacles on the road, as well as the forest
development classes for the road surroundings. The road
and forest felling potential data supplements the open
forest resource data. In future, possible priorities in
road improvement activities might be evaluated based on
the road condition observations collected. Both the
observed condition of the road and the related felling potential
are presented to the users on a map layer, which is openly
available.
Extended Description This information is not currently gathered by METSAK.
Increasing the knowledge of the current road
network condition and availability is extremely important
for the logistics chain of the forest industry as well as for
ensuring the wood supply. Crowdsourcing, i.e. a mobile
app, can be used for collecting field data and
information as soon as possible. Based on the
crowdsourced information, forestry experts within the
forest industry sector are able to react faster than before,
which can prevent possible disruptions in the wood supply
chain.
Version v1.0.0
Initial Availability Date Q4/2018
Data Type WMS
Personal Data No personal data gathered
Rightsholder METSAK
Other Rights Information Roads.ML provides the mobile app for data collection
Dataset/API Owner/Responsible Mobile app. dataset owner Roads.ml/ Jussi-Pekka Martikainen
METSAK map service / Mikko Kesälä
Dataset/API Owner/Responsible Contacts METSAK forest road map / Mikko Kesälä /
Technology Mobile app. in PostgreSQL, Forest road data provided as GIS interface
Name of the System Roads.ml mobile app, Metsäkeskus map service (https://metsakeskus.maps.arcgis.com/home/index.html)
Dataset Data Model/API Interface WMS-maps, METSAK user interface for WMS map, Roads.ml Mobile app. user interface, REST
Data Model: Standards, Glossaries and metadata standards REST, PostgreSQL
Data Identifier - Standard used OGC and WMS-maps
Data Model - Specific Data Model Based on OGC standard
Data Volume Expected to be around 20 GB
Update Frequency Online
Data Archiving and preservation Postgres database
Geographical Coverage Finland
Timespan Q4/2018 onwards
Access Level open
Access Mechanism open
URI Mobile app: www.roads.ml
METSAK map service:
https://metsakeskus.maps.arcgis.com/home/index.html
5.3.22 Tree species map (FMI - D14.03)
Field Value
Internal Name of the Dataset D14.03
Name of the Dataset/API Provider Tree species map
Short Description Tree species map. Raster dataset based on classification of
Sentinel-2 multi-temporal data and the National Forest
Inventory of the Czech Republic. 20 m spatial resolution;
distinguishes the six most abundant tree species in the Czech
Republic.
Data Type Raster dataset
Rightsholder Property of FMI
Dataset/API Owner/Responsible Contacts lukes.petr@@uhul.sz
Data Model: Standards, Glossaries and metadata standards GeoTIFF
Data Volume 1 Gb
Update Frequency Fixed
Geographical Coverage Czech Republic
Timespan 2017
5.3.23 Wuudis data (MHGS - D20.01)
Wuudis uses the Finnish forest information standard as its basic data import/export format,
and the Wuudis service data model is based on that standard. All
development activities during the DataBio project that affect the Wuudis data model
are based on the Finnish forest information standard. The standard includes a set
of standardized schemas (such as timber sales and logistics). Some of these schemas
can be used in DataBio, and some new specifications are being developed during the project.
Basic information about the forest information standard: http://www.metsatietostandardit.fi/en .
The base forest information standard XML schema description can be found here:
https://extra.bitcomp.fi/metsastandardi_ehdotus/V8/MV/doc/index.html . This schema includes
basic forest property data, stands, operations and tree strata. Everything is based on this basic
real estate information. The whole schema repository can be found here: https://www.bitcomp.fi/metsatietostandardit/
Wuudis also has an open REST API that uses plain JSON, which is faster than standard-based XML
data transfer. With the JSON interface, different kinds of query parameters can also be used, and
data can be fetched in parts (such as a single stand or operation). All available resources are listed
in the WADL documentation: https://wuudis.com/api/application.wadl
Map layers are another important dataset for Wuudis. Wuudis uses global map services
such as Google and Microsoft (Bing) to provide worldwide satellite map layers to end users.
Wuudis also provides map layers from the National Land Survey of Finland’s WMS/WMTS service.
More information about the National Land Survey of Finland map services can be found here: http://www.maanmittauslaitos.fi/en/maps-and-spatial-data/maps/view-maps .
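Because the REST resources are published as a WADL document, a client can discover them programmatically. The Python sketch below lists resource paths and HTTP methods from a WADL text. It assumes the common WADL namespace `http://wadl.dev.java.net/2009/02`, and the sample document with its `stands` resource and `example.invalid` base URL is purely hypothetical; the actual Wuudis WADL may be structured differently.

```python
import xml.etree.ElementTree as ET

# Assumption: the WADL uses the widely used 2009/02 namespace.
WADL_NS = {"w": "http://wadl.dev.java.net/2009/02"}

# Hypothetical sample; resource names are NOT taken from the Wuudis API.
SAMPLE_WADL = """
<application xmlns="http://wadl.dev.java.net/2009/02">
  <resources base="https://example.invalid/api/">
    <resource path="stands">
      <method name="GET"/>
      <resource path="{id}">
        <method name="GET"/>
      </resource>
    </resource>
  </resources>
</application>
"""


def list_resources(wadl_text):
    """Return (full path, [HTTP methods]) for every resource with methods."""
    root = ET.fromstring(wadl_text)
    found = []

    def walk(node, prefix):
        for res in node.findall("w:resource", WADL_NS):
            path = prefix.rstrip("/") + "/" + res.get("path", "").lstrip("/")
            methods = [m.get("name") for m in res.findall("w:method", WADL_NS)]
            if methods:
                found.append((path, methods))
            walk(res, path)  # nested resources extend the parent path

    for resources in root.findall("w:resources", WADL_NS):
        walk(resources, resources.get("base", ""))
    return found


if __name__ == "__main__":
    for path, methods in list_resources(SAMPLE_WADL):
        print(",".join(methods), path)
```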
Field Value
Internal Name of the Dataset D20.01
Name of the Dataset/API Provider Wuudis data
Short Description Wuudis uses the Finnish forest information standard as its basic
data import/export format, and the Wuudis service data model is
based on that standard. All
development activities during the DataBio project that
affect the Wuudis data model are based on the Finnish forest
information standard.
Extended Description The forest information standard includes a set of
standardized schemas (such as timber sales and logistics). Some
of these schemas can be used in DataBio, and some new
specifications are being developed during the project.
Basic information about the forest information standard:
http://www.metsatietostandardit.fi/en . The base forest information
standard XML schema description can be found here:
https://extra.bitcomp.fi/metsastandardi_ehdotus/V8/MV/doc/index.html .
This schema includes basic forest property data,
stands, operations and tree strata. Everything is based on this
basic real estate information. The whole schema repository can
be found here: https://www.bitcomp.fi/metsatietostandardit/
Wuudis also has an open REST API that uses plain JSON, which is
faster than standard-based XML data transfer. With the JSON
interface, different kinds of query parameters can also be used,
and data can be fetched in parts (such as a single stand or
operation). All available resources are listed in the WADL
documentation: https://wuudis.com/api/application.wadl
Map layers are another important dataset for Wuudis.
Wuudis uses global map services such as Google and Microsoft
(Bing) to provide worldwide satellite map layers to end
users. Wuudis also provides map layers from the National Land
Survey of Finland’s WMS/WMTS service. More information
about the National Land Survey of Finland map services can be
found here: http://www.maanmittauslaitos.fi/en/maps-and-spatial-data/maps/view-maps .
5.4 Recommended interaction structures: ATOS
As presented in previous sections of this document, each of DataBio’s pilots requires a
heterogeneous set of datasets that are made available in different remote systems, formats and
encodings, as well as at different spatial and temporal resolutions.
This section describes, by way of example, how some of the most commonly used datasets in the pilots
are managed and used by DataBio’s components in an interoperable manner through
standardized interfaces and protocols (APIs).
Dataset name: DATASET NAME
Pilot: A1 /3.2.1 Oceanic tuna fisheries immediate operational choices
Component: C05.01 Rasdaman
API/Operation: OGC WCS - GetCoverage
Example: Retrieve a subset area, encoded as GML, from the variable [variable name] covering the whole Indian Ocean for a specific date.
Request:
http://150.254.165.231:8080/rasdaman/ows?&SERVICE=WCS&VERSION=2.0.1&REQUEST=GetCoverage&COVERAGEID=mlotst&SUBSET=Lat(13.41,14.82)&SUBSET=Long(76.67,78.14)&SUBSET=ansi(%222018-06-26T00:00:00.000Z%22,%222018-06-26T00:00:00.000Z%22)&FORMAT=application/gml+xml
Response:
<gmlcov:ReferenceableGridCoverage
xmlns="http://www.opengis.net/gml/3.2"
xmlns:gml="http://www.opengis.net/gml/3.2"
xmlns:gmlcov="http://www.opengis.net/gmlcov/1.0"
xmlns:swe="http://www.opengis.net/swe/2.0"
xmlns:wcs="http://www.opengis.net/wcs/2.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
gml:id="mlotst" xsi:schemaLocation="http://www.opengis.net/wcs/2.0
http://schemas.opengis.net/wcs/2.0/wcsAll.xsd">
<boundedBy>
<Envelope axisLabels="Lat Long ansi" srsDimension="3"
srsName="http://localhost:8080/def/crs-compound?1=http://localhost:8080/def/crs/EPSG/0/4326&2=http://localhost:8080/def/crs/OGC/0/AnsiDate"
uomLabels="degree degree d">
<lowerCorner>
13.35467349552 76.618871415346 "2018-06-26T00:00:00.000Z"
</lowerCorner>
<upperCorner>
14.8527528809232 78.2007400554937 "2018-06-26T00:00:00.000Z"
</upperCorner>
</Envelope>
</boundedBy>
<domainSet>
<gmlrgrid:ReferenceableGridByVectors
xmlns:gmlrgrid="http://www.opengis.net/gml/3.3/rgrid" dimension="3"
gml:id="mlotst-grid"
xsi:schemaLocation="http://www.opengis.net/gml/3.3/rgrid
http://schemas.opengis.net/gml/3.3/referenceableGrid.xsd">
<limits>
<GridEnvelope>
<low>182 620 21</low>
<high>199 638 21</high>
</GridEnvelope>
</limits>
<axisLabels>Lat Long ansi</axisLabels>
<gmlrgrid:origin>
<Point gml:id="mlotst-origin"
srsName="http://localhost:8080/def/crs-compound?1=http://localhost:8080/def/crs/EPSG/0/4326&2=http://localhost:8080/def/crs/OGC/0/AnsiDate">
<pos>
14.811139564662 76.66049953745515 "2018-06-26T00:00:00.000Z"
</pos>
</Point>
</gmlrgrid:origin>
<gmlrgrid:generalGridAxis>
<gmlrgrid:GeneralGridAxis>
<gmlrgrid:offsetVector srsName="http://localhost:8080/def/crs-compound?1=http://localhost:8080/def/crs/EPSG/0/4326&2=http://localhost:8080/def/crs/OGC/0/AnsiDate">-0.0832266325224 0 0</gmlrgrid:offsetVector>
<gmlrgrid:coefficients/>
<gmlrgrid:gridAxesSpanned>Lat</gmlrgrid:gridAxesSpanned>
<gmlrgrid:sequenceRule
axisOrder="+1">Linear</gmlrgrid:sequenceRule>
</gmlrgrid:GeneralGridAxis>
</gmlrgrid:generalGridAxis>
<gmlrgrid:generalGridAxis>
<gmlrgrid:GeneralGridAxis>
<gmlrgrid:offsetVector srsName="http://localhost:8080/def/crs-compound?1=http://localhost:8080/def/crs/EPSG/0/4326&2=http://localhost:8080/def/crs/OGC/0/AnsiDate">0 0.0832562442183 0</gmlrgrid:offsetVector>
<gmlrgrid:coefficients/>
<gmlrgrid:gridAxesSpanned>Long</gmlrgrid:gridAxesSpanned>
<gmlrgrid:sequenceRule
axisOrder="+1">Linear</gmlrgrid:sequenceRule>
</gmlrgrid:GeneralGridAxis>
</gmlrgrid:generalGridAxis>
<gmlrgrid:generalGridAxis>
<gmlrgrid:GeneralGridAxis>
<gmlrgrid:offsetVector srsName="http://localhost:8080/def/crs-compound?1=http://localhost:8080/def/crs/EPSG/0/4326&2=http://localhost:8080/def/crs/OGC/0/AnsiDate">0 0 1</gmlrgrid:offsetVector>
<gmlrgrid:coefficients>"2018-06-26T00:00:00.000Z"</gmlrgrid:coefficients>
<gmlrgrid:gridAxesSpanned>ansi</gmlrgrid:gridAxesSpanned>
<gmlrgrid:sequenceRule
axisOrder="+1">Linear</gmlrgrid:sequenceRule>
</gmlrgrid:GeneralGridAxis>
</gmlrgrid:generalGridAxis>
</gmlrgrid:ReferenceableGridByVectors>
</domainSet>
<rangeSet>
<DataBlock>
<rangeParameters/>
<tupleList cs=" " ts=",">
-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-
32767
</tupleList>
</DataBlock>
</rangeSet>
<coverageFunction>
<GridFunction>
<sequenceRule axisOrder="+2 +1 +3">Linear</sequenceRule>
<startPoint>182 620 21</startPoint>
</GridFunction>
</coverageFunction>
<gmlcov:rangeType>
<swe:DataRecord>
<swe:field name="Gray">
<swe:Quantity xmlns:swe="http://www.opengis.net/swe/2.0">
<swe:label>Gray</swe:label>
<swe:nilValues>
<swe:NilValues>
<swe:nilValue reason="">-32767</swe:nilValue>
</swe:NilValues>
</swe:nilValues>
<swe:uom code="10^0"/>
</swe:Quantity>
</swe:field>
</swe:DataRecord>
</gmlcov:rangeType>
<gmlcov:metadata/>
</gmlcov:ReferenceableGridCoverage>
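The GetCoverage request above can also be assembled programmatically. The following Python sketch rebuilds it from its parts, using the end-point, coverage name and axis labels shown in the example; the function name and argument layout are illustrative choices, not part of the rasdaman or OGC API. The `SUBSET` key is repeated once per axis, which is why a list of pairs is used instead of a dictionary.

```python
from urllib.parse import urlencode

# End-point taken from the example request above.
RASDAMAN_OWS = "http://150.254.165.231:8080/rasdaman/ows"


def get_coverage_url(coverage_id, lat, lon, timestamp,
                     fmt="application/gml+xml", base=RASDAMAN_OWS):
    """Build a WCS 2.0.1 GetCoverage URL with one SUBSET per axis."""
    params = [
        ("SERVICE", "WCS"),
        ("VERSION", "2.0.1"),
        ("REQUEST", "GetCoverage"),
        ("COVERAGEID", coverage_id),
        ("SUBSET", f"Lat({lat[0]},{lat[1]})"),
        ("SUBSET", f"Long({lon[0]},{lon[1]})"),
        # Same start and end instant, as in the example request.
        ("SUBSET", f'ansi("{timestamp}","{timestamp}")'),
        ("FORMAT", fmt),
    ]
    # Quotes become %22 and "+" becomes %2B; brackets and commas stay readable.
    return base + "?" + urlencode(params, safe="(),:/")


if __name__ == "__main__":
    print(get_coverage_url("mlotst", (13.41, 14.82), (76.67, 78.14),
                           "2018-06-26T00:00:00.000Z"))
```

The response can then be fetched with any HTTP client and parsed as GML, or requested in another encoding by changing `fmt`, provided the server supports it.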
Dataset name: DATASET NAME
Pilot: A1 /3.2.1 Oceanic tuna fisheries immediate operational choices
Component: C05.01 Rasdaman
API/Operation: OGC WCPS - ProcessCoverages
Example: Calculate the mean value of the variable “mlotst” over the whole Indian Ocean for all time periods and return it as text.
Request:
http://150.254.165.231:8080/rasdaman/ows?&SERVICE=WCS&VERSION=2.0.1&REQUEST=ProcessCoverages&query=for $s in ( mlotst ) return encode( avg($s), "text/csv" )
Response:
38.206644819155315
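A WCPS query contains spaces, quotes and `$` signs, so it needs to be percent-encoded before being sent. The Python sketch below shows one way to wrap an arbitrary WCPS query in a ProcessCoverages request against the end-point used above; the helper name is illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# End-point taken from the example request above.
RASDAMAN_OWS = "http://150.254.165.231:8080/rasdaman/ows"

# The query from the example: mean of the whole "mlotst" coverage as text.
AVG_QUERY = 'for $s in ( mlotst ) return encode( avg($s), "text/csv" )'


def wcps_url(query, base=RASDAMAN_OWS):
    """Wrap a WCPS query string in a ProcessCoverages request URL."""
    params = [
        ("SERVICE", "WCS"),
        ("VERSION", "2.0.1"),
        ("REQUEST", "ProcessCoverages"),
        ("query", query),  # urlencode percent-encodes the query text
    ]
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    url = wcps_url(AVG_QUERY)
    print(url)
    # Decoding the URL again recovers the original query unchanged.
    print(parse_qs(urlsplit(url).query)["query"][0])
```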
Dataset name: DATASET NAME
Pilot: A1 /3.2.1 Oceanic tuna fisheries immediate operational choices
Component: C05.01 Rasdaman
API/Operation: OGC WCS - ProcessCoverages
Example: Produce a colorized map (in PNG format) of the whole Indian Ocean area, based on the values of the “mlotst” variable for a specific date.
Request:
http://150.254.165.231:8080/rasdaman/ows?&SERVICE=WCS&VERSION=2.0.1&REQUEST=ProcessCoverages&query=
for $c in ( mlotst ) return encode(
switch
case $c[ansi("2018-05-30"), Lat(-35:30), Long(25:115)] = 99999 return {red: 255; green: 255; blue: 255}
case 18 > $c[ansi("2018-05-30"), Lat(-35:30), Long(25:115)] return {red: 0; green: 0; blue: 255}
case 23 > $c[ansi("2018-05-30"), Lat(-35:30), Long(25:115)] return {red: 255; green: 255; blue: 0}
case 30 > $c[ansi("2018-05-30"), Lat(-35:30), Long(25:115)] return {red: 255; green: 140; blue: 0}
default return {red: 255; green: 0; blue: 0}
, "image/png")
Response: PNG image of the colorized map (image not reproduced here).
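The colour-classification query repeats the same subset expression in every `case` branch, so it is convenient to generate it. The Python sketch below produces a query of the same shape as the example from a list of thresholds and RGB colours; the function and its parameters are illustrative, and the thresholds and colours in the usage line are those of the example.

```python
def colour_map_query(coverage, date, lat, lon, nodata, steps, default):
    """Build a WCPS switch/case query that colours a coverage slice.

    steps   -- list of (upper_limit, (r, g, b)), checked in order
    default -- (r, g, b) used when no case matches
    """
    subset = f'$c[ansi("{date}"), Lat({lat[0]}:{lat[1]}), Long({lon[0]}:{lon[1]})]'

    def rgb(colour):
        return f"{{red: {colour[0]}; green: {colour[1]}; blue: {colour[2]}}}"

    # No-data cells are painted white, as in the example request.
    cases = [f"case {subset} = {nodata} return {rgb((255, 255, 255))}"]
    cases += [f"case {limit} > {subset} return {rgb(colour)}"
              for limit, colour in steps]
    return (f"for $c in ( {coverage} ) return encode(switch "
            + " ".join(cases)
            + f' default return {rgb(default)} , "image/png")')


if __name__ == "__main__":
    print(colour_map_query(
        "mlotst", "2018-05-30", (-35, 30), (25, 115), 99999,
        [(18, (0, 0, 255)), (23, (255, 255, 0)), (30, (255, 140, 0))],
        (255, 0, 0),
    ))
```

The resulting string can be passed as the `query` parameter of a ProcessCoverages request after percent-encoding.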
Dataset name: DATASET NAME
Pilot: Pilot 1.3.1.B1.1: Cereals and biomass crop
Component: C05.02 FIWARE IoT Hub
API/Operation: CRUD Operations under RESTful API
Example: Device Registration:
End-point URL: http://{$host_url:$host_port}/iot/devices
Payload example in JSON format:
{
  "devices": [
    {
      "device_id": "raspberryPI1",
      "entity_name": "Field1",
      "entity_type": "Field",
      "protocol": "MQTT",
      "timezone": "Europe/Madrid",
      "attributes": [
        { "name": "leaf_condensation", "type": "double", "metadatas": [ { "name": "units", "type": "string" } ] },
        { "name": "temperature", "type": "double", "metadatas": [ { "name": "units", "type": "string" } ] },
        { "name": "humidity", "type": "double", "metadatas": [ { "name": "units", "type": "string" } ] },
        { "name": "soil_humidity", "type": "double", "metadatas": [ { "name": "units", "type": "string" } ] },
        { "name": "Device", "type": "string" }
      ],
      "commands": [
        { "name": "ping", "type": "command" }
      ]
    }
  ]
}
Data Handling Management:
End-point URL: http://{$host_url:$host_port}/v1/admin/config
Payload example in JSON format:
{
  "service": "DataBio",
  "servicePath": "/Tragsa",
  "host": "http://localhost:8080",
  "in": [
    {
      "id": "Field1",
      "type": "Field",
      "providers": [ "http://localhost:8081" ],
      "attributes": [
        { "name": "leaf_condensation", "type": "double" },
        { "name": "temperature", "type": "double" },
        { "name": "humidity", "type": "double" },
        { "name": "soil_humidity", "type": "double" },
        { "name": "Device", "type": "string" }
      ]
    }
  ],
  "out": [
    {
      "id": "DataBioEvent1",
      "type": "DataBioEvent",
      "attributes": [
        { "name": "leaf_condensation", "type": "double" },
        { "name": "temperature", "type": "double" },
        { "name": "humidity", "type": "double" },
        { "name": "soil_humidity", "type": "double" },
        { "name": "Device", "type": "string" }
      ],
      "brokers": [
        { "url": "http://localhost:1026", "serviceName": "DataBio", "servicePath": "/Tragsa" }
      ]
    }
  ],
  "statements": [
    "INSERT INTO DataBioEvent SELECT leaf_condensation, temperature, humidity, soil_humidity, Device FROM Field Where leaf_condensation < 90 AND temperature > 15 AND 20 < humidity < 90 AND 0 < soil_humidity > 50"
  ]
}
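In the configuration above, the statement reads attributes from the incoming "Field" events and writes them to the outgoing "DataBioEvent" events, so the three parts have to agree on attribute names. The following sketch shows one way such a configuration could be checked before it is submitted. It is an illustration only: the function names and the regular expression are assumptions of this sketch, the check is not part of the service described here, and it does not parse or evaluate the filter conditions.

```python
import re

# Trimmed copy of the configuration in the text: only the parts the check reads.
MEASURES = ["leaf_condensation", "temperature", "humidity", "soil_humidity"]
ATTRIBUTES = [{"name": name, "type": "double"} for name in MEASURES]
ATTRIBUTES.append({"name": "Device", "type": "string"})

config = {
    "in": [{"id": "Field1", "type": "Field", "attributes": ATTRIBUTES}],
    "out": [{"id": "DataBioEvent1", "type": "DataBioEvent", "attributes": ATTRIBUTES}],
    "statements": [
        "INSERT INTO DataBioEvent SELECT leaf_condensation, temperature, "
        "humidity, soil_humidity, Device FROM Field Where leaf_condensation < 90 "
        "AND temperature > 15 AND 20 < humidity < 90 AND 0 < soil_humidity > 50"
    ],
}

# Matches only the simple "INSERT INTO x SELECT a, b FROM y" shape used above.
STATEMENT = re.compile(
    r"INSERT\s+INTO\s+(\w+)\s+SELECT\s+(.+?)\s+FROM\s+(\w+)",
    re.IGNORECASE | re.DOTALL,
)


def attribute_names(events, event_type):
    """Return the declared attribute names of an event type, or None if absent."""
    for event in events:
        if event["type"] == event_type:
            return {attribute["name"] for attribute in event["attributes"]}
    return None


def check_statement(config, statement):
    """List mismatches between a statement and the declared in/out events."""
    match = STATEMENT.search(statement)
    if match is None:
        return ["statement is not of the form INSERT INTO ... SELECT ... FROM ..."]
    target, columns, source = match.groups()
    selected = [column.strip() for column in columns.split(",")]
    problems = []
    for label, events, event_type in (
        ("incoming", config["in"], source),
        ("outgoing", config["out"], target),
    ):
        declared = attribute_names(events, event_type)
        if declared is None:
            problems.append(f"no {label} event type {event_type!r}")
            continue
        for column in selected:
            if column not in declared:
                problems.append(f"{column!r} not declared on {label} {event_type!r}")
    return problems


problems = check_statement(config, config["statements"][0])
print(problems or "statement matches the declared attributes")
```

A statement that selects an undeclared attribute, for example a hypothetical "wind" column, would be reported once for the incoming event type and once for the outgoing one.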
Concluding remarks
The DataBio project is an EU lighthouse project with twenty-six pilots running at over a hundred
piloting sites across Europe in the three main bioeconomy sectors: agriculture, forestry,
and fishery. These sectors use, process and produce many datasets and data streams that
create value for both businesses and governments. This deliverable provides an overview of
datasets in the context of the DataBio platform and pilots, allowing the reader to gain insight
into why the data is needed, what the data provides and how it can be retrieved.
The requirements from the pilots and the platform identify the datasets that are needed for the
pilot applications. The ArchiMate models provide trace links to the relevant components,
requirements and application goals, allowing users to carry out coverage and orphan
analysis as well as traditional trace navigation.
The overview of datasets shows that DataBio pilots currently use 14 existing datasets,
improve 6 datasets by processing them or enriching them with other data points, and are creating
a total of 23 datasets. Each dataset is described with metadata in the DataBioHub. The
numbers are expected to grow during the project’s lifetime.
The first phase of the DataBio project has focused on the usage and creation of datasets based
on the needs and requirements of the DataBio pilots. The next phase will continue this work,
but will also put more emphasis on the interoperability of datasets through the use
of ontologies, potential standard data models, and standard access mechanisms, services and APIs.
Furthermore, there will be an increased focus on secure data sharing and data exchange beyond
the individual pilots to support a growing data economy in the DataBio areas of agriculture,
forestry and fishery.
References Reference Name of document (include authors, version, date etc. where applicable)
[REF-01] European Commission, 2018: https://eur-lex.europa.eu/legal-
content/EN/ALL/?uri=COM:2018:0232:FIN
[REF-02] European Open Data Portal: https://data.europa.eu/euodp/data/
[REF-03] European Commission, January 2017: (https://ec.europa.eu/digital-single-
market/en/policies/building-european-data-economy
[REF-04] Transforming Transport (web): https://data.transformingtransport.eu/
[REF-05] Dunning, A.(2017). ‘Are FAIR data principles FAIR?’ LIBER Webinar.
http://www.ijdc.net/article/view/567. Retrieved 2018-08-21
[REF-06] Press, G. (2016). ‘Cleaning Big Data: most time-consuming, least enjoyable data
science task, survey says’, Forbes [Internet].
https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-
time-consuming-least-enjoyable-data-science-task-survey-
says/#3cfa77426f63. Retrieved 2018-08-21.
[REF-07] Moons, B. et al. (2016). Realising the European Open Science Cloud.
https://ec.europa.eu/research/openscience/pdf/realising_the_european_ope
n_science_cloud_2016.pdf. Retrieved 2018-08-21
[REF-08] Wilkinson, M. D. et al. (2016). The FAIR Guiding Principles for scientific data
management and stewardship. Nature Scientific Data, 3, 2016.
doi:10.1038/sdata.2016.18.
[REF-09] FORCE 11 (2014) https://www.force11.org/fairprinciples, Retrieved 2018-08-
21.
[REF-10] European Commission (2016): H2020 Programme Guidelines on FAIR Data
Management in Horizon 2020.
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/
oa_pilot/h2020-hi-oa-data-mgt_en.pdf. Retrieved 2018-08-21.
[REF-11] DataBioHub: https://www.databiohub.eu
[REF-12] https://www.earthobservations.org/geoss.php
[REF-13] https://inspire.ec.europa.eu/sites/default/files/geodcat-ap.pdf
[REF-14] http://micka.bnhelp.cz/
[REF-15] https://ckan.org/
[REF-16] 5-star scheme, Tim Berners Lee: https://5stardata.info/de/
D4.3 – Data sets, formats and models (Public version) H2020 Contract No. 732064 Final – v1.0-Public, 12/12/2018
Dissemination level: PU -Public Page 151
[REF-17] Go-Fair Initiative (https://www.go-fair.org/)
[REF-18] Dublin Core MetaData Initiative http://dublincore.org/
[REF-19] Creative Commons: https://creativecommons.org/ns
[REF-20] DataBio deliverable D6.2 “Data Management Plan”, June 30, 2017
[REF-21] Common license types for datasets (https://help.data.world/hc/en-
us/articles/115006114287-Common-license-types-for-datasets, retrieved
2019-08-21).
[REF-22] DataBio deliberable D5.i2 “EO data sets, formats and sets”, https://rid-
redmine.intrasoft-intl.com/projects/databio/dmsf?folder_id=1685
D4.3 – Data sets, formats and models (Public version) H2020 Contract No. 732064 Final – v1.0-Public, 12/12/2018
Dissemination level: PU -Public Page 152
Appendix A Metadata template table
Field Value
Internal Name of the Dataset
Name of the Dataset/API Provider
Short Description
Extended Description
Version
Initial Availability Date
Data Type
Personal Data
Rightsholder
Other Rights Information
Dataset/API Owner/Responsible
Dataset/API Owner/Responsible Contacts
Technology
Name of the System
Dataset Data Model/API Interface
Data Model: Standards, Glossaries and metadata standards
Data Identifier - Standard used
Data Model - Specific Data Model
Data Volume
Update Frequency
Data Archiving and preservation
Geographical Coverage
Timespan
Access Level
Access Mechanism
URI